# Hook Chains (Self-Healing Agent Mesh MVP)

Hook Chains provide an event-driven recovery layer for important workflow failures.
When a matching hook event occurs, OpenClaude evaluates declarative rules and can dispatch remediation actions such as:

- `spawn_fallback_agent`
- `notify_team`
- `warm_remote_capacity`

## Disabled-By-Default Rollout

> **Rollout recommendation:** keep Hook Chains disabled until you validate rules in your environment.
>
> - Set top-level config to `"enabled": false` initially.
> - Enable per environment when ready.
> - Dispatch is gated by `feature('HOOK_CHAINS')`.
> - Env gate defaults to off unless `CLAUDE_CODE_ENABLE_HOOK_CHAINS=1` is set.

This keeps existing workflows unchanged while you tune guard windows and action behavior.

## Feature Overview

Hook Chains are loaded from a deterministic config file and evaluated on dispatched hook events.

MVP runtime trigger wiring:

- `PostToolUseFailure` hooks dispatch Hook Chains with outcome `failed`.
- `TaskCompleted` hooks dispatch Hook Chains with outcome:
  - `success` when completion hooks did not block.
  - `failed` when completion hooks returned blocking errors or prevented continuation.

Default config path:

- `.openclaude/hook-chains.json`

Override path:

- `CLAUDE_CODE_HOOK_CHAINS_CONFIG_PATH=/abs/or/relative/path/to/hook-chains.json`

Global gate:

- `feature('HOOK_CHAINS')` must be enabled in the build
- `CLAUDE_CODE_ENABLE_HOOK_CHAINS=0|1` (defaults to disabled when unset)

## Safety Guarantees

The runtime is intentionally conservative:

- **Depth guard:** chain dispatch is blocked when `chainDepth >= maxChainDepth`.
- **Rule cooldown:** each rule can only re-fire after cooldown expires.
- **Dedup window:** identical event/action combinations are suppressed for a window.
- **Abort-safe behavior:** if the current signal is aborted, actions skip safely.
- **Policy-aware remote warm:** `warm_remote_capacity` skips when remote sessions are policy denied.
- **Bridge inactive no-op:** `warm_remote_capacity` safely skips when no active bridge handle exists.
- **Missing team context safety:** `notify_team` skips with structured reason if no team context/team file is available.
- **Fallback launcher safety:** `spawn_fallback_agent` fails with a structured reason when launch permissions/context are unavailable.

## Configuration Schema Reference

Top-level object:

```json
{
  "version": 1,
  "enabled": true,
  "maxChainDepth": 2,
  "defaultCooldownMs": 30000,
  "defaultDedupWindowMs": 30000,
  "rules": []
}
```

### Top-Level Fields

| Field | Type | Required | Notes |
|---|---|---:|---|
| `version` | `1` | No | Defaults to `1`. |
| `enabled` | `boolean` | No | Global feature switch for this config file. |
| `maxChainDepth` | `integer` | No | Global depth guard (default `2`, max `10`). |
| `defaultCooldownMs` | `integer` | No | Default rule cooldown in ms (default `30000`). |
| `defaultDedupWindowMs` | `integer` | No | Default action dedup window in ms (default `30000`). |
| `rules` | `HookChainRule[]` | No | Defaults to `[]`. May be omitted or empty; when no rules are present, dispatch is a no-op and returns `enabled: false`. |

> **Note:** An empty ruleset is valid and can be used to keep Hook Chains configured but effectively disabled until rules are added.

### Rule Object (`HookChainRule`)

```json
{
  "id": "task-failure-recovery",
  "enabled": true,
  "trigger": {
    "event": "TaskCompleted",
    "outcome": "failed"
  },
  "condition": {
    "toolNames": ["Edit"],
    "taskStatuses": ["failed"],
    "errorIncludes": ["timeout", "permission denied"],
    "eventFieldEquals": {
      "meta.source": "scheduler"
    }
  },
  "cooldownMs": 60000,
  "dedupWindowMs": 30000,
  "maxDepth": 2,
  "actions": []
}
```

| Field | Type | Required | Notes |
|---|---|---:|---|
| `id` | `string` | Yes | Stable identifier used in telemetry/guards. |
| `enabled` | `boolean` | No | Per-rule switch. |
| `trigger.event` | `HookEvent` | Yes | Event name to match. |
| `trigger.outcome` | `"success"\|"failed"\|"timeout"\|"unknown"` | No | Single outcome matcher. |
| `trigger.outcomes` | `Outcome[]` | No | Multi-outcome matcher. Use either `outcome` or `outcomes`. |
| `condition` | `object` | No | Optional extra matching constraints. |
| `cooldownMs` | `integer` | No | Overrides global cooldown for this rule. |
| `dedupWindowMs` | `integer` | No | Overrides global dedup for this rule. |
| `maxDepth` | `integer` | No | Per-rule depth cap. |
| `actions` | `HookChainAction[]` | Yes | One or more actions to execute in order. |

### Condition Fields

| Field | Type | Notes |
|---|---|---|
| `toolNames` | `string[]` | Matches `tool_name` / `toolName` in event payload. |
| `taskStatuses` | `string[]` | Matches `task_status` / `taskStatus` / `status`. |
| `errorIncludes` | `string[]` | Case-insensitive substring match against `error` / `reason` / `message`. |
| `eventFieldEquals` | `Record` | Dot-path equality against payload (example: `"meta.source": "scheduler"`). |

### Actions

#### `spawn_fallback_agent`

```json
{
  "type": "spawn_fallback_agent",
  "id": "fallback-1",
  "enabled": true,
  "dedupWindowMs": 30000,
  "description": "Fallback recovery for failed task",
  "promptTemplate": "Recover task ${TASK_SUBJECT}. Event=${EVENT_NAME}, outcome=${OUTCOME}, error=${ERROR}. Payload=${PAYLOAD_JSON}",
  "agentType": "general-purpose",
  "model": "sonnet"
}
```

#### `notify_team`

```json
{
  "type": "notify_team",
  "id": "notify-ops",
  "enabled": true,
  "dedupWindowMs": 30000,
  "teamName": "mesh-team",
  "recipients": ["*"],
  "summary": "Hook chain ${RULE_ID} fired",
  "messageTemplate": "Event=${EVENT_NAME} outcome=${OUTCOME}\nTask=${TASK_ID}\nError=${ERROR}\nPayload=${PAYLOAD_JSON}"
}
```

#### `warm_remote_capacity`

```json
{
  "type": "warm_remote_capacity",
  "id": "warm-bridge",
  "enabled": true,
  "dedupWindowMs": 60000,
  "createDefaultEnvironmentIfMissing": false
}
```

## Complete Example Configs

### 1) Retry via Fallback Agent

```json
{
  "version": 1,
  "enabled": true,
  "maxChainDepth": 2,
  "defaultCooldownMs": 30000,
  "defaultDedupWindowMs": 30000,
  "rules": [
    {
      "id": "retry-task-via-fallback",
      "trigger": {
        "event": "TaskCompleted",
        "outcome": "failed"
      },
      "cooldownMs": 60000,
      "actions": [
        {
          "type": "spawn_fallback_agent",
          "id": "spawn-retry-agent",
          "description": "Retry failed task with fallback agent",
          "promptTemplate": "A task failed. Recover it safely.\nTask=${TASK_SUBJECT}\nDescription=${TASK_DESCRIPTION}\nError=${ERROR}\nPayload=${PAYLOAD_JSON}",
          "agentType": "general-purpose",
          "model": "sonnet"
        }
      ]
    }
  ]
}
```

### 2) Notify Only

```json
{
  "version": 1,
  "enabled": true,
  "maxChainDepth": 2,
  "defaultCooldownMs": 30000,
  "defaultDedupWindowMs": 30000,
  "rules": [
    {
      "id": "notify-on-tool-failure",
      "trigger": {
        "event": "PostToolUseFailure",
        "outcome": "failed"
      },
      "condition": {
        "toolNames": ["Edit", "Write", "Bash"]
      },
      "actions": [
        {
          "type": "notify_team",
          "id": "notify-team-failure",
          "recipients": ["*"],
          "summary": "Tool failure detected",
          "messageTemplate": "Tool failure detected.\nEvent=${EVENT_NAME} outcome=${OUTCOME}\nError=${ERROR}\nPayload=${PAYLOAD_JSON}"
        }
      ]
    }
  ]
}
```

### 3) Combined Fallback + Notify + Bridge Warm

```json
{
  "version": 1,
  "enabled": true,
  "maxChainDepth": 2,
  "defaultCooldownMs": 45000,
  "defaultDedupWindowMs": 30000,
  "rules": [
    {
      "id": "full-recovery-chain",
      "trigger": {
        "event": "TaskCompleted",
        "outcomes": ["failed", "timeout"]
      },
      "condition": {
        "errorIncludes": ["timeout", "capacity", "connection"]
      },
      "cooldownMs": 90000,
      "actions": [
        {
          "type": "spawn_fallback_agent",
          "id": "fallback-agent",
          "description": "Recover failed task execution",
          "promptTemplate": "Recover failed task and produce a concise fix summary.\nTask=${TASK_SUBJECT}\nError=${ERROR}\nPayload=${PAYLOAD_JSON}"
        },
        {
          "type": "notify_team",
          "id": "notify-team",
          "recipients": ["*"],
          "summary": "Recovery chain triggered",
          "messageTemplate": "Recovery chain ${RULE_ID} fired.\nOutcome=${OUTCOME}\nTask=${TASK_SUBJECT}\nError=${ERROR}"
        },
        {
          "type": "warm_remote_capacity",
          "id": "warm-capacity",
          "createDefaultEnvironmentIfMissing": false
        }
      ]
    }
  ]
}
```

## Template Variables

The following placeholders are supported by `promptTemplate`, `summary`, and `messageTemplate`:

- `${EVENT_NAME}`
- `${OUTCOME}`
- `${RULE_ID}`
- `${TASK_SUBJECT}`
- `${TASK_DESCRIPTION}`
- `${TASK_ID}`
- `${ERROR}`
- `${PAYLOAD_JSON}`

## Troubleshooting

### Rule never triggers

- Verify `trigger.event` and `trigger.outcome`/`trigger.outcomes` exactly match dispatched event data.
- Check `condition` filters (especially `toolNames` and `eventFieldEquals` dot-path keys).
- Confirm the config file is valid JSON and schema-valid.

### Actions show as skipped

Common skip reasons:

- `action disabled`
- `rule cooldown active ...`
- `dedup window active ...`
- `max chain depth reached ...`
- `No team context is available ...`
- `Team file not found ...`
- `Remote sessions are blocked by policy`
- `Bridge is not active; warm_remote_capacity is a safe no-op`
- `No fallback agent launcher is registered in runtime context`

### Config changes not reflected

- Loader uses memoization by file mtime/size.
- Ensure your editor writes the file fully and updates mtime.
- If needed, force reload from the caller side with `forceReloadConfig: true`.

### Existing workflows changed unexpectedly

- Set `"enabled": false` at top-level.
- Or globally disable with `CLAUDE_CODE_ENABLE_HOOK_CHAINS=0`.
- Re-enable gradually after validating one rule at a time.
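
When re-enabling one rule at a time, it can help to lint the config before the runtime ever loads it. The following is a minimal sketch of such a check, written against the field tables and template variable list in this document only. `lint_config` and its helper are hypothetical names, not part of OpenClaude, and the runtime's own validation may be stricter or differ in detail. For example, this sketch checks only the documented upper bound on `maxChainDepth`, because no lower bound is stated here.

```python
import re

# Values taken from the schema tables and template variable list in this document.
OUTCOMES = ("success", "failed", "timeout", "unknown")
ACTION_TYPES = ("spawn_fallback_agent", "notify_team", "warm_remote_capacity")
PLACEHOLDERS = (
    "EVENT_NAME", "OUTCOME", "RULE_ID", "TASK_SUBJECT",
    "TASK_DESCRIPTION", "TASK_ID", "ERROR", "PAYLOAD_JSON",
)
TEMPLATE_FIELDS = ("promptTemplate", "summary", "messageTemplate")
PLACEHOLDER_RE = re.compile(r"\$\{([^}]*)\}")


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def lint_config(config):
    """Hypothetical helper: return a list of problems found in a parsed config dict."""
    problems = []
    version = config.get("version", 1)
    if not (_is_int(version) and version == 1):
        problems.append("version must be 1")
    if not isinstance(config.get("enabled", False), bool):
        problems.append("enabled must be a boolean")
    depth = config.get("maxChainDepth", 2)
    if not _is_int(depth) or depth > 10:
        problems.append("maxChainDepth must be an integer no greater than 10")
    for key in ("defaultCooldownMs", "defaultDedupWindowMs"):
        if not _is_int(config.get(key, 30000)):
            problems.append(f"{key} must be an integer")

    for i, rule in enumerate(config.get("rules", [])):
        where = f"rules[{i}]"
        if not isinstance(rule, dict):
            problems.append(f"{where}: rule must be an object")
            continue
        if not isinstance(rule.get("id"), str) or not rule["id"]:
            problems.append(f"{where}: id is required")
        trigger = rule.get("trigger") or {}
        if not trigger.get("event"):
            problems.append(f"{where}: trigger.event is required")
        # Collect both the single and the multi-outcome matcher.
        outcomes = list(trigger.get("outcomes", []))
        if "outcome" in trigger:
            outcomes.append(trigger["outcome"])
        for outcome in outcomes:
            if outcome not in OUTCOMES:
                problems.append(f"{where}: unknown outcome {outcome!r}")
        actions = rule.get("actions")
        if not isinstance(actions, list):
            problems.append(f"{where}: actions is required and must be a list")
            continue
        for j, action in enumerate(actions):
            spot = f"{where}.actions[{j}]"
            if action.get("type") not in ACTION_TYPES:
                problems.append(f"{spot}: unknown action type {action.get('type')!r}")
            # Flag placeholders that are not in the documented variable list.
            for field in TEMPLATE_FIELDS:
                text = action.get(field)
                if isinstance(text, str):
                    for name in PLACEHOLDER_RE.findall(text):
                        if name not in PLACEHOLDERS:
                            problems.append(f"{spot}.{field}: unknown placeholder ${{{name}}}")
    return problems
```

To use it, parse `.openclaude/hook-chains.json` with `json.load` and pass the resulting dict to `lint_config`; an empty list means this sketch found nothing to flag, which is not a guarantee that the runtime will accept the file.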