Pillar 2: Guardrails

What keeps them safe.

Claude Code agents are powerful. Without boundaries, they can push to production branches, bump versions without a release workflow, leak credentials into tool calls, loop infinitely on broken commands, or quietly bloat their own instruction files. Guardrails close all of these gaps — automatically, at the hook layer, before any damage reaches your codebase.

Your fleet operates within boundaries you set.

10+ guardrails ship active by default. No configuration required. They fire silently when everything is fine, block loudly when something would go wrong, and surface informational banners for supply chain changes.

What’s new (Unreleased)

Controls toggle (bypass mode) — operator phrase disable controls flips a session-wide bypass that lifts every CI-manifesto signal gate at once: branch creation, PR creation, and force-push to non-main branches. Spawned subagents inherit it. HARD FLOOR (push-to-main, commit-on-main, secrets-in-files) stays enforced. Restored by enable controls or SessionEnd. See Guardrail 9.
Two-tier secrets — code-file writes still hard-block; inline Bash args (no file redirect) scan + log + note only. Lets agents pass a token to curl -H without breaking the workflow, while keeping durable storage locked.
Session-scoped PR signals — one signal phrase from the operator unlocks PR creation for the full session instead of expiring per-turn. PR has a 3-per-session counter.
Subagent signal isolation — subagents cannot self-arm PR signals. Only top-level operator sessions can.
gh pr create --base main required — PRs to branches other than main are now blocked. Dev work pushes directly, no PR needed.
Dependency banner — every pip/npm/cargo/... install emits a visible banner. Never blocks — surfaces supply chain additions for operator audit.
Negation-aware signals — “don’t merge to main” no longer arms the PR gate. Matchers skip phrases preceded by don't, not, never, shouldn't, won't, can't.
Shared _strip.py — strip heredocs (any delimiter), echo/printf/curl/python-c/jq quoted arguments before guards apply regex. Eliminates false-positive blocks on documentation text in command payloads.
Guards fail-closed — unexpected exceptions in branch_guard / prod_lockdown now emit a stderr warning and the outer main exits 1. Silent hook failures are no longer possible.

What’s new (Unreleased)
The guardrail pipeline
Guardrail 1: Secrets scanning
Guardrail 2: Retry circuit breaker
1. How operations are fingerprinted
2. Configuration
Guardrail 3: Branch guard
1. Blocked operations
2. What is allowed
Guardrail 4: Version guard
1. Protected files
2. Detected patterns
Guardrail 5: CLAUDE.md sanity
1. Configuration
Guardrail 6: MCP surface area
1. Configuration
Guardrail 7: Output token limit
1. Configuration
Guardrail 8: File read deduplication
1. Configuration
Guardrail 9: Controls toggle (bypass mode)
All guardrails at a glance
Exit code semantics
Hardening your setup

The guardrail pipeline

Every agent action passes through a layered defense before executing. The pipeline runs at three Claude Code hook events: UserPromptSubmit, PreToolUse, and PostToolUse.

flowchart TD
    P([User Prompt]) --> UPS

    subgraph UserPromptSubmit
        UPS[Secrets Scanner\nprompt scan]
    end

    UPS --> TOOL([Tool about to execute])

    subgraph PreToolUse["PreToolUse — security gate"]
        S[Secrets Scanner\ntool input scan]
        B[Branch Guard\ngit command filter]
        V[Version Guard\nmanifest protection]
        C[CLAUDE.md Sanity\nline limit check]
        RC[Retry Circuit Breaker\nhard block]
        FR[File Read Dedup\nredundant read block]
    end

    TOOL --> S --> B --> V --> C --> RC --> FR

    S -->|secret found| BLOCK1([BLOCKED — exit 2])
    B -->|merge/force-push/tag| BLOCK2([BLOCKED — exit 2])
    V -->|version field edit| BLOCK3([BLOCKED — exit 2])
    C -->|file exceeds line cap| BLOCK4([BLOCKED — exit 2])
    RC -->|hard max hit| BLOCK5([BLOCKED — exit 2])
    FR -->|unchanged file| BLOCK6([BLOCKED — exit 2])

    FR -->|all clear| EXEC([Tool executes])

    subgraph PostToolUse["PostToolUse — learning + enforcement"]
        RCT[Retry Circuit Breaker\nfailure tracking]
        BA[Bash Output Filter\nverbose truncation]
    end

    EXEC --> RCT
    EXEC --> BA
    RCT -->|threshold hit| WARN([Inject research instructions])
    BA -->|verbose output| TRUNC([Truncated + re-emitted])

Guardrail 1: Secrets scanning

Hook: UserPromptSubmit (warn), PreToolUse (block) Default: On — AGENTIHOOKS_SECRETS_MODE=standard

The secrets scanner intercepts credentials before they can enter tool calls, log files, or git history. It runs twice per turn: once on the raw user prompt (warn only) and once on every tool’s input parameters (block on detection).

What it detects

Pattern name	What it catches
`aws_access_key`	AKIA/ASIA/AROA/AIPA prefixed AWS key IDs
`aws_secret_key`	`aws_secret_access_key = <value>` assignments
`github_token`	`ghp_`, `ghs_`, `github_pat_` tokens
`private_key`	PEM-encoded RSA, EC, OPENSSH, PGP private keys
`bearer_token`	`Authorization: Bearer <token>` headers
`db_url_creds`	`postgres://user:pass@host`, `mysql://`, `mongodb://` URLs
`generic_secret`	`PASSWORD=`, `API_KEY=`, `SECRET=` assignments with 8+ char values

In strict mode, three additional patterns activate:

Pattern name	What it catches
`slack_token`	`xox[bpors]-...` Slack OAuth tokens
`stripe_key`	`sk_live_`, `sk_test_`, `rk_live_`, `rk_test_` keys
`jwt_token`	Three-part base64url JWTs (`eyJ...eyJ...`)

Suppression

Lines with # nosecret are excluded from scanning. Use this for documentation, test fixtures, or known-safe patterns:

example_key = "AKIAIOSFODNN7EXAMPLE"  # nosecret

Configuration

Variable	Default	Options
`AGENTIHOOKS_SECRETS_MODE`	`standard`	`off`, `warn`, `standard`, `strict`

off — scanning disabled entirely
warn — detects and warns, never blocks
standard — detects standard 7 patterns, blocks on PreToolUse
strict — standard + Slack/Stripe/JWT, blocks on PreToolUse

Guardrail 2: Retry circuit breaker

Hook: PostToolUse (track + warn), PreToolUse (hard block) Default: On — RETRY_BREAKER_ENABLED=true, soft at 5, hard at 10

Claude Code agents will retry failing operations. Without a circuit breaker, a stuck agent can hammer the same broken command dozens of times — burning context, wasting tokens, and never escaping the loop.

The retry circuit breaker tracks consecutive failures per operation. When the same operation fails repeatedly, it escalates through two stages:

Stage 1 — Soft warning (default: 5 failures): Injects a banner into Claude’s context telling it to stop retrying and instead launch parallel error-researcher agents for web search. The agent still can proceed — but it now knows it should research first.

Stage 2 — Hard block (default: 10 failures): Raises a BlockAction via PreToolUse, preventing the tool from executing at all. The agent must research and change approach before the block lifts.

How operations are fingerprinted

The breaker tracks per-operation, not per-tool. Operations are fingerprinted by their base command, with subcommand-level grouping for DevOps tools:

Tool call	Operation key
`Bash: kubectl apply -f deploy.yaml`	`bash:kubectl:apply`
`Bash: kubectl get pods`	`bash:kubectl:get`
`Bash: terraform plan`	`bash:terraform:plan`
`Bash: terraform apply`	`bash:terraform:apply`
`Bash: docker build .`	`bash:docker:build`
`Edit`	`edit`

This means kubectl apply and kubectl get have independent counters — a stuck apply doesn’t block unrelated reads.

Error text is also normalized (hex, timestamps, paths, numbers stripped) before fingerprinting, so slightly different error messages from the same root cause are treated as the same failure.

Configuration

Variable	Default	Description
`RETRY_BREAKER_ENABLED`	`true`	Master switch
`RETRY_BREAKER_MAX`	`5`	Consecutive failures before soft warning
`RETRY_BREAKER_HARD_MAX`	`10`	Consecutive failures before hard block
`RETRY_BREAKER_TTL`	`3600`	Redis key TTL in seconds (1 hour)

State is persisted in Redis per session and falls back to in-memory when Redis is unavailable.

Guardrail 3: Branch guard

Hook: PreToolUse (block, Bash commands only) Default: On — always active

Prevents destructive git operations that bypass the normal PR and release workflow. Fires on git commands before they execute.

Blocked operations

Command pattern	Reason
`git merge ... main` / `git merge ... master`	Direct merges bypass PR review
`git reset ... main` / `git reset ... master`	Rewrites protected branch history
`git push --force` / `git push -f`	Can destroy remote history
`git push --force-with-lease`	Force push variant — same risk
`git tag`	Tagging is a release operation; must go through CI

All blocked operations return exit code 2 with a human-readable explanation and, where applicable, the recommended alternative (e.g., gh workflow run release.yml for tagging).

What is allowed

Normal git push (without force flags) is allowed — branch protection is a remote concern handled by GitHub. Read-only operations (git checkout, git switch, git pull, git log, git status, git diff) are never touched.

The guard strips heredoc bodies and quoted commit messages before pattern matching to avoid false positives on message content that happens to mention “main” or “master”.

Guardrail 4: Version guard

Hook: PreToolUse (block, Edit and Write tools) Default: On — always active

Version fields in project manifests should be managed by the CI release workflow, not by an AI agent editing files directly. The version guard blocks any Edit or Write operation that would modify a version field in a known manifest file.

Protected files

pyproject.toml, package.json, Cargo.toml, setup.cfg, setup.py, version.txt, VERSION

Detected patterns

version = "1.2.3"        # pyproject.toml, Cargo.toml, setup.cfg
"version": "1.2.3"       # package.json

If a Write or Edit targets one of these files and the new content contains a version field pattern, the operation is blocked with a message directing to the release workflow:

BLOCKED: Version field modification in pyproject.toml is not allowed.
Version bumping is handled by the release workflow
(gh workflow run release.yml -f bump=patch|minor|major).
Do not edit version fields manually.

Guardrail 5: CLAUDE.md sanity

Hook: PreToolUse (block, Edit and Write tools) Default: On — AGENTIHOOKS_CLAUDE_MD_SANITY_CHECK=true

CLAUDE.md is loaded by Claude Code on every turn. Every line in it costs tokens, every session. Agents writing unchecked content into CLAUDE.md files can quietly inflate the per-turn base cost while also degrading instruction quality through drift and dilution.

The sanity check intercepts any Write or Edit targeting a CLAUDE.md or CLAUDE.local.md file and simulates what the resulting file would look like. If the resulting line count exceeds the cap, the operation is blocked before the file is touched:

BLOCKED: Write to /home/user/.claude/CLAUDE.md would produce 347 lines,
exceeding the CLAUDE.md cap of 200 lines.
Trim the content to 200 lines or fewer before writing.

For Edit operations, the guard reads the current file from disk, applies the proposed change in memory, and counts the resulting lines — catching incremental bloat that would accumulate over many small edits.

Configuration

Variable	Default	Description
`AGENTIHOOKS_CLAUDE_MD_SANITY_CHECK`	`true`	Enable/disable the guardrail
`AGENTIHOOKS_CLAUDE_MD_MAXLINES`	`200`	Maximum allowed lines in CLAUDE.md files

Guardrail 6: MCP surface area

Hook: SessionStart (warn) Default: On — MCP_TOOL_WARN_THRESHOLD=40

Every MCP tool schema loaded into a session costs tokens — approximately 150 tokens per tool, every turn. With 9 MCP servers and 112 tools, that’s ~16,800 schema tokens injected into every single context turn before Claude writes a line of code.

At session start, agentihooks counts the total number of MCP tools across all configured servers. If the count exceeds MCP_TOOL_WARN_THRESHOLD (default 40), a warning banner fires:

MCP SURFACE AREA: 112 tools across 9 servers (~16,800 schema tokens/turn).
Consider disabling unused servers via /mcp to reduce per-turn overhead.

The companion CLI command provides the full breakdown:

agentihooks mcp report

MCP Surface Area Report
Total: 9 servers, ~112 tools, ~16,800 schema tokens

Server                         Source   Tools   ~Tokens
hooks-utils                      user      32     4,800
github                           user      40     6,000
postgres                         user      15     2,250
...

Configuration

Variable	Default	Description
`MCP_TOOL_WARN_THRESHOLD`	`40`	Tool count threshold for session-start warning
`MCP_SCHEMA_AVG_TOKENS`	`150`	Estimated tokens per tool schema

Guardrail 7: Output token limit

Hook: SessionStart (inject awareness) Default: Passive — activates when CLAUDE_CODE_MAX_OUTPUT_TOKENS is set

When CLAUDE_CODE_MAX_OUTPUT_TOKENS is set in the environment, agentihooks injects an awareness message into the session context at startup so Claude knows the limit is in effect and can plan accordingly:

OUTPUT TOKEN LIMIT: This session is capped at 8192 output tokens per response.
Plan responses to stay within this limit.

Without this injection, Claude can unknowingly start generating a response that will be cut off mid-output by the token limit, resulting in truncated tool calls, incomplete edits, or garbled output. The awareness injection prevents the surprise.

Configuration

Variable	Default	Description
`CLAUDE_CODE_MAX_OUTPUT_TOKENS`	(unset)	Output token cap; awareness injected when set

Guardrail 8: File read deduplication

Hook: PreToolUse (block redundant reads), PostToolUse (cache on read) Default: On — FILE_READ_CACHE_ENABLED=true

Tracks every file read during a session. If Claude tries to read the same file again and the file has not changed on disk since the last read, the operation is blocked:

BLOCKED: /path/to/file.py was already read this session and is unchanged on disk.
Use the content already in your context window.

This prevents a common and expensive pattern: Claude re-reading the same source file 3–5 times over the course of a long session, each read injecting the same content into the context window.

File identity is tracked by path + mtime. A file modified on disk since the last read passes through normally, and the cache entry is updated.

Configuration

Variable	Default	Description
`FILE_READ_CACHE_ENABLED`	`true`	Master switch
`FILE_READ_CACHE_BACKEND`	`redis`	`redis` or `memory`
`FILE_READ_CACHE_TTL`	`21600`	Redis key TTL (6 hours)

Guardrail 9: Controls toggle (bypass mode)

Hook: UserPromptSubmit (signal detection), PreToolUse (consumed by branch_guard + prod_lockdown), SessionEnd (cleanup) Default: On — CONTROLS_BYPASS_ENABLED=true

A session-level escape hatch the operator activates by saying disable controls (or turn off controls, deactivate controls, kill controls). While active, every CI-manifesto signal gate is short-circuited at once. Restored by enable controls (or turn on, activate, restore) or by SessionEnd of the activating session.

What it unlocks

While bypass mode is ACTIVE:

Branch creation (CI Manifesto §13) — git checkout -b, git switch -c, git branch <name>
PR creation (§14) — gh pr create with no per-session counter cap
Release-gate merges (§4) — gh pr merge to main/master
Release workflow trigger — gh workflow run release.yml
Hotfix-category prod ops (§5) — :latest/:prod/:stable image push, tag, build
Force push to non-main branches — git push --force, -f, --force-with-lease

What stays HARD FLOOR

These remain blocked even with controls disabled — no toggle reaches them:

Direct git push to main/master (also blocked at OS layer + GitHub ruleset)
Force push to main/master (caught by the main-push pattern that runs first)
git commit while HEAD is on main/master
--base main requirement on gh pr create (correctness, not permission)
git reset on main, git tag, git branch -D main
Secrets written to code files (Write/Edit/Bash-redirect into a file)

Subagent inheritance

The flag is global (single file at ~/.agentihooks/controls_flags/active.flag + Redis key controls_disabled:_global storing the owner session ID). Any subagent spawned during the bypass window inherits the unlock without per-session signaling. SessionEnd of the parent session clears it — child SessionEnds are no-ops, so a long-running parent doesn’t lose its bypass when a subagent ends.

Operator phrases

Phrase	Effect
`disable controls`, `turn off controls`, `deactivate controls`, `kill controls`	Activate bypass
`enable controls`, `turn on controls`, `activate controls`, `restore controls`	Restore gates

A CONTROLS banner is injected on every transition and on each turn while bypass is active, so the operator (and the agent) can never silently forget the gates are down.

Configuration

Variable	Default	Description
`CONTROLS_BYPASS_ENABLED`	`true`	Master switch — set `false` to disable the feature entirely

When the operator is likely to invoke it

Multi-step main-targeted work where re-signaling each gate would be friction
Spawning a fleet of subagents to open multiple PRs in parallel
Hotfix sequences where the operator has already validated and just wants speed
Force-push to a feature branch after a rebase

Agents must NOT volunteer the toggle (“want me to disable controls?”). The operator decides; the agent doesn’t suggest the escape hatch.

All guardrails at a glance

#	Guardrail	Hook event	Default	Block or warn	Config key
1	Secrets scanner	`UserPromptSubmit` / `PreToolUse`	On	Warn / Block	`AGENTIHOOKS_SECRETS_MODE`
2	Retry circuit breaker	`PostToolUse` / `PreToolUse`	On	Warn → Block	`RETRY_BREAKER_ENABLED`
3	Branch guard	`PreToolUse`	On	Block	(always on)
4	Version guard	`PreToolUse`	On	Block	(always on)
5	CLAUDE.md sanity	`PreToolUse`	On	Block	`AGENTIHOOKS_CLAUDE_MD_SANITY_CHECK`
6	MCP surface area	`SessionStart`	On	Warn	`MCP_TOOL_WARN_THRESHOLD`
7	Output token limit	`SessionStart`	Passive	Inject awareness	`CLAUDE_CODE_MAX_OUTPUT_TOKENS`
8	File read dedup	`PreToolUse` / `PostToolUse`	On	Block	`FILE_READ_CACHE_ENABLED`
9	Controls toggle (bypass)	`UserPromptSubmit` / `PreToolUse` / `SessionEnd`	On	Operator-toggled	`CONTROLS_BYPASS_ENABLED`

Exit code semantics

Guardrails that block communicate via exit code 2. Claude Code cancels the tool action and displays the hook’s stderr output as a warning to the agent.

Exit code	Meaning
`0`	Allow — tool proceeds
`2`	Block — tool cancelled, stderr shown as warning

This is the same mechanism Claude Code uses for all hook blocks — guardrails are first-class citizens of the hook event pipeline.

Hardening your setup

The defaults cover the most common failure modes. For teams or higher-stakes environments, consider these adjustments:

# ~/.agentihooks/.env

# Stricter secrets scanning (adds Slack, Stripe, JWT patterns)
AGENTIHOOKS_SECRETS_MODE=strict

# Tighter retry tolerance (warn sooner, hard block sooner)
RETRY_BREAKER_MAX=3
RETRY_BREAKER_HARD_MAX=7

# Tighter CLAUDE.md size cap
AGENTIHOOKS_CLAUDE_MD_MAXLINES=150

# Warn earlier on MCP surface area
MCP_TOOL_WARN_THRESHOLD=25

# Set output token ceiling
CLAUDE_CODE_MAX_OUTPUT_TOKENS=8192

To verify all guardrails are active after installation:

agentihooks status

The status output lists each guardrail with its current state (enabled/disabled), mode, and thresholds. Inside a live session, /agentihooks shows the same panel alongside real-time session metrics.

Pillar 2: Guardrails

What’s new (Unreleased)

Table of contents

The guardrail pipeline

Guardrail 1: Secrets scanning

What it detects

Suppression

Configuration

Guardrail 2: Retry circuit breaker

How operations are fingerprinted

Configuration

Guardrail 3: Branch guard

Blocked operations

What is allowed

Guardrail 4: Version guard

Protected files

Detected patterns

Guardrail 5: CLAUDE.md sanity

Configuration

Guardrail 6: MCP surface area

Configuration

Guardrail 7: Output token limit

Configuration

Guardrail 8: File read deduplication

Configuration

Guardrail 9: Controls toggle (bypass mode)

What it unlocks

What stays HARD FLOOR

Subagent inheritance

Operator phrases

Configuration

When the operator is likely to invoke it

All guardrails at a glance

Exit code semantics

Hardening your setup