Model Routing
Overview
Model routing now covers both Claude and Codex. The routing policy decides:
- which engine is primary for a lane
- which model is used
- which reasoning level is used
- when the system falls back to Claude
The routing policy is defined in the lane registry and exposed to runtime helpers through scripts/model_routing_policy.py.
Routing Goals
The routing design follows four rules:
- Use Claude for high-ambiguity live offensive work.
- Use Codex for bounded, structured, token-heavy, and repeatable lanes.
- Escalate to higher reasoning only when it materially improves outcomes.
- Fall back to Claude aggressively when Codex output is weak, invalid, or stale.
Claude Routing
Claude Model Classes
| Class | Model | Reasoning | Typical Use |
|---|---|---|---|
| Hard lane | `claude-opus-4-6` | `high` | SQLi, XSS, OAuth, race conditions, live verification |
| Standard lane | `claude-opus-4-6` | `medium` | Structured exploitation and general testing |
| Fallback lane | `claude-sonnet-4-6` | `high` | Lower-cost fallback or procedural tasks |
Claude Skill Map
| Skill Family | Typical Model | Notes |
|---|---|---|
| `route`, `verify`, `chain` | `claude-opus-4-6` | Highest-value reasoning paths |
| Deep exploit skills | `claude-opus-4-6` | High ambiguity and live adaptation |
| Structured exploit skills | `claude-opus-4-6` | Medium reasoning by default |
| Passive/deterministic checks | `claude-sonnet-4-6` | Used when deep reasoning is not necessary |
Claude remains the live execution engine for pentest and bug bounty.
Codex Routing
Codex Model Classes
| Class | Model | Reasoning | Typical Use |
|---|---|---|---|
| Support lane | `gpt-5.4` | `high` | Synthesis, review, clustering, advisory |
| Stuck lane | `gpt-5.4` | `xhigh` | Hard stuck states, hard second opinion, chain expansion |
| Arbiter lane | `gpt-5.4-pro` | `xhigh` | Rare high-impact conflicts |
Codex Role Map
| Role | Default Model | Reasoning |
|---|---|---|
| `hypothesis_engine` | `gpt-5.4` | `high` |
| `critic` | `gpt-5.4` | `high` |
| `chain_planner` | `gpt-5.4` | `xhigh` when needed |
| `finding_verifier` | `gpt-5.4` | `high` |
| `stuck_breaker` | `gpt-5.4` | `xhigh` |
| Rare arbiter | `gpt-5.4-pro` | `xhigh` |
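The role map can be read as a lookup from role name to model and reasoning level. The following is a minimal sketch of that lookup, not the contents of scripts/model_routing_policy.py. The `resolve_codex_role` helper, the `escalate` flag, and the `arbiter` key are hypothetical names introduced for illustration, and the `high` baseline for `chain_planner` is an assumption, since the table only states that it uses `xhigh` when needed.

```python
from typing import NamedTuple


class CodexRoute(NamedTuple):
    model: str
    reasoning: str


# Models and reasoning levels copied from the Codex Role Map table.
CODEX_ROLE_MAP = {
    "hypothesis_engine": CodexRoute("gpt-5.4", "high"),
    "critic": CodexRoute("gpt-5.4", "high"),
    # Assumption: chain_planner runs at "high" unless escalated.
    "chain_planner": CodexRoute("gpt-5.4", "high"),
    "finding_verifier": CodexRoute("gpt-5.4", "high"),
    "stuck_breaker": CodexRoute("gpt-5.4", "xhigh"),
    # Hypothetical key for the "Rare arbiter" row.
    "arbiter": CodexRoute("gpt-5.4-pro", "xhigh"),
}


def resolve_codex_role(role: str, escalate: bool = False) -> CodexRoute:
    """Return the model and reasoning level for a Codex role (hypothetical helper)."""
    route = CODEX_ROLE_MAP[role]
    # Only chain_planner has a conditional escalation in the table.
    if role == "chain_planner" and escalate:
        return route._replace(reasoning="xhigh")
    return route
```

Keeping escalation as an explicit argument means the caller has to justify the more expensive `xhigh` run, which matches the goal of escalating only when it materially improves outcomes.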
Lane Routing
The lane registry in .claude/skills/pentest/helpers/agent-dispatch-config.json defines the primary engine and fallback engine for each operational mode.
Pentest Lanes
| Lane | Primary Engine | Fallback | Notes |
|---|---|---|---|
| Live execution | Claude | None | Requests against the target are executed by Claude |
| Post-route advisory | Codex | Claude | hypothesis_engine |
| Mid-test stagnation advisory | Codex | Claude | critic |
| Pre-verify advisory | Codex | Claude | chain_planner / critic |
| Borderline verification | Codex | Claude | finding_verifier before final verdict |
| Hard stuck | Codex | Claude | stuck_breaker with xhigh |
| Static review/reporting | Codex | Claude | Bounded and synthesis-heavy lanes |
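Each lane pairs a primary engine with an optional fallback. The sketch below shows how a helper might turn a registry entry into an ordered list of engines to try. The JSON layout, the key names (`primary`, `fallback`, `role`), and the `engines_for` function are all hypothetical; the real schema of agent-dispatch-config.json may differ.

```python
import json

# Hypothetical registry fragment; the real file's schema is not shown in this document.
REGISTRY_JSON = """
{
  "pentest": {
    "live_execution": {"primary": "claude", "fallback": null},
    "hard_stuck": {"primary": "codex", "fallback": "claude", "role": "stuck_breaker"}
  }
}
"""


def engines_for(registry: dict, mode: str, lane: str) -> list:
    """Return engines in the order they should be tried: primary, then fallback if set."""
    entry = registry[mode][lane]
    order = [entry["primary"]]
    if entry.get("fallback"):
        order.append(entry["fallback"])
    return order


registry = json.loads(REGISTRY_JSON)
print(engines_for(registry, "pentest", "hard_stuck"))      # ['codex', 'claude']
print(engines_for(registry, "pentest", "live_execution"))  # ['claude']
```

A lane with no fallback, such as live execution, yields a single-element list, so callers can iterate over the result without special-casing it.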
Bug Bounty Lanes
| Lane | Primary Engine | Fallback | Notes |
|---|---|---|---|
| Live hunt execution | Claude | None | Claude interacts with the target |
| Program ranking support | Codex | Claude | Candidate program and surface prioritization |
| Discovery digestion | Codex | Claude | Cluster and rank next steps |
| Runtime exploit support | Codex | Claude | Payload ladders and alternative angles |
| Candidate finding triage | Codex | Claude | Deduplicate and pre-score leads |
| Session memory compaction | Codex | Heuristic | Persist compact state for the next session |
| Reporting and retros | Codex | Claude | Bounded synthesis |
Runtime Metadata
The routing policy exports Codex-specific metadata so shells and Python helpers can stay in sync:
| Field | Purpose |
|---|---|
| `codex_mode` | Operational profile such as review-only or bug-bounty-heavy |
| `codex_primary_engine` | Primary engine for the lane |
| `codex_fallback_engine` | Fallback engine for the lane |
| `codex_support_model` | Default Codex model for support lanes |
| `codex_stuck_model` | Codex model for stuck-breaking |
| `codex_arbiter_model` | Rare arbiter model |
| `codex_advisory_roles_csv` | Enabled advisory roles |
| `codex_confidence_threshold` | Confidence floor before fallback |
These values are consumed by runtime scripts and by the bug bounty shell loop.
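One way to keep a shell loop and Python helpers in sync is to render the same dictionary as shell assignments. The sketch below assumes this approach; it is not taken from the project. The field names come from the table above, while the values and the `to_shell_exports` helper are illustrative only.

```python
import shlex

# Field names from the metadata table; values are illustrative placeholders.
metadata = {
    "codex_mode": "bug-bounty-heavy",
    "codex_primary_engine": "codex",
    "codex_fallback_engine": "claude",
    "codex_support_model": "gpt-5.4",
    "codex_stuck_model": "gpt-5.4",
    "codex_arbiter_model": "gpt-5.4-pro",
    "codex_advisory_roles_csv": "hypothesis_engine,critic,chain_planner",
    "codex_confidence_threshold": 65,
}


def to_shell_exports(fields: dict) -> str:
    """Render each field as an `export key=value` line, quoting values for the shell."""
    lines = []
    for key, value in fields.items():
        lines.append(f"export {key}={shlex.quote(str(value))}")
    return "\n".join(lines)


print(to_shell_exports(metadata))
```

A shell script could then load the values with `eval "$(python3 ...)"` or by sourcing a generated file. `shlex.quote` guards against values that contain spaces or shell metacharacters.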
AI Task Chains
The Python runtime exposes dedicated Codex task chains for bounded work:
| Task Chain | Purpose |
|---|---|
| `bb-program-ranking` | Program and target prioritization |
| `bb-discovery-digest` | Compact synthesis of discovery outputs |
| `bb-runtime-advisory` | Runtime exploit support |
| `bb-stuck-breaker` | High-effort stuck resolution |
| `bb-memory-compaction` | Compact persistent session memory |
These chains are primarily implemented through scripts/ai_exec.py.
Fallback Policy
Codex is heavily used, especially in bug bounty mode, but fallback rules are strict.
Automatic Fallback Conditions
| Condition | Result |
|---|---|
| Invalid schema | Fallback to Claude |
| Confidence below threshold | Fallback to Claude |
| Repeated stale advice | Fallback to Claude |
| Contradiction with local evidence | Fallback to Claude |
| High-impact ambiguity | Claude retains the final call |
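The conditions in this table can be expressed as an ordered series of checks. The following is a minimal sketch under stated assumptions: `CodexResult`, its field names, and `fallback_reason` are hypothetical, and how the project actually detects stale advice or contradictions is not described in this document, so those signals are modeled here as precomputed booleans.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CodexResult:
    """Hypothetical summary of one Codex response; field names are illustrative."""
    schema_valid: bool
    confidence: float            # assumed to be on the same scale as the threshold
    repeats_prior_advice: bool   # stands in for "repeated stale advice"
    contradicts_evidence: bool   # stands in for "contradiction with local evidence"
    high_impact_ambiguity: bool


def fallback_reason(result: CodexResult, threshold: float) -> Optional[str]:
    """Return why control should pass to Claude, or None to keep the Codex output."""
    if not result.schema_valid:
        return "invalid schema"
    if result.confidence < threshold:
        return "confidence below threshold"
    if result.repeats_prior_advice:
        return "repeated stale advice"
    if result.contradicts_evidence:
        return "contradiction with local evidence"
    if result.high_impact_ambiguity:
        return "high-impact ambiguity: Claude retains the final call"
    return None
```

Returning the reason rather than a bare boolean makes it easier to log which rule triggered the fallback.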
Practical Thresholds
| Lane Type | Typical Confidence Threshold |
|---|---|
| Static review | 70 |
| Artifact synthesis | 60 |
| Batch triage | 55 |
| Runtime advisory | 65 |
| Stuck-breaker | 70 |
These thresholds are intentionally conservative in bug bounty mode.
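Because the floor differs by lane type, the same Codex confidence can be acceptable in one lane and trigger a fallback in another. The sketch below illustrates this with the table's values. It assumes confidence is reported on a 0 to 100 scale, and the dictionary keys and `meets_threshold` helper are hypothetical names.

```python
# Values copied from the Practical Thresholds table; keys are hypothetical identifiers.
THRESHOLDS = {
    "static_review": 70,
    "artifact_synthesis": 60,
    "batch_triage": 55,
    "runtime_advisory": 65,
    "stuck_breaker": 70,
}


def meets_threshold(lane_type: str, confidence: float) -> bool:
    """True when confidence is at or above the lane's floor (assumed 0-100 scale)."""
    return confidence >= THRESHOLDS[lane_type]


# A confidence of 62 clears only the two lowest floors.
accepted = [lane for lane in THRESHOLDS if meets_threshold(lane, 62)]
print(accepted)  # ['artifact_synthesis', 'batch_triage']
```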
Token Allocation Strategy
The routing policy is also a token policy:
- Claude tokens are reserved for live target interaction and final decisions.
- Codex tokens are spent on bounded analysis, synthesis, compression, and structured second opinions.
Practical Effects
| Category | Token Strategy |
|---|---|
| Live exploitation | Prefer Claude Opus |
| Bounded review and synthesis | Prefer Codex |
| Long-running session carry-over | Prefer Codex compact artifacts |
| Hard conflict resolution | Use higher-tier Codex only when justified |
This keeps the expensive Claude context focused on the high-value part of the engagement.