Model Routing¶

Overview¶

Model routing now covers both Claude and Codex. The routing policy decides:

- which engine is primary for a lane
- which model is used
- which reasoning level is used
- when the system falls back to Claude

The routing policy is defined in the lane registry and exposed to runtime helpers through `scripts/model_routing_policy.py`.
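The four decisions listed above can be pictured as one record per lane. The following is a minimal sketch only: `RoutingDecision` and its field names are hypothetical and are not taken from `scripts/model_routing_policy.py`, whose actual structure may differ. The example values come from the tables later in this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RoutingDecision:
    """Hypothetical record of what the policy decides for one lane."""
    primary_engine: str             # "claude" or "codex"
    model: str                      # e.g. "gpt-5.4"
    reasoning: str                  # e.g. "high"
    fallback_engine: Optional[str]  # None when the lane has no fallback


# Hard-stuck lane: Codex primary at xhigh reasoning, Claude as fallback.
hard_stuck = RoutingDecision("codex", "gpt-5.4", "xhigh", "claude")

# Live execution: Claude only, shown here with the hard-lane settings.
live_execution = RoutingDecision("claude", "claude-opus-4-6", "high", None)
```

Keeping the record immutable (`frozen=True`) is a design choice of this sketch, so that a decision cannot be altered after the policy has produced it.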

Routing Goals¶

The routing design follows four rules:

- Use Claude for high-ambiguity live offensive work.
- Use Codex for bounded, structured, token-heavy, and repeatable lanes.
- Escalate to higher reasoning only when it materially improves outcomes.
- Fall back to Claude aggressively when Codex output is weak, invalid, or stale.

Claude Routing¶

Claude Model Classes¶

Class	Model	Reasoning	Typical Use
Hard lane	`claude-opus-4-6`	`high`	SQLi, XSS, OAuth, race conditions, live verification
Standard lane	`claude-opus-4-6`	`medium`	Structured exploitation and general testing
Fallback lane	`claude-sonnet-4-6`	`high`	Lower-cost fallback or procedural tasks

Claude Skill Map¶

Skill Family	Typical Model	Notes
`route`, `verify`, `chain`	`claude-opus-4-6`	Highest-value reasoning paths
Deep exploit skills	`claude-opus-4-6`	High ambiguity and live adaptation
Structured exploit skills	`claude-opus-4-6`	Medium reasoning by default
Passive/deterministic checks	`claude-sonnet-4-6`	Used when deep reasoning is not necessary

Claude remains the live execution engine for pentest and bug bounty.

Codex Routing¶

Codex Model Classes¶

Class	Model	Reasoning	Typical Use
Support lane	`gpt-5.4`	`high`	Synthesis, review, clustering, advisory
Stuck lane	`gpt-5.4`	`xhigh`	Hard stuck states, hard second opinion, chain expansion
Arbiter lane	`gpt-5.4-pro`	`xhigh`	Rare high-impact conflicts

Codex Role Map¶

Role	Default Model	Reasoning
`hypothesis_engine`	`gpt-5.4`	`high`
`critic`	`gpt-5.4`	`high`
`chain_planner`	`gpt-5.4`	`xhigh` when needed
`finding_verifier`	`gpt-5.4`	`high`
`stuck_breaker`	`gpt-5.4`	`xhigh`
Rare arbiter	`gpt-5.4-pro`	`xhigh`
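The role map above can be read as a lookup with an optional escalation step. This is a sketch under stated assumptions: `codex_settings`, `CODEX_ROLES`, `ESCALATABLE`, and the `"arbiter"` key are hypothetical names, and because the table only says "`xhigh` when needed" for `chain_planner`, the sketch assumes that role defaults to `high` and escalates on request.

```python
from typing import Dict, Tuple

# Role -> (model, default reasoning), copied from the role map above.
# Assumption: chain_planner defaults to "high"; the table gives no default.
# Assumption: "arbiter" is an illustrative key for the rare arbiter row.
CODEX_ROLES: Dict[str, Tuple[str, str]] = {
    "hypothesis_engine": ("gpt-5.4", "high"),
    "critic": ("gpt-5.4", "high"),
    "chain_planner": ("gpt-5.4", "high"),
    "finding_verifier": ("gpt-5.4", "high"),
    "stuck_breaker": ("gpt-5.4", "xhigh"),
    "arbiter": ("gpt-5.4-pro", "xhigh"),
}

# Roles allowed to move from their default to "xhigh" (assumption).
ESCALATABLE = {"chain_planner"}


def codex_settings(role: str, escalate: bool = False) -> Tuple[str, str]:
    """Return (model, reasoning) for a Codex role; hypothetical helper."""
    model, reasoning = CODEX_ROLES[role]
    if escalate and role in ESCALATABLE:
        reasoning = "xhigh"
    return model, reasoning
```

For example, `codex_settings("chain_planner", escalate=True)` yields the `xhigh` setting, while `codex_settings("critic", escalate=True)` stays at `high` because that role is not in the escalation set.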

Lane Routing¶

The lane registry in `.claude/skills/pentest/helpers/agent-dispatch-config.json` defines the primary engine and fallback engine for each operational mode.

Pentest Lanes¶

Lane	Primary Engine	Fallback	Notes
Live execution	Claude	None	Requests against the target are executed by Claude
Post-route advisory	Codex	Claude	`hypothesis_engine`
Mid-test stagnation advisory	Codex	Claude	`critic`
Pre-verify advisory	Codex	Claude	`chain_planner` / `critic`
Borderline verification	Codex	Claude	`finding_verifier` before final verdict
Hard stuck	Codex	Claude	`stuck_breaker` with `xhigh`
Static review/reporting	Codex	Claude	Bounded and synthesis-heavy lanes

Bug Bounty Lanes¶

Lane	Primary Engine	Fallback	Notes
Live hunt execution	Claude	None	Claude interacts with the target
Program ranking support	Codex	Claude	Candidate program and surface prioritization
Discovery digestion	Codex	Claude	Cluster and rank next steps
Runtime exploit support	Codex	Claude	Payload ladders and alternative angles
Candidate finding triage	Codex	Claude	Deduplicate and pre-score leads
Session memory compaction	Codex	Heuristic	Persist compact state for the next session
Reporting and retros	Codex	Claude	Bounded synthesis
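The lane tables above amount to a primary/fallback lookup per mode and lane. The sketch below illustrates that lookup with a hypothetical JSON fragment; the real schema of the registry file is not shown in this document, so the key names (`primary`, `fallback`, and the lane identifiers) and the `next_engine` helper are assumptions.

```python
import json
from typing import Optional

# Hypothetical registry fragment; the real JSON schema may differ.
REGISTRY_JSON = """
{
  "pentest": {
    "live_execution": {"primary": "claude", "fallback": null},
    "hard_stuck": {"primary": "codex", "fallback": "claude"}
  },
  "bug_bounty": {
    "session_memory_compaction": {"primary": "codex", "fallback": "heuristic"}
  }
}
"""


def next_engine(registry: dict, mode: str, lane: str,
                primary_failed: bool) -> Optional[str]:
    """Pick the engine for a lane; None means no fallback is available."""
    entry = registry[mode][lane]
    if not primary_failed:
        return entry["primary"]
    return entry["fallback"]


registry = json.loads(REGISTRY_JSON)
```

With this fragment, a failed primary on the hard-stuck lane hands over to Claude, a failed primary on session memory compaction hands over to the heuristic path, and live execution has nothing to fall back to.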

Runtime Metadata¶

The routing policy exports Codex-specific metadata so shells and Python helpers can stay in sync:

Field	Purpose
`codex_mode`	Operational profile such as review-only or bug-bounty-heavy
`codex_primary_engine`	Primary engine for the lane
`codex_fallback_engine`	Fallback engine for the lane
`codex_support_model`	Default Codex model for support lanes
`codex_stuck_model`	Codex model for stuck-breaking
`codex_arbiter_model`	Rare arbiter model
`codex_advisory_roles_csv`	Enabled advisory roles
`codex_confidence_threshold`	Minimum confidence for Codex output; lower scores trigger fallback

These values are consumed by runtime scripts and by the bug bounty shell loop.
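One way to keep shell and Python consumers in sync is to render the metadata as `export` lines that a shell can source. This is a sketch, not the project's exporter: `to_shell_exports` is a hypothetical helper, upper-casing the field names is an assumption, and the mode, role list, and threshold values are illustrative.

```python
import shlex
from typing import Dict, List


def to_shell_exports(metadata: Dict[str, object]) -> List[str]:
    """Render metadata as `export NAME=value` lines a shell can source."""
    lines = []
    for key in sorted(metadata):
        # shlex.quote leaves safe strings as-is and quotes anything else.
        value = shlex.quote(str(metadata[key]))
        lines.append(f"export {key.upper()}={value}")
    return lines


# Models come from the Codex tables; other values are illustrative.
metadata = {
    "codex_mode": "bug-bounty-heavy",
    "codex_primary_engine": "codex",
    "codex_fallback_engine": "claude",
    "codex_support_model": "gpt-5.4",
    "codex_stuck_model": "gpt-5.4",
    "codex_arbiter_model": "gpt-5.4-pro",
    "codex_advisory_roles_csv": "hypothesis_engine,critic",
    "codex_confidence_threshold": 65,
}

exports = to_shell_exports(metadata)
```

Quoting every value with `shlex.quote` means a value containing spaces or shell metacharacters cannot break the sourcing script.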

AI Task Chains¶

The Python runtime exposes dedicated Codex task chains for bounded work:

Task Chain	Purpose
`bb-program-ranking`	Program and target prioritization
`bb-discovery-digest`	Compact synthesis of discovery outputs
`bb-runtime-advisory`	Runtime exploit support
`bb-stuck-breaker`	High-effort stuck resolution
`bb-memory-compaction`	Compact persistent session memory

These chains are primarily implemented through `scripts/ai_exec.py`.

Fallback Policy¶

Codex handles most bounded work, especially in bug bounty mode, but its output is accepted only when it passes strict fallback checks.

Automatic Fallback Conditions¶

Condition	Result
Invalid schema	Fallback to Claude
Confidence below threshold	Fallback to Claude
Repeated stale advice	Fallback to Claude
Contradiction with local evidence	Fallback to Claude
High-impact ambiguity	Claude retains the final call
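The conditions in the table above can be checked in sequence. The following is a minimal sketch, assuming a hypothetical `CodexOutput` summary and `fallback_reason` helper; the order of the checks and the reason strings are choices made for this illustration, not documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CodexOutput:
    """Hypothetical summary of one Codex response after local checks."""
    schema_valid: bool
    confidence: int
    repeats_stale_advice: bool
    contradicts_local_evidence: bool
    high_impact_ambiguity: bool


def fallback_reason(out: CodexOutput, threshold: int) -> Optional[str]:
    """Return why Claude should take over, or None to accept the output."""
    if not out.schema_valid:
        return "invalid_schema"
    if out.confidence < threshold:
        return "low_confidence"
    if out.repeats_stale_advice:
        return "stale_advice"
    if out.contradicts_local_evidence:
        return "contradicts_evidence"
    if out.high_impact_ambiguity:
        return "claude_final_call"
    return None
```

Returning a reason string rather than a bare boolean lets the caller log which condition triggered the handover.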

Practical Thresholds¶

Lane Type	Typical Threshold
Static review	`70`
Artifact synthesis	`60`
Batch triage	`55`
Runtime advisory	`65`
Stuck-breaker	`70`

These thresholds are intentionally conservative in bug bounty mode.
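The thresholds above can be kept in a single lookup keyed by lane type. This is a sketch only: the dictionary keys and the `below_threshold` helper are hypothetical, and it assumes that confidence scores are reported on the same scale as the thresholds and that a score equal to the threshold passes.

```python
from typing import Dict

# Values copied from the table above; the key names are illustrative.
THRESHOLDS: Dict[str, int] = {
    "static_review": 70,
    "artifact_synthesis": 60,
    "batch_triage": 55,
    "runtime_advisory": 65,
    "stuck_breaker": 70,
}


def below_threshold(lane_type: str, confidence: int) -> bool:
    """True when the confidence is under the lane's floor (assumed strict <)."""
    return confidence < THRESHOLDS[lane_type]
```

For instance, a batch triage result with confidence 54 would fall back, while one with confidence 55 would be accepted under this sketch.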

Token Allocation Strategy¶

The routing policy is also a token policy:

Claude tokens are reserved for live target interaction and final decisions.
Codex tokens are spent on bounded analysis, synthesis, compression, and structured second opinions.

Practical Effects¶

Category	Token Strategy
Live exploitation	Prefer Claude Opus
Bounded review and synthesis	Prefer Codex
Long-running session carry-over	Prefer Codex compact artifacts
Hard conflict resolution	Use the higher-tier Codex arbiter (`gpt-5.4-pro`) only when justified

This keeps the expensive Claude context focused on the high-value part of the engagement.