Model Routing

Overview

Model routing now covers both Claude and Codex. The routing policy decides:

  • which engine is primary for a lane
  • which model is used
  • which reasoning level is used
  • when the system falls back to Claude

The routing policy is defined in the lane registry and exposed to runtime helpers through scripts/model_routing_policy.py.

Routing Goals

The routing design follows four rules:

  1. Use Claude for high-ambiguity live offensive work.
  2. Use Codex for bounded, structured, token-heavy, and repeatable lanes.
  3. Escalate to higher reasoning only when it materially improves outcomes.
  4. Fall back to Claude aggressively when Codex output is weak, invalid, or stale.

Claude Routing

Claude Model Classes

| Class | Model | Reasoning | Typical Use |
| --- | --- | --- | --- |
| Hard lane | claude-opus-4-6 | high | SQLi, XSS, OAuth, race conditions, live verification |
| Standard lane | claude-opus-4-6 | medium | Structured exploitation and general testing |
| Fallback lane | claude-sonnet-4-6 | high | Lower-cost fallback or procedural tasks |

Claude Skill Map

| Skill Family | Typical Model | Notes |
| --- | --- | --- |
| route, verify, chain | claude-opus-4-6 | Highest-value reasoning paths |
| Deep exploit skills | claude-opus-4-6 | High ambiguity and live adaptation |
| Structured exploit skills | claude-opus-4-6 | Medium reasoning by default |
| Passive/deterministic checks | claude-sonnet-4-6 | Used when deep reasoning is not necessary |

Claude remains the live execution engine for both pentest and bug bounty work.

Codex Routing

Codex Model Classes

| Class | Model | Reasoning | Typical Use |
| --- | --- | --- | --- |
| Support lane | gpt-5.4 | high | Synthesis, review, clustering, advisory |
| Stuck lane | gpt-5.4 | xhigh | Hard stuck states, hard second opinion, chain expansion |
| Arbiter lane | gpt-5.4-pro | xhigh | Rare high-impact conflicts |

Codex Role Map

| Role | Default Model | Reasoning |
| --- | --- | --- |
| hypothesis_engine | gpt-5.4 | high |
| critic | gpt-5.4 | high |
| chain_planner | gpt-5.4 | xhigh when needed |
| finding_verifier | gpt-5.4 | high |
| stuck_breaker | gpt-5.4 | xhigh |
| Rare arbiter | gpt-5.4-pro | xhigh |
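
As a rough illustration of how the role map above could be consulted at runtime, the sketch below resolves a role name to a model and reasoning level. The names `RoleRoute`, `CODEX_ROLE_MAP`, and `resolve_role` are hypothetical and are not taken from scripts/model_routing_policy.py. The key `arbiter` and the base level `high` for `chain_planner` are assumptions; the table only states "xhigh when needed".

```python
from typing import NamedTuple, Optional, Tuple


class RoleRoute(NamedTuple):
    model: str
    reasoning: str
    escalated: Optional[str] = None  # higher level, used only on request


# Models and levels are copied from the role map above. The key "arbiter" and
# the base level "high" for chain_planner are assumptions for illustration.
CODEX_ROLE_MAP = {
    "hypothesis_engine": RoleRoute("gpt-5.4", "high"),
    "critic": RoleRoute("gpt-5.4", "high"),
    "chain_planner": RoleRoute("gpt-5.4", "high", escalated="xhigh"),
    "finding_verifier": RoleRoute("gpt-5.4", "high"),
    "stuck_breaker": RoleRoute("gpt-5.4", "xhigh"),
    "arbiter": RoleRoute("gpt-5.4-pro", "xhigh"),
}


def resolve_role(role: str, escalate: bool = False) -> Tuple[str, str]:
    """Return (model, reasoning) for a role, escalating only when asked."""
    route = CODEX_ROLE_MAP[role]
    if escalate and route.escalated is not None:
        return route.model, route.escalated
    return route.model, route.reasoning
```

Keeping escalation as an explicit flag mirrors the routing goal of raising the reasoning level only when it materially improves outcomes.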

Lane Routing

The lane registry in .claude/skills/pentest/helpers/agent-dispatch-config.json defines the primary engine and fallback engine for each operational mode.

Pentest Lanes

| Lane | Primary Engine | Fallback | Notes |
| --- | --- | --- | --- |
| Live execution | Claude | None | Requests against the target are executed by Claude |
| Post-route advisory | Codex | Claude | hypothesis_engine |
| Mid-test stagnation advisory | Codex | Claude | critic |
| Pre-verify advisory | Codex | Claude | chain_planner / critic |
| Borderline verification | Codex | Claude | finding_verifier before final verdict |
| Hard stuck | Codex | Claude | stuck_breaker with xhigh |
| Static review/reporting | Codex | Claude | Bounded and synthesis-heavy lanes |

Bug Bounty Lanes

| Lane | Primary Engine | Fallback | Notes |
| --- | --- | --- | --- |
| Live hunt execution | Claude | None | Claude interacts with the target |
| Program ranking support | Codex | Claude | Candidate program and surface prioritization |
| Discovery digestion | Codex | Claude | Cluster and rank next steps |
| Runtime exploit support | Codex | Claude | Payload ladders and alternative angles |
| Candidate finding triage | Codex | Claude | Deduplicate and pre-score leads |
| Session memory compaction | Codex | Heuristic | Persist compact state for the next session |
| Reporting and retros | Codex | Claude | Bounded synthesis |
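
The lane tables can be read as a lookup from (mode, lane) to an ordered list of engines to try. The sketch below shows that idea with a small in-memory subset of the rows. The real schema of agent-dispatch-config.json is not documented here, so the `Lane` class, the `LANES` dictionary, its key spellings, and `engines_for` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Lane:
    primary: str
    fallback: Optional[str]  # None means the lane has no fallback engine


# A subset of the rows from the lane tables; key spellings are assumed.
LANES = {
    ("pentest", "live_execution"): Lane("claude", None),
    ("pentest", "hard_stuck"): Lane("codex", "claude"),
    ("bug_bounty", "live_hunt_execution"): Lane("claude", None),
    ("bug_bounty", "candidate_finding_triage"): Lane("codex", "claude"),
    ("bug_bounty", "session_memory_compaction"): Lane("codex", "heuristic"),
}


def engines_for(mode: str, lane: str) -> List[str]:
    """Return the engines to try for a lane, primary first."""
    entry = LANES[(mode, lane)]
    order = [entry.primary]
    if entry.fallback is not None:
        order.append(entry.fallback)
    return order
```

For example, `engines_for("bug_bounty", "session_memory_compaction")` yields `["codex", "heuristic"]`, reflecting that this lane is the only one listed whose fallback is not Claude.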

Runtime Metadata

The routing policy exports Codex-specific metadata so shells and Python helpers can stay in sync:

| Field | Purpose |
| --- | --- |
| codex_mode | Operational profile such as review-only or bug-bounty-heavy |
| codex_primary_engine | Primary engine for the lane |
| codex_fallback_engine | Fallback engine for the lane |
| codex_support_model | Default Codex model for support lanes |
| codex_stuck_model | Codex model for stuck-breaking |
| codex_arbiter_model | Rare arbiter model |
| codex_advisory_roles_csv | Enabled advisory roles |
| codex_confidence_threshold | Confidence floor before fallback |

These values are consumed by runtime scripts and by the bug bounty shell loop.
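
One way such metadata could be shared between Python helpers and a shell loop is to render it as `export` lines that the shell evaluates. The sketch below is an assumption about the mechanism, not a description of the actual scripts: the function `render_exports`, the upper-case variable naming, and the example values are illustrative, while the field names come from the table above.

```python
import shlex
from typing import Mapping


def render_exports(metadata: Mapping[str, object]) -> str:
    """Render metadata as shell export lines, quoting values safely."""
    lines = [
        f"export {key.upper()}={shlex.quote(str(value))}"
        for key, value in sorted(metadata.items())
    ]
    return "\n".join(lines)


# Field names follow the table above; the values are illustrative only.
EXAMPLE_METADATA = {
    "codex_mode": "bug-bounty-heavy",
    "codex_primary_engine": "codex",
    "codex_fallback_engine": "claude",
    "codex_support_model": "gpt-5.4",
    "codex_stuck_model": "gpt-5.4",
    "codex_arbiter_model": "gpt-5.4-pro",
    "codex_advisory_roles_csv": "hypothesis_engine,critic",
    "codex_confidence_threshold": 65,
}

if __name__ == "__main__":
    print(render_exports(EXAMPLE_METADATA))
```

Passing every value through `shlex.quote` keeps the output safe to evaluate even if a value later contains spaces or shell metacharacters.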

AI Task Chains

The Python runtime exposes dedicated Codex task chains for bounded work:

| Task Chain | Purpose |
| --- | --- |
| bb-program-ranking | Program and target prioritization |
| bb-discovery-digest | Compact synthesis of discovery outputs |
| bb-runtime-advisory | Runtime exploit support |
| bb-stuck-breaker | High-effort stuck resolution |
| bb-memory-compaction | Compact persistent session memory |

These chains are primarily implemented through scripts/ai_exec.py.

Fallback Policy

Codex is heavily used, especially in bug bounty mode, but fallback rules are strict.

Automatic Fallback Conditions

| Condition | Result |
| --- | --- |
| Invalid schema | Fallback to Claude |
| Confidence below threshold | Fallback to Claude |
| Repeated stale advice | Fallback to Claude |
| Contradiction with local evidence | Fallback to Claude |
| High-impact ambiguity | Claude retains the final call |
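
The conditions above can be expressed as a single check that returns the first reason to hand control back to Claude. This is a minimal sketch under stated assumptions: `AdvisoryResult`, its fields, the reason strings, and the `max_stale_repeats` cutoff are hypothetical, since the source does not define how many repeats count as "repeated" or how each signal is measured.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdvisoryResult:
    schema_valid: bool
    confidence: int  # assumed to be on the same scale as the threshold
    stale_repeats: int  # times the same advice has been repeated
    contradicts_local_evidence: bool
    high_impact_ambiguity: bool


def fallback_reason(
    result: AdvisoryResult,
    threshold: int,
    max_stale_repeats: int = 2,  # assumed cutoff, not from the source
) -> Optional[str]:
    """Return why Claude should take over, or None to keep the Codex output."""
    if not result.schema_valid:
        return "invalid_schema"
    if result.confidence < threshold:
        return "low_confidence"
    if result.stale_repeats >= max_stale_repeats:
        return "stale_advice"
    if result.contradicts_local_evidence:
        return "contradicts_evidence"
    if result.high_impact_ambiguity:
        return "claude_final_call"
    return None
```

Returning a reason rather than a bare boolean makes it easy to log which condition triggered the fallback.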

Practical Thresholds

| Lane Type | Typical Threshold |
| --- | --- |
| Static review | 70 |
| Artifact synthesis | 60 |
| Batch triage | 55 |
| Runtime advisory | 65 |
| Stuck-breaker | 70 |

These thresholds are intentionally conservative in bug bounty mode.
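
As a small worked example of the threshold table, the sketch below accepts Codex output only when its confidence is at or above the lane's threshold. The dictionary keys and the function name are hypothetical, and it assumes confidence is reported on the same numeric scale as the thresholds.

```python
# Threshold values are copied from the table above; key spellings are assumed.
CONFIDENCE_THRESHOLDS = {
    "static_review": 70,
    "artifact_synthesis": 60,
    "batch_triage": 55,
    "runtime_advisory": 65,
    "stuck_breaker": 70,
}


def accept_codex_output(lane_type: str, confidence: int) -> bool:
    """Accept output unless confidence is below the lane's threshold."""
    return confidence >= CONFIDENCE_THRESHOLDS[lane_type]
```

Under these values, a static review result scored 69 would fall back to Claude, while a batch triage result scored 55 would be kept.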

Token Allocation Strategy

The routing policy is also a token policy:

  • Claude tokens are reserved for live target interaction and final decisions.
  • Codex tokens are spent on bounded analysis, synthesis, compression, and structured second opinions.

Practical Effects

| Category | Token Strategy |
| --- | --- |
| Live exploitation | Prefer Claude Opus |
| Bounded review and synthesis | Prefer Codex |
| Long-running session carry-over | Prefer Codex compact artifacts |
| Hard conflict resolution | Use higher-tier Codex only when justified |

This keeps the expensive Claude context focused on the high-value part of the engagement.