Dual-Engine Architecture: Claude + Codex

Overview

The platform now runs a dual-engine architecture with a clear split of responsibilities:

  • Claude is the primary engine for live offensive work.
  • Codex is the primary engine for bounded support lanes and the runtime advisory engine during analysis and exploitation.

The goal is not "two models doing the same thing." The goal is to reserve the most expensive Claude reasoning for the points where it adds the most value, and move the rest of the load to Codex without reducing detection quality.

Operating Split

| Area | Primary Engine | Secondary Engine | Notes |
|---|---|---|---|
| Live pentest execution | Claude | Codex advisory | Claude sends requests, interprets target behavior, and decides next actions |
| Live bug bounty execution | Claude | Codex advisory | Claude remains the executor and final arbiter |
| Bounded review lanes | Codex | Claude fallback | Static review, synthesis, clustering, ranking, post-processing |
| Borderline finding verification | Claude | Codex finding_verifier | Claude keeps final decision authority |
| Reporting, retro, learn, threat model | Codex | Claude fallback | Token-heavy synthesis is offloaded |
| Session memory and next-step generation | Codex | Heuristic fallback | Used primarily in perpetual bug bounty workflows |
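
The primary/fallback pairing in the table above can be sketched as a small wrapper. This is a minimal illustration, not the platform's dispatcher: `run_with_fallback` and the `Engine` callable type are hypothetical names, and a raised exception stands in for whatever failure signal the real runtime uses.

```python
from typing import Callable, Optional

# Hypothetical engine interface: takes a task prompt, returns the engine's output.
Engine = Callable[[str], str]


def run_with_fallback(task: str, primary: Engine, fallback: Optional[Engine] = None) -> str:
    """Run a task on the primary engine, falling back to the secondary if it fails."""
    try:
        return primary(task)
    except Exception:
        # Assumption: any exception from the primary engine means "unavailable or unusable".
        if fallback is None:
            raise
        return fallback(task)
```

For a bounded review lane, `primary` would wrap Codex and `fallback` would wrap Claude; for live execution the roles are reversed and Codex is only consulted as an advisor.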

Core Principle

The system follows a strict division:

  • Claude executes high-ambiguity live exploitation
  • Codex advises, critiques, clusters, ranks, compresses, and offloads bounded work

That boundary matters because live testing depends on dynamic target feedback, while bounded work depends on structured analysis and high throughput.

Why Two Engines

Single-model offensive workflows fail in two predictable ways:

  1. Reasoning convergence: after enough attempts, the same model tends to repeat the same attack pattern.
  2. Context saturation: long-running hunts accumulate too much low-value context and start wasting premium reasoning budget.

Codex addresses both failure modes:

  • it brings a different reasoning profile
  • it works well on compact bounded prompts
  • it absorbs token-heavy synthesis, triage, and memory compaction

Model Policy

Claude Models

| Usage | Model | Reasoning |
|---|---|---|
| Hard pentest lane | claude-opus-4-6 | high |
| Standard pentest lane | claude-opus-4-6 | medium |
| Claude fallback lane | claude-sonnet-4-6 | high |

Codex Models

| Usage | Model | Reasoning |
|---|---|---|
| Default support lane | gpt-5.4 | high |
| Stuck-breaker / hard second opinion | gpt-5.4 | xhigh |
| Rare arbiter lane | gpt-5.4-pro | xhigh |

Why This Split

  • claude-opus-4-6 is reserved for live offensive reasoning and final verification.
  • gpt-5.4 high is the default Codex lane for bounded work and normal advisory.
  • gpt-5.4 xhigh is reserved for hard stuck states, chain expansion, and difficult ambiguity.
  • gpt-5.4-pro xhigh is intentionally rare and only used as an arbiter when the value of another premium pass is justified.
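
The model policy above could be captured as a small lookup table. This is a minimal sketch: the lane keys (`pentest_hard`, `codex_default`, and so on), the `LaneModel` type, and `resolve_lane` are hypothetical and are not taken from the actual `agent-dispatch-config.json` schema. Only the model names and reasoning levels come from the tables above.

```python
from typing import NamedTuple


class LaneModel(NamedTuple):
    model: str
    reasoning: str


# Lane keys are illustrative; model names and reasoning levels mirror the policy tables.
MODEL_POLICY = {
    "pentest_hard": LaneModel("claude-opus-4-6", "high"),
    "pentest_standard": LaneModel("claude-opus-4-6", "medium"),
    "claude_fallback": LaneModel("claude-sonnet-4-6", "high"),
    "codex_default": LaneModel("gpt-5.4", "high"),
    "codex_stuck_breaker": LaneModel("gpt-5.4", "xhigh"),
    "codex_arbiter": LaneModel("gpt-5.4-pro", "xhigh"),
}


def resolve_lane(lane: str) -> LaneModel:
    """Return the model and reasoning level for a lane, failing loudly on unknown lanes."""
    try:
        return MODEL_POLICY[lane]
    except KeyError:
        raise ValueError(f"unknown lane: {lane!r}") from None
```

Raising on an unknown lane, rather than silently defaulting to a premium model, keeps a typo in a lane name from turning into unplanned token spend.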

Pentest Flow

Claude Responsibilities

During a pentest, Claude remains the live operator:

  • executes the actual tests against the target
  • chooses the next move
  • adapts payloads to live responses
  • validates whether a finding is real
  • makes the final severity and reportability call

Codex Runtime Advisory Checkpoints

| Checkpoint | Trigger | Codex Role | Typical Output |
|---|---|---|---|
| Post-route | Test plan or route summary ready | hypothesis_engine | Orthogonal hypotheses and next tests |
| Mid main-testing stagnation | Phase 4 stalls or signals do not improve | critic | Blind spots, missed assumptions, pivot suggestions |
| Pre-verify | High-value surfaces identified but not fully closed | chain_planner / critic | Chain candidates and verification priorities |
| Borderline finding | Evidence exists but verdict is not clean | finding_verifier | Promote, downgrade, or retest guidance |
| Hard stuck | Repeated attempts with no useful signal | stuck_breaker | Three distinct attack angles |
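
One way to express the checkpoint-to-role mapping in code is a plain dictionary. The checkpoint keys and the `roles_for_checkpoint` helper below are hypothetical names chosen for illustration; the role names are the ones listed in the table.

```python
# Checkpoint keys are illustrative; role names come from the checkpoint table.
ADVISORY_CHECKPOINTS = {
    "post_route": ["hypothesis_engine"],
    "mid_testing_stagnation": ["critic"],
    "pre_verify": ["chain_planner", "critic"],
    "borderline_finding": ["finding_verifier"],
    "hard_stuck": ["stuck_breaker"],
}


def roles_for_checkpoint(checkpoint: str) -> list[str]:
    """Return the Codex roles to consult at a checkpoint (empty if none apply)."""
    # Return a copy so callers cannot mutate the shared mapping.
    return list(ADVISORY_CHECKPOINTS.get(checkpoint, []))
```

An unknown checkpoint yields an empty list, so Claude simply continues without a consult.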

Pentest Deconfliction Rules

| Scenario | Action |
|---|---|
| Claude confirms, Codex agrees | Finding stands at highest confidence |
| Claude confirms, Codex disputes | Claude re-checks with the dispute in mind; human review if still ambiguous |
| Claude is uncertain, Codex finds a better angle | Claude retries using the suggested path |
| Both dispute | Finding is dropped |
| Codex unavailable | Claude-only mode continues without architectural failure |

Claude is always the final decision maker for live pentest outcomes.
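
The deconfliction rules can be read as a small decision function. The sketch below is an assumption-laden illustration: the `Verdict` enum, the `deconflict` function, and the returned action strings are hypothetical names. Combinations the rules do not list fall through to `claude_decides`, which is an assumption based on Claude being the final decision maker.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    CONFIRM = "confirm"
    UNCERTAIN = "uncertain"
    DISPUTE = "dispute"


def deconflict(claude: Verdict, codex: Optional[Verdict], codex_better_angle: bool = False) -> str:
    """Map a pair of verdicts to an action; codex=None means Codex is unavailable."""
    if codex is None:
        return "claude_only"
    if claude is Verdict.CONFIRM and codex is Verdict.CONFIRM:
        return "finding_stands"
    if claude is Verdict.CONFIRM and codex is Verdict.DISPUTE:
        # Human review follows if the re-check is still ambiguous.
        return "claude_recheck"
    if claude is Verdict.UNCERTAIN and codex_better_angle:
        return "claude_retry"
    if claude is Verdict.DISPUTE and codex is Verdict.DISPUTE:
        return "drop_finding"
    # Assumption: any combination not covered by the rules is left to Claude.
    return "claude_decides"
```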

Bug Bounty Flow

The perpetual bug bounty loop is intentionally more Codex-heavy than the pentest flow.

Claude Responsibilities

  • live interaction with the target
  • exploitation of the most promising surfaces
  • final decision on whether a lead is a real bug

Codex Primary Lanes

| Lane | Primary Engine | Purpose |
|---|---|---|
| Program ranking support | Codex | Prioritize programs and candidate surfaces |
| Discovery digestion | Codex | Cluster surfaces, infer workflows, rank next tests |
| Runtime exploit support | Codex | Suggest payload ladders, bypasses, alternative angles |
| Candidate finding triage | Codex | Deduplicate and pre-score weak or partial signals |
| Session memory compaction | Codex | Persist compact state for the next session |
| Reporting and retrospectives | Codex | Generate token-heavy synthesis outputs |

Persistent Bug Bounty Artifacts

Each run can now persist compact artifacts instead of forcing Claude to re-read raw logs:

| Artifact | Purpose |
|---|---|
| session-memory.json | Tested surfaces, promising leads, dead ends, gaps |
| discovery-digest.json | Surface clustering and suspicious areas |
| candidate-findings.json | Weak signals and promoted candidates |
| next-tests.json | Prioritized next-step suggestions |

Per-session copies are stored under each program's memory/ directory, and the latest compact artifacts are also exposed at the program root for reuse by the next run.
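
A next run could pick up the latest compact artifacts from the program root along these lines. This is a minimal sketch: `load_latest_artifacts` is a hypothetical helper, the artifact contents are treated as opaque JSON because their schema is not described here, and skipping missing or malformed files is an assumption about the desired behavior.

```python
import json
from pathlib import Path

# File names as listed in the artifact table.
ARTIFACT_NAMES = (
    "session-memory.json",
    "discovery-digest.json",
    "candidate-findings.json",
    "next-tests.json",
)


def load_latest_artifacts(program_root: Path) -> dict:
    """Load whichever compact artifacts exist at the program root, keyed by file name."""
    artifacts = {}
    for name in ARTIFACT_NAMES:
        path = program_root / name
        if not path.is_file():
            continue  # First run, or this lane produced nothing yet.
        try:
            artifacts[name] = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            continue  # Assumption: a corrupt artifact is ignored rather than fatal.
    return artifacts
```

Returning only the artifacts that loaded cleanly lets the caller decide whether to start cold or to fall back to raw logs.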

Bug Bounty Guardrails

Bug bounty work is more sensitive to noise than pentest work, so Codex output is deliberately subject to aggressive fallback rules:

  • invalid schema output falls back to Claude
  • low-confidence output falls back to Claude
  • repeated non-novel advice falls back to Claude
  • high-impact ambiguous findings always return to Claude for the final verdict
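
The guardrails above amount to a short chain of checks on each Codex output. The sketch below assumes a hypothetical output shape (a dict with a string `advice` and a numeric `confidence`), a hypothetical threshold of 0.6, and a simple notion of "non-novel" (advice text already seen in this hunt). None of these details are taken from the actual role contracts.

```python
from typing import Any, Optional

# Hypothetical minimal schema for an advisory output.
REQUIRED_KEYS = ("advice", "confidence")


def fallback_reason(
    output: Any,
    seen_advice: set,
    min_confidence: float = 0.6,  # Assumed threshold, tune from real runs.
    high_impact_ambiguous: bool = False,
) -> Optional[str]:
    """Return why this output should go back to Claude, or None if it can be used."""
    if not isinstance(output, dict) or any(key not in output for key in REQUIRED_KEYS):
        return "invalid_schema"
    advice, confidence = output["advice"], output["confidence"]
    if not isinstance(advice, str) or not isinstance(confidence, (int, float)):
        return "invalid_schema"
    if confidence < min_confidence:
        return "low_confidence"
    if advice in seen_advice:
        return "non_novel_advice"
    if high_impact_ambiguous:
        return "high_impact_ambiguous"
    return None
```

Returning a reason string rather than a bare boolean makes it straightforward to log which guardrail fired.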

Token Strategy

The dual-engine design is mainly about token allocation discipline:

  • Claude is used where live reasoning quality matters most.
  • Codex is used where context compression, bounded analysis, and repeated synthesis dominate.

Practical Effects

| Category | Before | Now |
|---|---|---|
| Reporting and retrospectives | Claude-heavy | Codex primary |
| Bounded code review | Claude-heavy | Codex primary with Claude fallback |
| Runtime second opinions | Ad hoc | Standardized consults |
| Bug bounty session carry-over | Raw logs or human memory | Compact Codex-generated artifacts |

The existing P9-P15 offload alone is estimated to save roughly 110K-150K Claude tokens per engagement, before counting the new bug bounty memory and digest lanes.

Advisory Roles

The main Codex runtime roles are:

| Role | Purpose |
|---|---|
| hypothesis_engine | Generate orthogonal test hypotheses |
| critic | Challenge dominant assumptions and break tunnel vision |
| chain_planner | Combine partial primitives into exploitable chains |
| finding_verifier | Evaluate borderline findings before a final verdict |
| stuck_breaker | Generate fresh angles when exploitation stalls |

These roles do not replace Claude. They make Claude spend fewer tokens on repeated bounded reasoning.

Routing and Configuration

The architecture is enforced through:

| File | Purpose |
|---|---|
| .claude/skills/pentest/helpers/agent-dispatch-config.json | Lane registry and Claude/Codex routing policy |
| scripts/model_routing_policy.py | Exposes routing metadata to the runtime |
| .claude/skills/pentest/helpers/codex-dispatch.md | Dispatch protocol and advisory contracts |
| .claude/skills/pentest/helpers/codex-role-contracts.md | Structured role outputs |
| scripts/ai_exec.py | AI task chains, including bug bounty Codex lanes |
| bugbounty/session_memory_compact.py | Compact memory and digest generation |

Metrics

The bug bounty runtime now records lightweight Codex effectiveness metrics:

| File | Purpose |
|---|---|
| bugbounty/.runtime/metrics/codex-advisory.jsonl | Per-task advisory outcomes, confidence, and status |
| bugbounty/.runtime/metrics/codex-loop.jsonl | Per-hunt artifact usage and prompt injection tracking |

These metrics exist so the architecture can be tuned from real runs instead of intuition.
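
Appending one JSON object per line keeps such a metrics file cheap to write and easy to aggregate later. A minimal sketch, assuming hypothetical field names (`ts`, `role`, `confidence`, `status`) and a hypothetical `record_advisory_outcome` helper; the real record layout is not specified here.

```python
import json
import time
from pathlib import Path


def record_advisory_outcome(metrics_file: Path, role: str, confidence: float, status: str) -> dict:
    """Append one advisory outcome to a JSONL metrics file and return the record."""
    record = {
        "ts": time.time(),  # Unix timestamp of when the outcome was recorded.
        "role": role,
        "confidence": confidence,
        "status": status,
    }
    metrics_file.parent.mkdir(parents=True, exist_ok=True)
    with metrics_file.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return record
```

A call such as `record_advisory_outcome(Path("bugbounty/.runtime/metrics/codex-advisory.jsonl"), "critic", 0.8, "accepted")` would add a single line to the advisory log.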

Compatibility

The system still degrades safely:

| State | Behavior |
|---|---|
| Claude + Codex available | Full dual-engine workflow |
| Codex partially unavailable | Claude continues, bounded offload is skipped |
| Codex fully unavailable | Claude-only mode |

Codex is now a structural part of the design rather than an optional add-on, but Claude alone remains sufficient to keep the workflow operational if Codex is temporarily unavailable.
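
The three compatibility states can be derived from per-lane health checks. A minimal sketch, assuming a hypothetical `select_mode` helper that receives a mapping of Codex lane names to availability flags; how availability is actually probed is not covered here.

```python
def select_mode(codex_lane_health: dict[str, bool]) -> str:
    """Pick the operating mode from per-lane Codex availability flags."""
    statuses = list(codex_lane_health.values())
    if statuses and all(statuses):
        return "dual_engine"
    if any(statuses):
        # Some lanes are down: Claude continues and bounded offload is skipped.
        return "claude_skip_offload"
    # No lanes configured or none reachable: Claude-only mode.
    return "claude_only"
```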