
# Token Optimization

## Token Optimization Pipeline

```mermaid
graph LR
    INPUT["Input Context<br/>100K tokens"]

    OPT1["Context Fork<br/>Heavy Skills<br/>Isolated Subagents"]
    OPT2["Wave Dispatch<br/>3 agents/wave<br/>Isolated contexts"]
    OPT3["File Piping<br/>No inline<br/>Results → logs/"]
    OPT4["Output Limits<br/>--max-time<br/>head -N"]
    OPT5["Model Routing<br/>Opus/Sonnet/Haiku<br/>Per-scope"]
    OPT6["Smart Route<br/>/route skip<br/>Irrelevant tests"]

    INPUT --> OPT1
    OPT1 --> OPT2
    OPT2 --> OPT3
    OPT3 --> OPT4
    OPT4 --> OPT5
    OPT5 --> OPT6

    OPT6 --> OUTPUT["Optimized Cost<br/>2-2.5x baseline<br/>6-8 scans/week"]

    style INPUT fill:#9b30ff,color:#fff,stroke:#00e5ff,stroke-width:2px
    style OPT1 fill:#4a148c,color:#fff
    style OPT2 fill:#6a1b9a,color:#fff
    style OPT3 fill:#7b1fa2,color:#fff
    style OPT4 fill:#8e24aa,color:#fff
    style OPT5 fill:#9c27b0,color:#fff
    style OPT6 fill:#ab47bc,color:#fff
    style OUTPUT fill:#0277bd,color:#fff,stroke:#00e5ff,stroke-width:2px
```

## Strategies

### Context Fork

Use `context: fork` on heavy skills to isolate them into subagents, preventing context-window bloat in the main session.
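As a sketch, a heavy skill's frontmatter might declare the fork like this (all field names other than `context: fork` are illustrative, not taken from this suite):

```yaml
---
name: heavy-recon            # illustrative skill name
description: Full-surface recon; output is large, so isolate it
context: fork                # run this skill in an isolated subagent
---
```

The fork means the skill's intermediate output accumulates in the subagent's context, and only its final result returns to the parent.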

### Wave Dispatch

Each `claude -p` agent runs with an isolated context, so no bias accumulates from previous test results.
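A minimal sketch of one dispatch wave. The scope names and the `agent` stub are illustrative; in the real suite each invocation would be a `claude -p` call:

```shell
# Stub standing in for: claude -p "scan scope $1" > "$2"
# (replaced so the pattern runs without the CLI installed)
agent() {
  echo "result for scope $1" > "$2"
}

mkdir -p logs
for scope in auth payments search; do     # 3 agents per wave
  agent "$scope" "logs/${scope}.log" &    # own process, own context
done
wait                                      # wave barrier: all 3 finish
```

Because each agent is a separate process writing to its own log, no agent ever sees another's raw output.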

### File Piping

Pipe results to files (`> logs/output.txt`), never inline. This prevents large tool outputs from consuming context.
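The pattern in miniature, with `seq` standing in for a verbose tool (e.g. a large HTTP response) so the example runs offline:

```shell
mkdir -p logs
seq 1 100000 > logs/output.txt   # the full result lives on disk
wc -c < logs/output.txt          # only a tiny summary goes inline
```

The agent reads targeted slices of the file later (`grep`, `head`) instead of holding the whole response in context.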

### Output Limits

- `--max-time` on every `curl` call
- `head -N` on large outputs
- Skip inapplicable tests via `/route test-plan.json`
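The first two limits compose into a bounded-capture sketch. In the pipeline the source would be a real request, e.g. `curl --max-time 10 "$URL"`; `seq` stands in here so the example runs offline:

```shell
mkdir -p logs
# Cap both runtime (curl --max-time in the real call) and kept lines:
seq 1 100000 | head -n 50 > logs/response.head.txt
wc -l < logs/response.head.txt   # → 50
```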

### Model Routing

Per-scope model assignment ensures expensive Opus tokens are only used where creative reasoning adds value. Passive checks use cheaper Sonnet/Haiku.
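A hypothetical scope-to-model map (the scope names are illustrative, not from this suite; the tiers follow the text: Opus for creative reasoning, Sonnet for structured tests, Haiku for passive checks):

```shell
model_for_scope() {
  case "$1" in
    logic-flaws|exploit-chain) echo opus   ;;  # creative reasoning
    injection|auth)            echo sonnet ;;  # structured tests
    *)                         echo haiku  ;;  # passive/deterministic
  esac
}

model_for_scope tls-headers   # prints "haiku"
```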

## 1M Context Window Optimizations

With Opus 4.6 and Sonnet 4.6 upgraded to 1M context (5x from 200K), the suite leverages deeper reasoning per agent:

| Parameter | Previous (200K) | Current (1M) | Rationale |
|---|---|---|---|
| Thinking budget (HIGH) | 10-16K | 16-24K | Lab data: 12K → 82% SQLi detection; 20K projected ~87-88% |
| Thinking budget (MED) | 5-8K | 8-14K | More reasoning for structured tests |
| Thinking budget (Sonnet) | 3-5K | 6-10K | Sonnet now has 1M context and can benefit from deeper analysis |
| Max-turns per agent | 150 | 250 | Agents explore more deeply before context exhaustion |
| Discover subprocess turns | 200 | 300 | Discovery phase benefits most from extended exploration |
| Scan subprocess turns | 100 | 150 | More room for complex scan-output parsing |
| L3 endpoint split threshold | 12 | 20 | A single agent with 1M can maintain correlation across more endpoints |
| L3 split bands | <=24→2, >24→3 | <=40→2, >40→3 | Fewer splits = less init overhead, better cross-endpoint reasoning |
| Body cap (fingerprinting) | 50KB | 100KB | SPAs with inline JS often exceed 50KB |
| Tier 1 model | Haiku | Sonnet | Aligned with the model map; Sonnet at 1M handles crypto/supply-chain better |
| Cross-wave propagation | Basic summary | Evidence hints + params + full endpoint context | Richer inter-wave intelligence with 1M headroom |

**What didn't change:** wave architecture (3 agents/wave, stealth-driven), request budgets (500/skill, target protection), `JITTER_MULT`, kill-switch timeouts, the file-piping pattern, and Haiku thinking budget (2000; deterministic tasks).
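As an illustrative config fragment, the tuned parameters above might be expressed like this (key names are assumed, not from the suite; values take the upper end of each "Current (1M)" range):

```yaml
thinking_budget:
  high: 24000        # was 16000
  med: 14000         # was 8000
  sonnet: 10000      # was 5000
max_turns:
  agent: 250         # was 150
  discover: 300      # was 200
  scan: 150          # was 100
l3_split:
  threshold: 20      # endpoints handled by a single agent
  bands: {up_to_40: 2, over_40: 3}
body_cap_kb: 100     # fingerprinting body cap
```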

## Impact Summary

| Strategy | Token Savings |
|---|---|
| Context fork | Prevents 50K+ token accumulation per heavy skill |
| File piping | Avoids 10-50K tokens per large response |
| Model routing | 40-60% cost reduction on passive/deterministic tasks |
| `/route` skip | Eliminates entire test categories per endpoint |