# Token Optimization
## Token Optimization Pipeline
```mermaid
graph LR
INPUT["Input Context<br/>100K tokens"]
OPT1["Context Fork<br/>Heavy Skills<br/>Isolated Subagents"]
OPT2["Wave Dispatch<br/>3 agents/wave<br/>Isolated contexts"]
OPT3["File Piping<br/>No inline<br/>Results → logs/"]
OPT4["Output Limits<br/>--max-time<br/>head -N"]
OPT5["Model Routing<br/>Opus/Sonnet/Haiku<br/>Per-scope"]
OPT6["Smart Route<br/>/route skip<br/>Irrelevant tests"]
INPUT --> OPT1
OPT1 --> OPT2
OPT2 --> OPT3
OPT3 --> OPT4
OPT4 --> OPT5
OPT5 --> OPT6
OPT6 --> OUTPUT["Optimized Cost<br/>2-2.5x baseline<br/>6-8 scans/week"]
style INPUT fill:#9b30ff,color:#fff,stroke:#00e5ff,stroke-width:2px
style OPT1 fill:#4a148c,color:#fff
style OPT2 fill:#6a1b9a,color:#fff
style OPT3 fill:#7b1fa2,color:#fff
style OPT4 fill:#8e24aa,color:#fff
style OPT5 fill:#9c27b0,color:#fff
style OPT6 fill:#ab47bc,color:#fff
style OUTPUT fill:#0277bd,color:#fff,stroke:#00e5ff,stroke-width:2px
```
## Strategies
### Context Fork
Use `context: fork` on heavy skills to isolate them into subagents, preventing context-window bloat.
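A forked skill might declare this in its frontmatter. The sketch below is illustrative: the skill name, description, and file layout are assumptions, and only the `context: fork` field comes from this doc.

```yaml
---
name: deep-scan            # hypothetical skill name
description: Heavy multi-step scan skill
context: fork              # run in an isolated subagent; only a summary returns
---
```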
### Wave Dispatch
Each `claude -p` agent has an isolated context, so no bias accumulates from previous test results.
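The wave pattern can be sketched in shell. `AGENT` is a stand-in command so the sketch runs anywhere; in the real suite it would be `claude -p "<prompt>"`, and the scope names are placeholders.

```shell
#!/bin/sh
# Wave-dispatch sketch: 3 agents per wave, each writing to its own log file.
# AGENT is a placeholder; the real suite would invoke `claude -p "<prompt>"`.
AGENT="${AGENT:-echo}"
mkdir -p logs
for scope in auth api upload; do
  $AGENT "scan $scope" > "logs/$scope.log" 2>&1 &   # isolated context per agent
done
wait   # wave barrier: the next wave starts only after all 3 finish
```

The `wait` barrier is what keeps waves sequential while agents inside a wave run in parallel.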
### File Piping
Pipe results to files (`> logs/output.txt`), never inline. This prevents large tool outputs from consuming context.
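A minimal illustration of the pattern; the `printf` is a stand-in for a large tool response (a real scan would use something like `curl ... > logs/response.txt`), and the file names are placeholders.

```shell
#!/bin/sh
# File-piping sketch: the full output lands on disk, never inline in context.
mkdir -p logs
# Stand-in for a large tool response:
printf 'HTTP/1.1 200 OK\nServer: nginx\n(many more lines)\n' > logs/response.txt
# Only a small, bounded slice is ever read back into the agent context:
head -2 logs/response.txt
```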
### Output Limits

- `--max-time` on every curl call
- `head -N` on large outputs
- Skip inapplicable tests via `/route` (`test-plan.json`)
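The first two limits compose in one pipeline. This is a hedged sketch: the URL, timeout, and `-N` value are placeholders, not values taken from the suite.

```shell
#!/bin/sh
# Output-limit sketch: bound both wall-clock time and bytes read back.
mkdir -p logs
# --max-time caps the whole transfer; redirection keeps the body out of context.
curl --max-time 10 -sS "https://example.com/" > logs/page.txt 2>/dev/null || true
# head -N caps what the agent actually reads back from disk.
head -5 logs/page.txt
```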
### Model Routing
Per-scope model assignment ensures expensive Opus tokens are only used where creative reasoning adds value. Passive checks use cheaper Sonnet/Haiku.
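A per-scope routing table might look like the sketch below. The scope names and the mapping are illustrative, not taken from the suite; the chosen name would then be passed to the agent CLI's model option.

```shell
#!/bin/sh
# Model-routing sketch: map each test scope to the cheapest model that suffices.
route_model() {
  case "$1" in
    injection|auth-bypass) echo "opus"   ;;  # creative reasoning adds value
    headers|tls|cookies)   echo "haiku"  ;;  # passive, deterministic checks
    *)                     echo "sonnet" ;;  # structured default
  esac
}
route_model injection   # prints: opus
route_model headers     # prints: haiku
```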
## 1M Context Window Optimizations

With Opus 4.6 and Sonnet 4.6 upgraded to a 1M-token context window (5x the previous 200K), the suite leverages deeper reasoning per agent:
| Parameter | Previous (200K) | Current (1M) | Rationale |
|---|---|---|---|
| Thinking budget (HIGH) | 10-16K | 16-24K | Lab data: 12K → 82% SQLi detection, 20K projected ~87-88% |
| Thinking budget (MED) | 5-8K | 8-14K | More reasoning for structured tests |
| Thinking budget (Sonnet) | 3-5K | 6-10K | Sonnet now has 1M context, can benefit from deeper analysis |
| Max-turns per agent | 150 | 250 | Agents explore more deeply before context exhaustion |
| Discover subprocess turns | 200 | 300 | Discovery phase benefits most from extended exploration |
| Scan subprocess turns | 100 | 150 | More room for complex scan output parsing |
| L3 endpoint split threshold | 12 | 20 | Single agent with 1M can maintain correlation across more endpoints |
| L3 split bands | <=24→2, >24→3 | <=40→2, >40→3 | Fewer splits = less init overhead, better cross-endpoint reasoning |
| Body cap (fingerprinting) | 50KB | 100KB | SPAs with inline JS often exceed 50KB |
| Tier 1 model | Haiku | Sonnet | Aligned with model map; Sonnet at 1M handles crypto/supply-chain better |
| Cross-wave propagation | Basic summary | Evidence hints + params + full endpoint context | Richer inter-wave intelligence with 1M headroom |
What didn't change: the wave architecture (3 agents/wave, stealth-driven), request budgets (500/skill, target protection), JITTER_MULT, kill-switch timeouts, the file-piping pattern, and the Haiku thinking budget (2000, for deterministic tasks).
## Impact Summary
| Strategy | Token Savings |
|---|---|
| Context fork | Prevents 50K+ token accumulation per heavy skill |
| File piping | Avoids 10-50K tokens per large response |
| Model routing | 40-60% cost reduction on passive/deterministic tasks |
| /route skip | Eliminates entire test categories per endpoint |