# Token Optimization
## Token Optimization Pipeline
```mermaid
graph LR
INPUT["Input Context<br/>100K tokens"]
OPT1["Context Fork<br/>Heavy Skills<br/>Isolated Subagents"]
OPT2["Wave Dispatch<br/>3 agents/wave<br/>Isolated contexts"]
OPT3["File Piping<br/>No inline<br/>Results → logs/"]
OPT4["Output Limits<br/>--max-time<br/>head -N"]
OPT5["Model Routing<br/>Opus/Sonnet/Haiku<br/>Per-scope"]
OPT6["Smart Route<br/>/route skip<br/>Irrelevant tests"]
INPUT --> OPT1
OPT1 --> OPT2
OPT2 --> OPT3
OPT3 --> OPT4
OPT4 --> OPT5
OPT5 --> OPT6
OPT6 --> OUTPUT["Optimized Cost<br/>2-2.5x baseline<br/>6-8 scans/week"]
style INPUT fill:#9b30ff,color:#fff,stroke:#00e5ff,stroke-width:2px
style OPT1 fill:#4a148c,color:#fff
style OPT2 fill:#6a1b9a,color:#fff
style OPT3 fill:#7b1fa2,color:#fff
style OPT4 fill:#8e24aa,color:#fff
style OPT5 fill:#9c27b0,color:#fff
style OPT6 fill:#ab47bc,color:#fff
style OUTPUT fill:#0277bd,color:#fff,stroke:#00e5ff,stroke-width:2px
```
## Strategies
### Context Fork
Use `context: fork` on heavy skills to isolate them into subagents, preventing context-window bloat.
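A forked skill might declare this in its frontmatter. The sketch below is illustrative: the skill name, description, and file layout are assumptions, and only the `context: fork` field comes from this doc.

```yaml
---
name: deep-scan            # hypothetical skill name
description: Heavy multi-step scan skill
context: fork              # run in an isolated subagent; only a summary returns
---
```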
### Wave Dispatch
Each `claude -p` agent has an isolated context, so no bias accumulates from previous test results.
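The wave pattern can be sketched in shell. `AGENT` is a stand-in command so the sketch runs anywhere; in the real suite it would be `claude -p "<prompt>"`, and the scope names are placeholders.

```shell
#!/bin/sh
# Wave-dispatch sketch: 3 agents per wave, each writing to its own log file.
# AGENT is a placeholder; the real suite would invoke `claude -p "<prompt>"`.
AGENT="${AGENT:-echo}"
mkdir -p logs
for scope in auth api upload; do
  $AGENT "scan $scope" > "logs/$scope.log" 2>&1 &   # isolated context per agent
done
wait   # wave barrier: the next wave starts only after all 3 finish
```

The `wait` barrier is what keeps waves sequential while agents inside a wave run in parallel.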
### File Piping
Pipe results to files (`> logs/output.txt`), never inline. This prevents large tool outputs from consuming context.
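A minimal illustration of the pattern; the `printf` is a stand-in for a large tool response (a real scan would use something like `curl ... > logs/response.txt`), and the file names are placeholders.

```shell
#!/bin/sh
# File-piping sketch: the full output lands on disk, never inline in context.
mkdir -p logs
# Stand-in for a large tool response:
printf 'HTTP/1.1 200 OK\nServer: nginx\n(many more lines)\n' > logs/response.txt
# Only a small, bounded slice is ever read back into the agent context:
head -2 logs/response.txt
```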
### Output Limits

- `--max-time` on every curl call
- `head -N` on large outputs
- Skip inapplicable tests via `/route` (`test-plan.json`)
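The first two limits compose in one pipeline. This is a hedged sketch: the URL, timeout, and `-N` value are placeholders, not values taken from the suite.

```shell
#!/bin/sh
# Output-limit sketch: bound both wall-clock time and bytes read back.
mkdir -p logs
# --max-time caps the whole transfer; redirection keeps the body out of context.
curl --max-time 10 -sS "https://example.com/" > logs/page.txt 2>/dev/null || true
# head -N caps what the agent actually reads back from disk.
head -5 logs/page.txt
```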
### Model Routing
Per-scope model assignment ensures expensive Opus tokens are only used where creative reasoning adds value. Passive checks use cheaper Sonnet/Haiku.
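A per-scope routing table might look like the sketch below. The scope names and the mapping are illustrative, not taken from the suite; the chosen name would then be passed to the agent CLI's model option.

```shell
#!/bin/sh
# Model-routing sketch: map each test scope to the cheapest model that suffices.
route_model() {
  case "$1" in
    injection|auth-bypass) echo "opus"   ;;  # creative reasoning adds value
    headers|tls|cookies)   echo "haiku"  ;;  # passive, deterministic checks
    *)                     echo "sonnet" ;;  # structured default
  esac
}
route_model injection   # prints: opus
route_model headers     # prints: haiku
```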
## 1M Context Window Optimizations

With Opus 4.6 and Sonnet 4.6 upgraded to a 1M-token context window (5x the previous 200K), the suite leverages deeper reasoning per agent:
| Parameter | Previous (200K) | Current (1M) | Rationale |
|---|---|---|---|
| Thinking budget (HIGH) | 10-16K | 16-24K | Lab data: 12K → 82% SQLi detection, 20K projected ~87-88% |
| Thinking budget (MED) | 5-8K | 8-14K | More reasoning for structured tests |
| Thinking budget (Sonnet) | 3-5K | 6-10K | Sonnet now has 1M context, can benefit from deeper analysis |
| Max-turns per agent | 150 | 250 | Agents explore more deeply before context exhaustion |
| Discover subprocess turns | 200 | 300 | Discovery phase benefits most from extended exploration |
| Scan subprocess turns | 100 | 150 | More room for complex scan output parsing |
| L3 endpoint split threshold | 12 | 20 | Single agent with 1M can maintain correlation across more endpoints |
| L3 split bands | <=24→2, >24→3 | <=40→2, >40→3 | Fewer splits = less init overhead, better cross-endpoint reasoning |
| Body cap (fingerprinting) | 50KB | 100KB | SPAs with inline JS often exceed 50KB |
| Tier 1 model | Haiku | Sonnet | Aligned with model map; Sonnet at 1M handles crypto/supply-chain better |
| Cross-wave propagation | Basic summary | Evidence hints + params + full endpoint context | Richer inter-wave intelligence with 1M headroom |
What didn't change: the wave architecture (3 agents/wave, stealth-driven), request budgets (500/skill, target protection), JITTER_MULT, kill-switch timeouts, the file-piping pattern, and the Haiku thinking budget (2000, for deterministic tasks).
## Impact Summary
| Strategy | Token Savings |
|---|---|
| Context fork | Prevents 50K+ token accumulation per heavy skill |
| File piping | Avoids 10-50K tokens per large response |
| Model routing | 40-60% cost reduction on passive/deterministic tasks |
| /route skip | Eliminates entire test categories per endpoint |