Model Routing Rationale¶
Overview¶
V3 Pragmatica routes work across three models (Opus, Sonnet, Haiku) in four assignment tiers, with vulnerability impact potential driving model selection. Every sub-agent dispatched by the wave coordinator is assigned a model and thinking budget based on the maximum severity of findings that agent could produce.
The Core Principle¶
Not all testing tasks require the same reasoning depth. SQL injection exploitation demands creative bypass construction and response analysis, while TLS certificate validation is a deterministic checklist. Assigning Opus to both wastes budget on the latter and risks insufficient reasoning on the former.
The routing key is a compound lookup: skill-scope (e.g., test-injection-sqli) is tried first, then the bare skill if no scope-level entry matches, and finally the Sonnet default.
Four-Tier Model Assignment¶
Tier 1: Opus (Creative/Critical)¶
Opus handles tasks where the ceiling for finding severity is Critical or High, and where success depends on creative reasoning, response interpretation, or multi-step attack chain construction.
| Agent | Justification |
|---|---|
| test-sqli | Blind SQLi requires observing subtle timing/content differences across dozens of responses |
| test-xss | WAF bypass demands iterative payload mutation and context-aware encoding |
| test-cmdi | 8 blind timing variants need systematic evaluation of response deltas |
| test-oauth | State null-byte, IDN homograph, and pre-ATO attacks require multi-step protocol reasoning |
| test-logic-business | Financial bypass, negative amounts, and overdraft exploitation are inherently creative |
| test-logic-race | Race condition window detection requires precise timing analysis |
| test-infra-smuggling | CL.TE/TE.CL desync requires byte-level reasoning about parser differences |
| test-dom | DOM XSS source-sink tracing through complex JS call chains |
| test-access-idor | Neighbor-ID, role escalation, and field injection patterns need contextual reasoning |
| route | Endpoint-to-test mapping requires understanding the full attack surface |
| verify | False positive elimination demands careful response comparison against baselines |
| chain | Correlating findings into attack chains (SSRF to cloud creds, XSS to ATO) |
Tier 2: Opus with Medium Effort¶
A subset of Opus-routed agents handles tasks that are high-impact but more procedural. These keep Opus for quality but get lower thinking budgets to control cost.
| Agent | Justification |
|---|---|
| test-ssrf-vector | Bypass patterns are catalogued; needs Opus for response analysis, not creative generation |
| test-csrf-cors | SameSite analysis and Content-Type downgrade are structured checks |
| test-api-rest | REST endpoint testing follows systematic patterns but needs reasoning for auth bypass |
| test-api-graphql | Introspection, batching, and WS auth bypass are well-defined attack trees |
| test-deser | Gadget chain detection is pattern matching with Opus-level response analysis |
| test-advanced-* | HPP, CRLF, MFA bypass, and host header attacks are structured but need quality FP filtering |
| test-cloud-* | S3/GCS misconfig and subdomain takeover follow known patterns |
| test-supply-chain | Dependency analysis and SRI checks are systematic |
| test-infra-cache | Cache poisoning/deception follows documented techniques |
Tier 3: Sonnet (Passive/Deterministic)¶
Sonnet handles tasks with lower severity ceilings or highly deterministic execution paths.
| Agent | Justification |
|---|---|
| test-crypto | TLS/SSL checks are tool-driven (testssl.sh output parsing) |
| test-exceptions | Stack trace and debug mode detection is pattern matching |
| test-llm | Prompt injection testing follows structured probe categories |
| test-mobile | Binary analysis with tooling (apktool, jadx) is procedural |
Tier 4: Haiku (Tool Execution)¶
Haiku handles pure tool orchestration where no security reasoning is needed.
| Agent | Justification |
|---|---|
| recon | Subfinder, httpx, dnsx are fire-and-parse |
| scan | Nuclei and nikto execution and output collection |
Thinking Budget Tiers¶
Thinking budgets control how many tokens the model spends on internal reasoning before producing output. Higher budgets improve quality on complex reasoning tasks but increase cost linearly.
| Tier | Budget Range | Assignment |
|---|---|---|
| HIGH (Opus) | 10,000-16,000 tokens | SQLi, XSS, CMDi, OAuth, business logic, race conditions, smuggling, DOM XSS, IDOR, verify, chain, route |
| MEDIUM (Opus) | 5,000-8,000 tokens | SSRF vectors, CSRF/CORS, API testing, deserialization, advanced checks, cloud, supply chain, GraphQL |
| Sonnet | 3,000-5,000 tokens | Crypto, exceptions, LLM, mobile |
| Haiku (implicit default) | 2,000 tokens | Recon, scan |
Route gets the highest budget
The /route skill receives 16,000 thinking tokens because it must analyze the entire resource map, injectable parameters, and scan results to produce an accurate endpoint-to-test mapping. Mistakes here cascade into missed coverage downstream.
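The tier and budget tables above can be sketched as lookup tables. This is a minimal illustration, not the project's actual configuration: MODEL_MAP and THINKING_MAP are the names used by the dispatch snippet later in this document, but the Opus and Haiku model IDs below are placeholders, and only a few representative agents are shown.

```shell
# Sketch of the routing tables implied by the tiers above.
# Opus/Haiku model IDs are placeholders, not real identifiers.
declare -A MODEL_MAP=(
  [test-injection-sqli]="claude-opus-placeholder"   # Tier 1 (HIGH)
  [test-ssrf-vector]="claude-opus-placeholder"      # Tier 2 (MEDIUM)
  [test-crypto]="claude-sonnet-4-6"                 # Tier 3 (Sonnet)
  [recon]="claude-haiku-placeholder"                # Tier 4 (Haiku)
)
declare -A THINKING_MAP=(
  [test-injection-sqli]=16000   # HIGH ceiling
  [test-ssrf-vector]=8000       # MEDIUM ceiling
  [test-crypto]=4000            # Sonnet tier
  [recon]=2000                  # Haiku default
)
echo "${MODEL_MAP[test-crypto]} with ${THINKING_MAP[test-crypto]} thinking tokens"
```

Keeping the two maps keyed identically means a single lookup key resolves both the model and its budget in one pass.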
Cost-Quality Tradeoff¶
The tiered approach costs approximately 2-2.5x the tokens of running everything on Sonnet. The justification is measurable: lab evaluations show that Opus-routed injection and auth testing finds 30-40% more vulnerabilities than the same skills on Sonnet, particularly for blind and time-based attacks that require multi-step reasoning.
The budget is controlled by:
- Per-agent request limits: `500 / N_CONCURRENT_AGENTS` requests per agent
- JITTER_MULT scaling: combined request rate stays within stealth limits regardless of concurrency
- Kill switches: 45-minute timeout (60 for injection), hard stop at 500 requests per skill
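The per-agent cap in the first item is simple integer division of the shared 500-request budget. A minimal sketch of that arithmetic (the concurrency value here is illustrative):

```shell
# Illustrative arithmetic behind the per-agent request cap: a shared
# budget of 500 requests split evenly across concurrent agents.
N_CONCURRENT_AGENTS=4
PER_AGENT_LIMIT=$(( 500 / N_CONCURRENT_AGENTS ))
echo "$PER_AGENT_LIMIT requests per agent at concurrency $N_CONCURRENT_AGENTS"
# -> 125 requests per agent at concurrency 4
```

Because the division scales inversely with concurrency, the total request volume stays bounded at 500 no matter how many agents run in parallel.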
V3 Pragmatica vs Previous Versions¶
| Version | Strategy | Agents | Issue |
|---|---|---|---|
| V1 | Single model (Opus for everything) | 16 monolithic | Excessive cost, context degradation in long-running agents |
| V2 | 2-tier (Opus/Sonnet) | 16 monolithic | Better cost, but monolithic agents still degraded after 100+ turns |
| V3 Pragmatica | Tiered models with effort levels, scope decomposition | 31+ sub-agents | Focused agents with isolated context, model matched to task complexity |
V3 Pragmatica's key insight is that smaller, focused agents outperform larger monolithic ones even when using the same model, because context window degradation is the primary quality bottleneck in long-running penetration tests. By decomposing skills into scopes and routing each scope to the appropriate model with a calibrated thinking budget, V3 achieves both higher finding rates and lower per-engagement cost.
Compound Lookup Keys¶
The dispatch function resolves models using compound keys for scope-level granularity:
```bash
# Build compound lookup key
local lookup_key="$skill"
[ -n "$scope" ] && lookup_key="${skill}-${scope}"

# Resolve model (default: Sonnet)
local model="${MODEL_MAP[$lookup_key]:-${MODEL_MAP[$skill]:-claude-sonnet-4-6}}"

# Resolve thinking budget (default: 2000)
local thinking="${THINKING_MAP[$lookup_key]:-${THINKING_MAP[$skill]:-2000}}"
```
This allows different scopes of the same skill to receive different models. For example, test-injection-sqli gets Opus HIGH while test-injection-misc gets Opus MEDIUM.
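To exercise the fallback chain, the resolution logic can be wrapped in a standalone function. This is a hedged sketch: the function name and map values below are placeholder labels for illustration, not real model IDs or project code.

```shell
# Sketch: the dispatch resolution wrapped in a function so the
# scope -> skill -> default fallback chain can be exercised.
# Map values are placeholder labels, not real model IDs.
declare -A MODEL_MAP=(
  [test-injection-sqli]="opus-high"   # scope-level entry
  [test-injection]="opus-medium"      # skill-level fallback
)

resolve_model() {
  local skill="$1" scope="$2"
  local lookup_key="$skill"
  [ -n "$scope" ] && lookup_key="${skill}-${scope}"
  # Scope key first, then bare skill key, then the Sonnet default
  echo "${MODEL_MAP[$lookup_key]:-${MODEL_MAP[$skill]:-claude-sonnet-4-6}}"
}

resolve_model test-injection sqli   # scope match    -> opus-high
resolve_model test-injection misc   # skill fallback -> opus-medium
resolve_model test-crypto ""        # default        -> claude-sonnet-4-6
```

The nested `${var:-default}` expansions keep the whole three-step fallback in a single line per map, with no branching logic in the dispatcher.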