Model Routing Rationale

Overview

V3 Pragmatica uses a 3-tier model routing strategy (Opus, Sonnet, Haiku, further subdivided by thinking-effort levels) in which vulnerability impact potential drives model selection. Every sub-agent dispatched by the wave coordinator is assigned a model and thinking budget based on the maximum severity of findings that agent could produce.

The Core Principle

Not all testing tasks require the same reasoning depth. SQL injection exploitation demands creative bypass construction and response analysis, while TLS certificate validation is a deterministic checklist. Assigning Opus to both wastes budget on the latter and risks insufficient reasoning on the former.

The routing key is a compound lookup: skill-scope (e.g., test-injection-sqli), falling back to the bare skill key when no scope is active, and finally defaulting to Sonnet.

Three-Tier Model Assignment

Tier 1: Opus (Creative/Critical)

Opus handles tasks where the ceiling for finding severity is Critical or High, and where success depends on creative reasoning, response interpretation, or multi-step attack chain construction.

| Agent | Justification |
| --- | --- |
| test-sqli | Blind SQLi requires observing subtle timing/content differences across dozens of responses |
| test-xss | WAF bypass demands iterative payload mutation and context-aware encoding |
| test-cmdi | 8 blind timing variants need systematic evaluation of response deltas |
| test-oauth | State null-byte, IDN homograph, pre-ATO require multi-step protocol reasoning |
| test-logic-business | Financial bypass, negative amounts, overdraft exploitation are inherently creative |
| test-logic-race | Race condition window detection requires precise timing analysis |
| test-infra-smuggling | CL.TE/TE.CL desync requires byte-level reasoning about parser differences |
| test-dom | DOM XSS source-sink tracing through complex JS call chains |
| test-access-idor | Neighbor-ID, role escalation, and field injection patterns need contextual reasoning |
| route | Endpoint-to-test mapping requires understanding the full attack surface |
| verify | False positive elimination demands careful response comparison against baselines |
| chain | Correlating findings into attack chains (SSRF to cloud creds, XSS to ATO) |

Tier 2: Opus with Medium Effort

A subset of Opus agents handle tasks that are high-impact but more procedural. These get Opus for quality but lower thinking budgets to control cost.

| Agent | Justification |
| --- | --- |
| test-ssrf-vector | Bypass patterns are catalogued; needs Opus for response analysis, not creative generation |
| test-csrf-cors | SameSite analysis and Content-Type downgrade are structured checks |
| test-api-rest | REST endpoint testing follows systematic patterns but needs reasoning for auth bypass |
| test-api-graphql | Introspection, batching, WS auth bypass are well-defined attack trees |
| test-deser | Gadget chain detection is pattern-matching with Opus-level response analysis |
| test-advanced-* | HPP, CRLF, MFA bypass, host header are structured but need quality FP filtering |
| test-cloud-* | S3/GCS misconfig, subdomain takeover follow known patterns |
| test-supply-chain | Dependency analysis, SRI checks are systematic |
| test-infra-cache | Cache poisoning/deception follows documented techniques |

Tier 3: Sonnet (Passive/Deterministic)

Sonnet handles tasks with lower severity ceilings or highly deterministic execution paths.

| Agent | Justification |
| --- | --- |
| test-crypto | TLS/SSL checks are tool-driven (testssl.sh output parsing) |
| test-exceptions | Stack trace and debug mode detection is pattern matching |
| test-llm | Prompt injection testing follows structured probe categories |
| test-mobile | Binary analysis with tooling (apktool, jadx) is procedural |

Tier 4: Haiku (Tool Execution)

Haiku handles pure tool orchestration where no security reasoning is needed.

| Agent | Justification |
| --- | --- |
| recon | Subfinder, httpx, dnsx are fire-and-parse |
| scan | Nuclei, nikto execution and output collection |

Thinking Budget Tiers

Thinking budgets control how many tokens the model spends on internal reasoning before producing output. Higher budgets improve quality on complex reasoning tasks but increase cost linearly.

| Tier | Budget Range | Assignment |
| --- | --- | --- |
| HIGH | 10,000-16,000 tokens | SQLi, XSS, CMDi, OAuth, business logic, race conditions, smuggling, DOM XSS, IDOR, verify, chain, route |
| MEDIUM | 5,000-8,000 tokens | SSRF vectors, CSRF/CORS, API testing, deserialization, advanced checks, cloud, supply chain, GraphQL |
| Sonnet | 3,000-5,000 tokens | Crypto, exceptions, LLM, mobile |
| Haiku default | 2,000 tokens | Recon, scan (implicit default) |
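Under the hood, these tiers can be read as entries in the THINKING_MAP that the dispatch function consults. The sketch below is illustrative: the array name and the 2,000-token default appear in this page's dispatch snippet, but the individual per-agent values within each range are assumptions.

```bash
#!/usr/bin/env bash
# Illustrative THINKING_MAP entries mirroring the tier table above.
# Exact per-agent values inside each range are assumptions.
declare -A THINKING_MAP=(
  ["route"]=16000            # HIGH tier ceiling: routing errors cascade
  ["test-sqli"]=12000        # HIGH: 10,000-16,000 tokens
  ["test-ssrf-vector"]=8000  # MEDIUM: 5,000-8,000 tokens
  ["test-crypto"]=4000       # Sonnet tier: 3,000-5,000 tokens
)

# Agents with no entry (recon, scan) fall through to the implicit default.
budget_for() { echo "${THINKING_MAP[$1]:-2000}"; }
```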

Route gets the highest budget

The /route skill receives 16,000 thinking tokens because it must analyze the entire resource map, injectable parameters, and scan results to produce an accurate endpoint-to-test mapping. Mistakes here cascade into missed coverage downstream.

Cost-Quality Tradeoff

This tiered approach costs roughly 2-2.5x the tokens of running everything on Sonnet. The justification is measurable: lab evaluations show that Opus-routed injection and auth testing finds 30-40% more vulnerabilities than the same skills running on Sonnet, particularly for blind and time-based attacks that require multi-step reasoning.

The budget is controlled by:

  • Per-agent request limits: 500 / N_CONCURRENT_AGENTS requests per agent
  • JITTER_MULT scaling: Combined request rate stays within stealth limits regardless of concurrency
  • Kill switches: 45-minute timeout (60 for injection), hard stop at 500 requests per skill
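The budget arithmetic above can be sketched as follows. The 500-request budget, the 45/60-minute timeouts, and the N_CONCURRENT_AGENTS variable come from the list; the helper function names and the example concurrency level are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of the per-agent request cap and kill-switch timeouts.
# Helper names and the concurrency value are illustrative.
TOTAL_REQUEST_BUDGET=500   # hard stop per skill, from the list above
N_CONCURRENT_AGENTS=4      # example concurrency level

per_agent_limit() {
  # Integer division: each concurrent agent gets an equal share
  echo $(( TOTAL_REQUEST_BUDGET / N_CONCURRENT_AGENTS ))
}

# Kill-switch timeouts (minutes): injection skills get a longer window
timeout_for() {
  case "$1" in
    test-sqli|test-cmdi) echo 60 ;;  # injection skills
    *)                   echo 45 ;;  # everything else
  esac
}
```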

V3 Pragmatica vs Previous Versions

| Version | Strategy | Agents | Issue / Outcome |
| --- | --- | --- | --- |
| V1 | Single model (Opus for everything) | 16 monolithic | Excessive cost; context degradation in long-running agents |
| V2 | 2-tier (Opus/Sonnet) | 16 monolithic | Better cost, but monolithic agents still degraded after 100+ turns |
| V3 Pragmatica | 3-tier with effort levels, scope decomposition | 31+ sub-agents | Focused agents with isolated context; model matched to task complexity |

V3 Pragmatica's key insight is that smaller, focused agents outperform larger monolithic ones even when using the same model, because context window degradation is the primary quality bottleneck in long-running penetration tests. By decomposing skills into scopes and routing each scope to the appropriate model with a calibrated thinking budget, V3 achieves both higher finding rates and lower per-engagement cost.

Compound Lookup Keys

The dispatch function resolves models using compound keys for scope-level granularity:

```bash
# Build compound lookup key
local lookup_key="$skill"
[ -n "$scope" ] && lookup_key="${skill}-${scope}"

# Resolve model (default: Sonnet)
local model="${MODEL_MAP[$lookup_key]:-${MODEL_MAP[$skill]:-claude-sonnet-4-6}}"

# Resolve thinking budget (default: 2000)
local thinking="${THINKING_MAP[$lookup_key]:-${THINKING_MAP[$skill]:-2000}}"
```

This allows different scopes of the same skill to receive different models. For example, test-injection-sqli gets Opus HIGH while test-injection-misc gets Opus MEDIUM.
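A self-contained sketch of this fallback chain wraps the snippet above in a function. The map contents and the "opus"/"sonnet" labels are placeholders; only the lookup logic and the claude-sonnet-4-6 / 2000 defaults come from this page.

```bash
#!/usr/bin/env bash
# Sketch of the compound-key dispatch. Map entries and model labels
# are illustrative placeholders, not the real deployment values.
declare -A MODEL_MAP=(
  ["test-injection-sqli"]="opus"   # scope-level override
  ["test-injection"]="sonnet"      # skill-level fallback
)
declare -A THINKING_MAP=(
  ["test-injection-sqli"]=16000
  ["test-injection"]=5000
)

resolve_agent() {
  local skill="$1" scope="$2"

  # Build compound lookup key
  local lookup_key="$skill"
  [ -n "$scope" ] && lookup_key="${skill}-${scope}"

  # Resolve model (default: Sonnet) and thinking budget (default: 2000)
  local model="${MODEL_MAP[$lookup_key]:-${MODEL_MAP[$skill]:-claude-sonnet-4-6}}"
  local thinking="${THINKING_MAP[$lookup_key]:-${THINKING_MAP[$skill]:-2000}}"

  echo "$model $thinking"
}
```

Scope-level keys win, skill-level keys catch unscoped dispatches, and anything unmapped lands on the Sonnet/2000 defaults.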