Phase 4: Manual Testing (Wave Coordinator)

Overview

Phase 4 executes AI-driven penetration testing using 16 specialized test skills deployed in parallel waves. The Wave Coordinator manages scheduling, parallelism, health checks, and result aggregation.

Purpose: Systematically test every endpoint for the full spectrum of web application vulnerabilities using AI-powered reasoning.

CRITICAL: ALL 16 Skills are MANDATORY

Phase 4 NEVER skips any skill, regardless of the application type. Each skill covers vulnerabilities that might not be obvious from the app's apparent purpose:

  • Even "simple" apps without user authentication need JWT testing (internal APIs, third-party integrations)
  • Apps without file uploads still need deserialization testing (API payloads)
  • Financial apps need infrastructure testing (WAF bypass, request smuggling)

The 16 Test Skills

Injection Vulnerabilities (5 scopes: sqli, xss, cmdi, ssti-xxe, misc)

/test-injection — Tests for SQL injection, NoSQL injection, OS command injection, template injection, and more

Scopes:

  • sqli: SQL injection (blind, error-based, union-based, time-based)
  • xss: Cross-site scripting (reflected, stored, DOM)
  • cmdi: Command injection (OS commands, alternative syntaxes)
  • ssti-xxe: Server-side template injection, XXE, ESI injection
  • misc: LDAP injection, Mermaid injection, ExifTool RCE, SOQL injection

Authentication Attacks (3 scopes: jwt, oauth, session)

/test-auth — Tests authentication mechanisms and bypass techniques

Scopes:

  • jwt: JWT attacks (weak secret, algorithm confusion, claim manipulation, key cracking)
  • oauth: OAuth flow attacks (state bypass, PKCE bypass, pre-ATO, ROPC abuse, CSRF)
  • session: Session fixation, session hijacking, cookie vulnerabilities, SAML attacks

Access Control (3 scopes: idor, authz, matrix)

/test-access — Tests authorization and privilege escalation

Scopes:

  • idor: Insecure direct object reference (ID enumeration, neighboring IDs)
  • authz: Function-level authorization (can a low-privilege user access high-privilege functions?)
  • matrix: Multi-user access matrix (comprehensive role-based testing)

SSRF (2 scopes: core, vector)

/test-ssrf — Tests server-side request forgery attacks

Scopes:

  • core: SSRF primitives (internal ports, redirect chains, protocol smuggling)
  • vector: SSRF vectors (PDF filters, Grafana chains, rogue MySQL)

Client-Side Security (3 scopes: csrf-cors, dom, misc)

/test-client — Tests client-side vulnerabilities

Scopes:

  • csrf-cors: CSRF attacks, CORS misconfigurations, SameSite bypass
  • dom: DOM XSS, DOM clobbering, prototype pollution
  • misc: Clickjacking, clipboard XSS, service worker theft, CSP bypass

Business Logic (3 scopes: business, race, upload)

/test-logic — Tests flawed business logic and workflows

Scopes:

  • business: Price/quantity manipulation, workflow bypass, coupon abuse, financial logic
  • race: Race conditions, atomicity violations, concurrency issues
  • upload: File upload bypass (extension, magic bytes, path traversal)

API Security (3 scopes: rest, graphql, prototype)

/test-api — Tests API-specific vulnerabilities

Scopes:

  • rest: REST API security (method override, content-type switching, parameter pollution)
  • graphql: GraphQL attacks (introspection, query complexity, aliases, directives)
  • prototype: Prototype pollution, recursive merge exploitation, constructor abuse

Infrastructure (2 scopes: smuggling, cache)

/test-infra — Tests infrastructure-level vulnerabilities

Scopes:

  • smuggling: HTTP request smuggling (CL.TE, TE.CL, HTTP/2 desync)
  • cache: Cache poisoning, cache deception, host header attacks

Cloud Security (3 scopes: storage, takeover, k8s-cicd)

/test-cloud — Tests cloud-specific vulnerabilities

Scopes:

  • storage: S3/GCS/Azure bucket misconfigurations, exposed consoles
  • takeover: Subdomain takeover, FQDN dot bypass, dangling DNS
  • k8s-cicd: Kubernetes exposure, Firebase rules, CI/CD escape, cloud metadata

Cryptography (no scopes)

/test-crypto — Tests SSL/TLS and cryptographic failures

Coverage: TLS version analysis, cipher suite weaknesses, certificate validation

Deserialization (no scopes)

/test-deser — Tests insecure deserialization across languages

Coverage: Java, PHP, .NET, Python, Ruby gadget chain exploitation

Advanced Techniques (4 scopes: hpp-crlf, bypass, mfa, host-method)

/test-advanced — Tests advanced/subtle vulnerabilities

Scopes:

  • hpp-crlf: HTTP parameter pollution, CRLF injection, param truncation
  • bypass: Open redirect, type juggling, Unicode bypass, parser differences
  • mfa: MFA bypass (8+ techniques), downgrade attacks, TOTP recovery
  • host-method: HTTP/2 desync, second-order attacks, Host header injection, method override

Supply Chain (no scopes)

/test-supply-chain — Tests dependency vulnerabilities

Coverage: Dependency confusion, SRI bypass, Docker layer secrets, manifest injection

Exceptions & Debug (no scopes)

/test-exceptions — Tests error handling and debug mode

Coverage: Stack trace leakage, debug mode detection, sensitive data in errors

LLM Attacks (no scopes, requires --llm flag)

/test-llm — Tests prompt injection and MCP attacks

Coverage: Direct/indirect prompt injection, jailbreaking, model confusion

Mobile Security (no scopes, requires --mobile flag)

/test-mobile — Tests Android and iOS applications

Coverage: APK/IPA analysis, API security, data storage, authentication

Wave Coordinator: 12-Wave Schedule

The 31 sub-agents (from the 16 skills and their scopes) are dispatched in 12 sequential waves (Waves 0-11), with 3 agents per wave running in parallel, followed by a conditional, flag-gated Wave 12:

Wave 0: Critical Path (High-Impact Vulns)

┌─────────────────────────────────────┐
│ Wave 0: Critical Injection & Access │
├─────────────────────────────────────┤
│ [Agent-1] test-injection:sqli       │
│ [Agent-2] test-injection:xss        │
│ [Agent-3] test-access:idor          │
└─────────────────────────────────────┘
Targets the most impactful vulnerabilities first (RCE, data exfil, auth bypass).

Wave 1: Critical Continued

[Agent-1] test-injection:cmdi
[Agent-2] test-access:authz
[Agent-3] test-auth:jwt

Wave 2: Authentication

[Agent-1] test-auth:oauth
[Agent-2] test-auth:session
[Agent-3] test-client:csrf-cors

Waves 3-11: Remaining Skills (20 agents)

Each wave tests 3 scoped skills in parallel, covering APIs, logic, infrastructure, cloud, advanced, crypto, etc.

Wave 12: Flag-Gated (Conditional)

[Agent-1] test-llm (if --llm flag)
[Agent-2] test-mobile (if --mobile flag)

Full 12-wave schedule: See documentation at /architecture/wave-coordinator.md
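
The batching described in this section can be sketched in a few lines of shell. This is only an illustration of grouping an ordered agent list into waves of three; `WAVE_SIZE` and `plan_waves` are hypothetical names, and the real coordinator's scheduling logic (including how agents within a wave are launched in parallel) is not shown in this document.

```shell
# Hypothetical sketch: group an ordered list of agents into waves.
# WAVE_SIZE and plan_waves are illustrative names, not the coordinator's API.
WAVE_SIZE=3

plan_waves() {
    # Reads one agent name per line on stdin; prints "Wave N: a b c".
    awk -v size="$WAVE_SIZE" '
        { batch = (batch == "" ? $0 : batch " " $0) }
        NR % size == 0 { printf "Wave %d: %s\n", wave++, batch; batch = "" }
        END { if (batch != "") printf "Wave %d: %s\n", wave, batch }
    '
}

# The first six agents come from Waves 0 and 1 above.
printf '%s\n' \
    test-injection:sqli test-injection:xss test-access:idor \
    test-injection:cmdi test-access:authz test-auth:jwt | plan_waves
```

Keeping the agent list ordered by priority means the highest-impact scopes land in the earliest waves without any extra scheduling logic.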

Per-Endpoint Mandatory Checks

EVERY endpoint tested must pass through 8 mandatory checks:

| Check # | Test | Rationale |
|---------|------|-----------|
| 1 | No auth → expect 401/403 | Detects missing authentication enforcement |
| 2 | Wrong role (lowest privilege) | Tests role-based access control |
| 3 | Neighbor ID (±1 from valid ID) | Detects IDOR |
| 4 | Extra fields in body (role_id, is_admin) | Tests mass assignment, privilege escalation |
| 5 | Negative numeric values | Tests financial logic, input validation |
| 6 | Hidden flags (?debug=1, ?admin=1) | Detects development/debug bypasses |
| 7 | Content-Type: application/xml on JSON endpoints | XXE on unexpected endpoints |
| 8 | Content-Type: multipart/form-data | Validation bypass through content-type mismatch |
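
Two of these checks reduce to very small pieces of logic. The sketch below is a hedged illustration of checks 1 and 3 only; `neighbor_ids` and `auth_enforced` are hypothetical helper names, and the status code is passed in as an argument so the example runs without touching a network.

```shell
# Hypothetical helpers illustrating checks 1 and 3; not the agents' real code.

# neighbor_ids: print the IDs adjacent to a known-valid numeric ID (check 3).
neighbor_ids() {
    printf '%s\n' "$(( $1 - 1 ))" "$(( $1 + 1 ))"
}

# auth_enforced: succeed only if an unauthenticated request was rejected
# with 401 or 403 (check 1). In practice the status could come from e.g.
#   curl -s -o /dev/null -w '%{http_code}' "$url"
auth_enforced() {
    case "$1" in
        401|403) return 0 ;;
        *)       return 1 ;;
    esac
}

neighbor_ids 1042    # prints 1041 and 1043, one per line

if auth_enforced 200; then
    echo "auth enforced"
else
    echo "missing auth check"
fi
```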

Token-Driven Parameter Selection

During Phase 0, context.json was populated with detected technologies. Phase 4 agents use this to:

  • Select relevant payloads (e.g., skip .NET-specific bypasses if the app is Node.js)
  • Adjust injection syntax (e.g., SLEEP() for MySQL vs pg_sleep() for PostgreSQL)
  • Choose which authentication mechanisms to test (JWT if detected, session if cookie-based)

Example: context.json detected Spring Boot + PostgreSQL → injection agents use PostgreSQL payloads, skip MSSQL/MySQL variants.
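
As a rough illustration of this kind of selection, the sketch below maps a detected database name to a time-based probe. The function name `time_delay_payload` and the lowercase database labels are assumptions made for the example; the actual keys stored in context.json are not specified here.

```shell
# Hypothetical sketch: choose a time-delay SQL probe for the detected database.
# The database labels are assumed; context.json's real values may differ.
time_delay_payload() {
    case "$1" in
        postgresql)    echo "SELECT pg_sleep(5)" ;;
        mysql|mariadb) echo "SELECT SLEEP(5)" ;;
        mssql)         echo "WAITFOR DELAY '0:0:5'" ;;
        *)             return 1 ;;   # unknown engine: caller decides what to try
    esac
}

time_delay_payload postgresql    # prints: SELECT pg_sleep(5)
```

Returning a non-zero status for an unknown engine lets the caller fall back to trying every variant instead of silently sending the wrong syntax.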

Wave Health Checks

Between each wave, the coordinator performs health checks:

# After Wave 0 completes (FINDINGS_COUNT defaults to 0 if unset)
if [ "${FINDINGS_COUNT:-0}" -eq 0 ]; then
    echo "[WARNING] No findings in Wave 0 (critical injection)"
    echo "  Possible causes:"
    echo "    - WAF is blocking all payloads"
    echo "    - Target is not vulnerable (rare)"
    echo "    - Authentication token expired"
fi

Warnings are logged, but testing continues (no auto-abort).

Anti-First-Positive Rule

Finding a vulnerability in one endpoint doesn't complete testing for that vulnerability type:

❌ WRONG: Found SQLi on /api/v1/users?sort=  → Stop testing SQLi
✅ RIGHT: Found SQLi on /api/v1/users?sort=  → Test ALL endpoints with sort/filter/order parameters

Each confirmed vulnerability becomes a pattern to test across similar endpoints.
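
A minimal sketch of turning one finding into a pattern: given a list of known endpoints, keep the ones that carry a sort, filter, or order query parameter so they can be queued for the same test. `similar_endpoints` is a hypothetical helper, and the endpoint list other than /api/v1/users?sort= is made up for the example.

```shell
# Hypothetical sketch: select endpoints that share the vulnerable parameter class.
# Reads one endpoint per line on stdin and prints the matching ones.
similar_endpoints() {
    grep -E '[?&](sort|filter|order)='
}

printf '%s\n' \
    '/api/v1/users?sort=' \
    '/api/v1/orders?page=1&order=' \
    '/api/v1/health' \
    '/api/v1/products?filter=' | similar_endpoints
# prints every line except /api/v1/health
```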

Dependency: Source Code Extraction

If Phase 3.5c confirmed LFI/RCE, source code has already been extracted by this point. Phase 4 agents read the source to obtain:

  • Exact field names (avoids guessing user_id vs userId vs uid)
  • Validation logic (to understand input constraints)
  • Hidden endpoints (endpoints not in test-plan.json)

Flag in context.json: "source_code_available": true
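
One simple way an agent could branch on this flag is sketched below. It uses a pattern match rather than a real JSON parser, which is an assumption made to keep the example dependency-free; `source_code_available` as a function name is hypothetical, and the JSON is piped in so the example does not need an actual context.json on disk.

```shell
# Hypothetical sketch: detect the flag with a pattern match (not a JSON parser).
# Reads JSON on stdin, e.g.  source_code_available < context.json
source_code_available() {
    grep -Eq '"source_code_available"[[:space:]]*:[[:space:]]*true'
}

if printf '%s\n' '{"source_code_available": true}' | source_code_available; then
    echo "use field names and validation logic from source"
else
    echo "no source available: infer field names from responses"
fi
```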

Output During Phase 4

As each wave completes:

  • findings/FINDING-NNN.md files are created for confirmed vulnerabilities
  • logs/agent-wave-N.log records test execution for each agent
  • logs/pentest-timeline.jsonl tracks the timestamp of each finding
  • waves/agent-N.json stores agent state and results

Decision Point: Abort vs Continue

After each wave, decide:

  1. Continue normally: Testing is proceeding as expected
  2. Skip remaining waves: WAF is completely blocking all tests, or the user requests a stop
  3. Investigate issue: Tool failure or authentication failure (re-authenticate and resume)
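
The three outcomes above can be expressed as a small decision function. This is a sketch under assumed inputs: `next_action` and its three yes/no arguments are hypothetical, and how the coordinator actually detects a blocking WAF or a tool failure is not described in this document.

```shell
# Hypothetical sketch: map post-wave observations to one of the three outcomes.
# Arguments (each "yes" or "no"):
#   $1  WAF is blocking all tests
#   $2  user requested a stop
#   $3  tool or authentication failure observed
next_action() {
    if [ "$1" = yes ] || [ "$2" = yes ]; then
        echo "skip-remaining-waves"
    elif [ "$3" = yes ]; then
        echo "investigate"      # e.g. re-authenticate, then resume
    else
        echo "continue"
    fi
}

next_action no no yes    # prints: investigate
```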

Next Phase

After all 12 waves complete, proceed to Phase 5: Verification to confirm every finding with a working exploit.