Learning Loop
Extracts successful techniques, payloads, and bypasses from completed engagements and stores them in a learning index. On future engagements, the index recommends payloads that have historically worked on similar tech stacks.
How it works
- After an engagement completes, the extraction endpoint parses all `FINDING-*.md` files
- For each verified finding, it extracts:
  - The CWE
  - The successful payload or technique
  - The bypass type (e.g., WAF bypass, filter evasion)
  - The target's tech stack
- Each technique is stored with a `success_count` that increments if the same payload succeeds again on a different engagement
- The recommendation engine queries the index and ranks techniques by success rate for the given tech stack and CWE
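The storage and ranking steps above can be sketched in a few lines of Python. This is an illustrative in-memory model, not the actual implementation: the `Technique` and `LearningIndex` names, the `(cwe, tech_stack, payload)` identity key, and the use of raw `success_count` as a stand-in for success rate are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Technique:
    cwe: str
    tech_stack: str
    technique: str
    payload: str
    bypass_type: str
    success_count: int = 0
    # Engagements in which this payload succeeded (hypothetical bookkeeping field).
    engagements: Set[str] = field(default_factory=set)


class LearningIndex:
    """In-memory stand-in for the learning index."""

    def __init__(self) -> None:
        self._by_key: Dict[Tuple[str, str, str], Technique] = {}

    def record_success(self, engagement: str, t: Technique) -> Technique:
        # Assumption: a technique is identified by (cwe, tech_stack, payload).
        key = (t.cwe, t.tech_stack, t.payload)
        stored = self._by_key.setdefault(key, t)
        # Count each engagement once, so re-running extraction does not inflate the count.
        if engagement not in stored.engagements:
            stored.engagements.add(engagement)
            stored.success_count += 1
        return stored

    def recommend(self, tech_stack: str, cwe: str, limit: int = 20) -> List[Technique]:
        matches = [
            t for t in self._by_key.values()
            if t.tech_stack == tech_stack and t.cwe == cwe
        ]
        # Assumption: success_count is used as a proxy for success rate.
        return sorted(matches, key=lambda t: t.success_count, reverse=True)[:limit]
```

Tracking the set of engagements per technique is one way to make the "different engagement" condition explicit; a real store could enforce the same thing with a uniqueness constraint.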
What gets extracted
| Field | Source | Example |
|---|---|---|
| `cwe` | Finding CWE tag | CWE-89 |
| `tech_stack` | `context.json` | Spring Boot + PostgreSQL |
| `technique` | Finding description | Time-based blind SQLi via ORDER BY |
| `payload` | Finding PoC | ' OR SLEEP(5)-- - |
| `bypass_type` | Finding notes | WAF bypass via chunked encoding |
API endpoints
Extract techniques from engagement
Parses the engagement's findings and adds new techniques to the index. Returns the number of techniques extracted.
Response:
{
  "engagement": "acme-2026-q1",
  "techniques_extracted": 12,
  "techniques": [
    {
      "id": 45,
      "cwe": "CWE-89",
      "tech_stack": "Spring Boot + PostgreSQL",
      "technique": "Time-based blind SQLi via ORDER BY clause",
      "payload": "1 AND (SELECT 1 FROM (SELECT SLEEP(5))a)-- -",
      "bypass_type": "WAF chunked encoding",
      "success_count": 3,
      "last_used_at": "2026-03-15T16:00:00Z"
    }
  ]
}
Get recommendations
| Parameter | Type | Default | Description |
|---|---|---|---|
| `tech_stack` | query | `*` | Filter by tech stack |
| `cwes` | query | all | Comma-separated CWE IDs |
| `limit` | query | `20` | Max recommendations (1-100) |
Response (LearningRecommendation[]):
[
  {
    "cwe": "CWE-89",
    "tech_stack": "Spring Boot + PostgreSQL",
    "recommended_payloads": [
      "' OR SLEEP(5)-- -",
      "1 UNION SELECT NULL,NULL,version()-- -"
    ],
    "recommended_techniques": [
      "Time-based blind via ORDER BY",
      "UNION-based via column count enumeration"
    ],
    "historical_success_rate": 0.73
  }
]
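A client could assemble the query string and type the response roughly as follows. This is a sketch under assumptions: `build_query` and `best_recommendation` are hypothetical helper names, the endpoint path is deliberately left out because it is not given here, and sending `tech_stack=*` explicitly (rather than omitting it) is a guess at how the default behaves.

```python
from typing import List, Optional, TypedDict
from urllib.parse import urlencode


class LearningRecommendation(TypedDict):
    # Field names mirror the sample response above.
    cwe: str
    tech_stack: str
    recommended_payloads: List[str]
    recommended_techniques: List[str]
    historical_success_rate: float


def build_query(tech_stack: str = "*",
                cwes: Optional[List[str]] = None,
                limit: int = 20) -> str:
    """Build the query string for the recommendations request."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    params = {"tech_stack": tech_stack, "limit": str(limit)}
    if cwes:
        # Omitting the parameter is taken to mean "all CWEs".
        params["cwes"] = ",".join(cwes)
    return urlencode(params)


def best_recommendation(
    recs: List[LearningRecommendation],
) -> Optional[LearningRecommendation]:
    """Return the entry with the highest historical success rate, if any."""
    if not recs:
        return None
    return max(recs, key=lambda r: r["historical_success_rate"])
```

Validating `limit` on the client side simply mirrors the documented 1-100 range, so an out-of-range value fails before any request is sent.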
Statistics
Aggregate statistics: total techniques, top CWEs, most successful payloads, coverage by tech stack.
List all techniques
Browse the full learning index with optional filters.
Trigger Hindsight reflection
Triggers a Hindsight reflection cycle — analyzes all stored memories and synthesizes mental models (patterns, correlations, actionable insights). Requires Hindsight to be enabled.
Response:
{
  "status": "ok",
  "engagement": "acme-2026-q1",
  "tech_stack": "Spring Boot + PostgreSQL",
  "reflection": "Pattern: Java web apps using Spring Boot + Hibernate are 73% likely to have HQL injection when user input reaches dynamic queries. Most successful bypass: parameterized LIKE with wildcard injection."
}
Check Hindsight health
Returns the status of the Hindsight agent memory service.
CLI equivalent
The /learn skill runs the same extraction from the command line. It's automatically called at the end of a /pentest run.
Use /learn <engagement_name> --reflect to also trigger a Hindsight reflection cycle after extraction.
Hindsight agent memory integration
The learning loop uses a dual-write architecture: SQL (primary) + Hindsight (semantic memory).
What Hindsight adds
| Capability | SQL (existing) | Hindsight (new) |
|---|---|---|
| Exact match by CWE/tech_stack | Yes | Yes |
| Semantic similarity search | No | Yes |
| Cross-engagement graph | No | Yes |
| Temporal memory | No | Yes |
| Reflection/mental models | No | Yes |
| Works when Hindsight is down | Yes | N/A (graceful fallback) |
How it works
- Retain (dual-write): Every extracted technique is written to SQL AND stored in Hindsight as an Experience Memory
- Recall (enrichment): Recommendations combine SQL exact-match with Hindsight semantic search — finds similar techniques even when tech stack names don't match exactly
- Reflect (synthesis): After extraction, optionally synthesize mental models — high-level patterns from accumulated experience
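The retain and recall steps, including the fallback to SQL when Hindsight is unavailable, might look like the sketch below. The function names, the callable parameters, and de-duplication by `payload` are assumptions chosen for illustration; the real storage clients are not shown in this document.

```python
import logging
from typing import Callable, Dict, Iterable, List

log = logging.getLogger("learning_loop")

Technique = Dict[str, str]
Query = Callable[[str, str], Iterable[Technique]]


def retain(technique: Technique,
           write_sql: Callable[[Technique], None],
           write_hindsight: Callable[[Technique], None],
           hindsight_enabled: bool) -> bool:
    """Dual-write: SQL first, then best-effort Hindsight.

    Returns True only if the Hindsight write also succeeded.
    """
    write_sql(technique)  # primary store: errors here propagate to the caller
    if not hindsight_enabled:
        return False
    try:
        write_hindsight(technique)
        return True
    except Exception as exc:  # the secondary store must never break extraction
        log.warning("Hindsight write skipped: %s", exc)
        return False


def recall(tech_stack: str, cwe: str,
           query_sql: Query, query_hindsight: Query,
           hindsight_enabled: bool) -> List[Technique]:
    """SQL exact matches first, enriched with semantic matches when available."""
    results = list(query_sql(tech_stack, cwe))
    if not hindsight_enabled:
        return results
    try:
        semantic = list(query_hindsight(tech_stack, cwe))
    except Exception as exc:
        log.warning("Hindsight recall skipped, using SQL only: %s", exc)
        return results
    seen = {t["payload"] for t in results}
    for t in semantic:
        # Assumption: two results with the same payload are the same technique.
        if t["payload"] not in seen:
            seen.add(t["payload"])
            results.append(t)
    return results
```

Keeping SQL as the only store whose failure is fatal is what lets the loop keep working when Hindsight is disabled or down.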
Micro-agent memory
Each wave of test agents receives Hindsight context before dispatch. The agent-dispatch protocol calls recall_techniques() with the current tech stack and skill scope, injecting relevant historical techniques into the agent's prompt. This gives agents "experience" without polluting their context window.
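As a rough illustration of that injection step, the sketch below appends a capped list of recalled techniques to an agent prompt. Only the `recall_techniques()` name comes from the text above; its signature and return shape, the `build_agent_prompt` helper, and the `max_items` cap are assumptions.

```python
from typing import Callable, Dict, List

Technique = Dict[str, str]


def build_agent_prompt(base_prompt: str,
                       recall_techniques: Callable[[str, str], List[Technique]],
                       tech_stack: str,
                       skill_scope: str,
                       max_items: int = 5) -> str:
    """Append a short block of historical techniques to the agent prompt."""
    # Assumed signature: recall_techniques(tech_stack, skill_scope) -> list of dicts.
    techniques = recall_techniques(tech_stack, skill_scope)[:max_items]
    if not techniques:
        return base_prompt  # nothing recalled: leave the prompt untouched
    lines = [
        f"- [{t['cwe']}] {t['technique']}: {t['payload']}"
        for t in techniques
    ]
    return (
        base_prompt
        + "\n\nHistorical techniques for this stack:\n"
        + "\n".join(lines)
    )
```

Capping the number of injected items is one simple way to keep the added context small.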
Current status: DISABLED by default
Hindsight is integrated but disabled (`HINDSIGHT_ENABLED=false`). The SQL learning loop works independently. Enable Hindsight once you have at least 30 to 50 completed engagements, so that semantic recall and cross-engagement pattern synthesis have enough history to draw on.
How to enable Hindsight
Step 1: Get an Anthropic API key
Go to https://console.anthropic.com/settings/keys and create a new key. This key is used only for Hindsight memory operations. It does NOT affect `claude -p`, which continues to use your Max subscription.
Estimated cost: ~$0.17 per engagement with Sonnet 4.6 (~$1.70/month for 10 pentests).
Step 2: Configure environment
Add to dashboard/.env:
Step 3: Restart the dashboard
Hindsight starts automatically (API on :8888, UI on :9999).
Step 4: Ingest knowledge packs (one-time)
Populate Hindsight with the 60+ static knowledge packs:
Step 5: Verify
# Check Hindsight health
curl http://localhost:8880/api/v1/learning/hindsight/health
# Check Hindsight UI
open http://localhost:9999
How to disable again
Set HINDSIGHT_ENABLED=false in dashboard/.env and restart. All operations fall back to SQL-only — no errors, no data loss. Hindsight data persists in the hindsight-data Docker volume.
Configuration reference
| Variable | Default | Description |
|---|---|---|
| `HINDSIGHT_ENABLED` | `false` | Enable/disable Hindsight integration |
| `HINDSIGHT_API_KEY` | (required when enabled) | Anthropic API key (Hindsight only, not for `claude -p`) |
| `HINDSIGHT_URL` | `http://hindsight:8888` | Hindsight API endpoint |
| `HINDSIGHT_BANK_ID` | `bd-pentest` | Memory bank for pentest techniques |
| `HINDSIGHT_LLM_PROVIDER` | `anthropic` | LLM provider |
| `HINDSIGHT_LLM_MODEL` | `claude-sonnet-4-6` | Model for reasoning/reflection |
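For reference, the defaults in the table could be resolved from the environment along these lines. This is a sketch, not the dashboard's actual loader: `HindsightConfig` and `load_hindsight_config` are hypothetical names, and treating only the string `true` (case-insensitive) as enabled is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class HindsightConfig:
    enabled: bool
    api_key: Optional[str]
    url: str
    bank_id: str
    llm_provider: str
    llm_model: str


def load_hindsight_config(env: Mapping[str, str] = os.environ) -> HindsightConfig:
    """Read the HINDSIGHT_* variables, applying the documented defaults."""
    # Assumption: only "true" (any case) enables the integration.
    enabled = env.get("HINDSIGHT_ENABLED", "false").strip().lower() == "true"
    api_key = env.get("HINDSIGHT_API_KEY") or None
    if enabled and not api_key:
        raise ValueError("HINDSIGHT_API_KEY is required when HINDSIGHT_ENABLED=true")
    return HindsightConfig(
        enabled=enabled,
        api_key=api_key,
        url=env.get("HINDSIGHT_URL", "http://hindsight:8888"),
        bank_id=env.get("HINDSIGHT_BANK_ID", "bd-pentest"),
        llm_provider=env.get("HINDSIGHT_LLM_PROVIDER", "anthropic"),
        llm_model=env.get("HINDSIGHT_LLM_MODEL", "claude-sonnet-4-6"),
    )
```

Passing a plain dict as `env` makes the loader easy to exercise without touching the real process environment.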
Connections to other features
- Confidence Calibration: the learning loop checks calibration data when ranking techniques. Payloads that frequently produce false positives are ranked lower, even if they trigger a match
- Remediation Generator: knowing which technique exploited a vulnerability helps the Remediation Generator produce more targeted fix code
- Knowledge packs: the learning index complements the static knowledge packs in `.claude/skills/test-*/helpers/`. Static packs contain curated techniques; the learning index adds engagement-proven ones
- Hindsight memory: semantic recall enriches recommendations beyond exact SQL matches. Reflection synthesizes cross-engagement insights. Chain-findings uses graph recall for cross-engagement chain discovery
- Micro-agents: wave dispatch injects Hindsight context for agent "experience" without context pollution