Output Sanitization
The runner service and terminal PTY stream are the richest channels for intellectual property (IP) leakage. They expose the raw execution output of the Claude CLI, including references to internal knowledge packs, AI reasoning markers, tool configurations, and skill file paths.
The output sanitizer strips these references before any data reaches the frontend.
Sanitized Channels
| Channel | Endpoint | What It Carries |
|---|---|---|
| Runner output (HTTP) | GET /runner/sessions/{id}/output | Paginated CLI output lines |
| Runner stream (WebSocket) | WS /runner/sessions/{id}/stream | Real-time CLI output |
| Terminal PTY (WebSocket) | WS /terminal/ws | Live PTY character stream |
All three channels pass through output_sanitizer.py before reaching the client.
Regex Patterns
The sanitizer applies 11 regex patterns in priority order. Each pattern replaces matched content with a generic placeholder.
Pattern 1: AI Decision Markers
Strips internal reasoning markers that reveal how the AI evaluates findings, selects attack paths, and eliminates false positives. These markers are placed at 2-3 critical decision points per skill.
Example:
| Before | After |
|---|---|
| [AI-DECISION] Confidence 72% — response differs by 3 bytes, likely SQLi | [Analysis point] |
| # [AI-DECISION] Switch to time-based blind injection | [Analysis point] |
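This page does not list the Pattern 1 regex. A minimal sketch, assuming a marker is the literal tag [AI-DECISION] (optionally preceded by a # comment prefix) and that redaction runs to the end of the line, reproduces both rows of the table above. The regex and the redact_ai_decisions name are hypothetical, not taken from output_sanitizer.py.

```python
import re

# Hypothetical reconstruction of Pattern 1 (the real regex is not shown on
# this page): an optional "#" comment prefix, the literal [AI-DECISION] tag,
# and everything after it up to, but not including, the end of the line.
AI_DECISION = re.compile(r"(?:#[ \t]*)?\[AI-DECISION\][^\n]*", re.IGNORECASE)


def redact_ai_decisions(text: str) -> str:
    """Replace each AI decision marker (through end of line) with a placeholder."""
    return AI_DECISION.sub("[Analysis point]", text)


print(redact_ai_decisions("# [AI-DECISION] Switch to time-based blind injection"))
# prints "[Analysis point]"
```

Stopping at the newline keeps neighbouring output lines intact, so only the marker line collapses to the placeholder.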
Pattern 2: Knowledge Pack References
Knowledge packs contain curated attack techniques, bypass payloads, and vulnerability patterns. Their filenames alone reveal the categorization system.
Example:
| Before | After |
|---|---|
| Reading knowledge-sqli.md for union-based techniques | Reading [internal-ref] for union-based techniques |
| Loaded helpers/knowledge-xss-bypass.md | Loaded [internal-ref] |
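The Pattern 2 regex is not listed either. The table shows that a directory prefix such as helpers/ disappears along with the filename, so a sketch needs an optional path component before knowledge-<topic>.md. The regex, the flag choice, and the redact_knowledge_refs name below are assumptions.

```python
import re

# Hypothetical reconstruction of Pattern 2: zero or more "dir/" segments
# followed by a knowledge-<topic>.md filename.
KNOWLEDGE_PACK = re.compile(r"(?:[\w.-]+/)*knowledge-[\w-]+\.md", re.IGNORECASE)


def redact_knowledge_refs(text: str) -> str:
    """Replace knowledge pack filenames, including any directory prefix."""
    return KNOWLEDGE_PACK.sub("[internal-ref]", text)


print(redact_knowledge_refs("Loaded helpers/knowledge-xss-bypass.md"))
# prints "Loaded [internal-ref]"
```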
Pattern 3: SKILL.md File Paths
Skill file paths reveal the internal organization, naming conventions, and the existence of specific testing modules.
Pattern 4: Internal Boilerplate References
Pattern: skill-boilerplate\.md|agent-dispatch\.md|stealth-config\.md|hacker-heuristics\.md|kill-signals\.md|research-escalation\.md|safety-validator\.md|codex-dispatch\.md|codex-schemas/
Replaces: [internal-ref]
These shared infrastructure files contain the core operational logic: how agents are dispatched, how stealth is maintained, how research is escalated mid-test.
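Pattern 4 is a plain alternation of literal filenames. Compiling it exactly as written (no flags are shown on this page, so none are set) makes its behavior easy to check; redact_boilerplate is a hypothetical helper name and finding.json an illustrative filename.

```python
import re

# Pattern 4 as documented above; adjacent string literals join into one regex.
INTERNAL_BOILERPLATE = re.compile(
    r"skill-boilerplate\.md|agent-dispatch\.md|stealth-config\.md"
    r"|hacker-heuristics\.md|kill-signals\.md|research-escalation\.md"
    r"|safety-validator\.md|codex-dispatch\.md|codex-schemas/"
)


def redact_boilerplate(text: str) -> str:
    """Replace every internal boilerplate reference with [internal-ref]."""
    return INTERNAL_BOILERPLATE.sub("[internal-ref]", text)


print(redact_boilerplate("Validated against codex-schemas/finding.json"))
# prints "Validated against [internal-ref]finding.json"
```

Because the last alternative is the directory prefix codex-schemas/ rather than a full path, only that prefix is replaced and any filename after it survives.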
Pattern 5: Nuclei Template Paths
Pattern: (?:/opt/nuclei-templates|~/nuclei-templates|nuclei-templates)/[\w/-]+\.yaml
Replaces: [template]
Template paths reveal the directory structure and which specific vulnerability checks are being run.
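Pattern 5 compiled as written, for checking which paths it catches; redact_templates is a hypothetical helper name and the template paths in the example are illustrative.

```python
import re

# Pattern 5 as documented above: one of three template roots, then a
# path of word characters, slashes, and hyphens ending in .yaml.
NUCLEI_TEMPLATE = re.compile(
    r"(?:/opt/nuclei-templates|~/nuclei-templates|nuclei-templates)/[\w/-]+\.yaml"
)


def redact_templates(text: str) -> str:
    """Replace each matching Nuclei template path with [template]."""
    return NUCLEI_TEMPLATE.sub("[template]", text)


print(redact_templates("Running /opt/nuclei-templates/http/cves/2021/CVE-2021-44228.yaml now"))
# prints "Running [template] now"
```

The character class [\w/-] excludes dots, so a template filename with an extra dot before .yaml, or one using the .yml extension, would not match as written.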
Pattern 6: Model Routing Configuration
Pattern: Opus\s+(?:high|medium|low)\s+\d+K\s+thinking|Haiku\s+\d+K\s+thinking|thinking_budget[=:]\s*\d+
Replaces: [config]
Model routing details (which model handles which skill, thinking token budgets) are proprietary architectural decisions.
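Pattern 6 compiled as written, with a hypothetical redact_model_routing helper. No flags are set, matching the literal pattern text on this page; whether the real module adds re.IGNORECASE is not stated here.

```python
import re

# Pattern 6 as documented above: Opus/Haiku thinking-budget phrases and
# thinking_budget=<n> / thinking_budget: <n> settings.
MODEL_ROUTING = re.compile(
    r"Opus\s+(?:high|medium|low)\s+\d+K\s+thinking"
    r"|Haiku\s+\d+K\s+thinking"
    r"|thinking_budget[=:]\s*\d+"
)


def redact_model_routing(text: str) -> str:
    """Replace model routing details with [config]."""
    return MODEL_ROUTING.sub("[config]", text)


print(redact_model_routing("thinking_budget=16000 tokens"))
# prints "[config] tokens"
```

Without re.IGNORECASE, lowercase variants such as "opus high 32K thinking" pass through unchanged, which is why the case-variation point under Bypass Resistance matters for this pattern.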
Pattern 7: Research Escalation Internals
Runtime Research Escalation markers reveal the dynamic web search strategy, trigger thresholds, and budget management.
Pattern 8: Codex Integration Markers
Pattern: \[CODEX\]\s+(?:dispatch|budget|P[0-7]|tiebreaker|wave.review|coverage.audit)
Replaces: [analysis]
Codex integration points reveal the dual-engine architecture and cross-model verification strategy.
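Pattern 8 compiled as written, again behind a hypothetical helper name. Two details of the literal regex show up in practice: only the [CODEX] tag and its keyword are replaced, and the unescaped dot in wave.review and coverage.audit matches any single character, so wave-review is caught too.

```python
import re

# Pattern 8 as documented above. The dots in "wave.review" and
# "coverage.audit" are unescaped, so they match any one character.
CODEX_MARKER = re.compile(
    r"\[CODEX\]\s+(?:dispatch|budget|P[0-7]|tiebreaker|wave.review|coverage.audit)"
)


def redact_codex_markers(text: str) -> str:
    """Replace the [CODEX] tag plus its keyword with [analysis]."""
    return CODEX_MARKER.sub("[analysis]", text)


print(redact_codex_markers("[CODEX] wave-review started"))
# prints "[analysis] started"
```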
Pattern 9: Stealth Internals
Internal function calls and stealth parameters reveal the rate-limiting and evasion implementation.
Pattern 10: Kill Signal Logs
Kill signals reveal the time-waste prevention logic: what triggers a test to abort and how the system reallocates resources.
Pattern 11: Wave Coordinator Internals
Wave scheduling details reveal the parallel execution strategy and agent naming conventions.
Implementation
Source File
dashboard/backend/app/services/output_sanitizer.py
Functions
| Function | Input | Output | Used By |
|---|---|---|---|
| sanitize_output(text) | Raw output string | Sanitized string | Terminal PTY, runner WS |
| sanitize_runner_line(line) | Output line dict ({content, parsed}) | Sanitized dict | Runner HTTP output |
| sanitize_finding_markdown(md) | Finding raw_markdown | Sanitized markdown | Finding detail endpoint |
| sanitize_timeline_skill(name) | Skill name (e.g. test-injection) | Generic label (e.g. Security Testing) | Timeline endpoint (viewer role) |
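The table gives only names and shapes. A minimal sketch of how the dict- and label-oriented helpers could sit on top of sanitize_output, assuming the parsed field holds JSON-like data (nested dicts, lists, and strings) and that skill names map to labels by prefix. The single stand-in rule, the prefix table, and the fallback label "Assessment Step" are illustrative assumptions; only the mapping of test-injection to Security Testing comes from the table.

```python
import re
from typing import Any

# Stand-in for the full ordered pipeline: one abbreviated Pattern 4 rule only.
_RULES = [
    (re.compile(r"stealth-config\.md|agent-dispatch\.md", re.IGNORECASE), "[internal-ref]"),
]


def sanitize_output(text: str) -> str:
    """Apply every (pattern, placeholder) rule in order."""
    for pattern, placeholder in _RULES:
        text = pattern.sub(placeholder, text)
    return text


def _sanitize_value(value: Any) -> Any:
    """Recursively sanitize strings inside JSON-like data; leave other types alone."""
    if isinstance(value, str):
        return sanitize_output(value)
    if isinstance(value, dict):
        return {key: _sanitize_value(item) for key, item in value.items()}
    if isinstance(value, list):
        return [_sanitize_value(item) for item in value]
    return value


def sanitize_runner_line(line: dict) -> dict:
    """Return a sanitized copy of a {content, parsed} line; the input is not mutated."""
    return _sanitize_value(line)


# Hypothetical prefix-to-label table and fallback label.
_SKILL_LABELS = {"test-": "Security Testing"}
_FALLBACK_LABEL = "Assessment Step"


def sanitize_timeline_skill(name: str) -> str:
    """Map an internal skill name to a generic label for the viewer role."""
    for prefix, label in _SKILL_LABELS.items():
        if name.startswith(prefix):
            return label
    return _FALLBACK_LABEL
```

Returning a fresh dict rather than editing the line in place keeps the raw runner output available server-side while only the sanitized copy is serialized to the client.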
Data Flow
graph LR
Runner["Runner Service<br/>(raw output)"] --> Sanitizer["output_sanitizer.py<br/>11 regex patterns"]
Terminal["Terminal PTY<br/>(raw stream)"] --> Sanitizer
Sanitizer --> HTTP["HTTP Response<br/>(sanitized)"]
Sanitizer --> WS["WebSocket<br/>(sanitized)"]
HTTP --> Browser["Browser"]
WS --> Browser
style Sanitizer fill:#1b5e20,color:#fff
style Runner fill:#b71c1c,color:#fff
style Terminal fill:#b71c1c,color:#fff
Bypass Resistance
The sanitizer is designed to be resilient against common bypass techniques:
- Case variation -- All patterns use re.IGNORECASE where applicable.
- Multiline content -- The re.MULTILINE flag makes ^ and $ anchor at every line boundary, so line-anchored patterns still match inside output that spans several lines.
- Partial matches -- Patterns match substrings, not just full lines.
- Nested references -- Multiple patterns can match the same line (applied sequentially).
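The properties above can be checked against a small ordered rule list. This is a sketch, not the real sanitize_output: it carries only an abbreviated Pattern 4 and part of Pattern 6, and it assumes both are compiled with re.IGNORECASE | re.MULTILINE.

```python
import re

# Assumed flags. MULTILINE only changes how ^ and $ anchor; these two rules
# use no anchors, so here it is harmless and mirrors the list above.
FLAGS = re.IGNORECASE | re.MULTILINE

# Ordered rules: an abbreviated Pattern 4 followed by part of Pattern 6.
RULES = [
    (re.compile(r"stealth-config\.md|kill-signals\.md", FLAGS), "[internal-ref]"),
    (re.compile(r"Opus\s+(?:high|medium|low)\s+\d+K\s+thinking|thinking_budget[=:]\s*\d+", FLAGS), "[config]"),
]


def sanitize_output(text: str) -> str:
    # Each rule runs on the previous rule's output, so one line can be
    # rewritten by several patterns.
    for pattern, placeholder in RULES:
        text = pattern.sub(placeholder, text)
    return text


print(sanitize_output("kill-signals.md says opus HIGH 32k thinking"))
# prints "[internal-ref] says [config]"
```

The example line exercises three properties at once: a lowercase and mixed-case model phrase is still caught, two different patterns rewrite the same line in sequence, and each match replaces only its substring.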
Known Limitations
- Semantic inference -- If the output describes a technique without using pattern-matched keywords, it passes through. This is by design: generic technique descriptions are not IP.
- Timing analysis -- An observer can infer phase duration and request count from WebSocket message frequency. This is mitigated by WebSocket authentication (only authorized users see the stream).
- Error messages -- Stack traces from internal tools may reference file paths. These are caught when they match pattern 3 (SKILL.md file paths) or pattern 4 (internal boilerplate references).