Output Sanitization

The runner service and terminal PTY stream are the richest channels for IP leakage. They expose the raw execution output of the Claude CLI, which includes references to internal knowledge packs, AI reasoning markers, tool configurations, and skill file paths.

The output sanitizer strips these references before any data reaches the frontend.

Sanitized Channels

| Channel | Endpoint | What It Carries |
|---|---|---|
| Runner output (HTTP) | `GET /runner/sessions/{id}/output` | Paginated CLI output lines |
| Runner stream (WebSocket) | `WS /runner/sessions/{id}/stream` | Real-time CLI output |
| Terminal PTY (WebSocket) | `WS /terminal/ws` | Live PTY character stream |

All three channels pass through `output_sanitizer.py` before reaching the client.

Regex Patterns

The sanitizer applies 11 regex patterns in priority order. Each pattern replaces matched content with a generic placeholder.

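Pattern 1: AI Decision Markers

Pattern:  ^\s*\[AI-DECISION\].*$
Replaces: [Analysis point]

Strips internal reasoning markers that reveal how the AI evaluates findings, selects attack paths, and eliminates false positives. These markers are placed at 2-3 critical decision points per skill. The pattern is anchored with `^` and ends in `.*$`, so a match replaces the entire line, not just the marker.

Example:

| Before | After |
|---|---|
| `[AI-DECISION] Confidence 72% — response differs by 3 bytes, likely SQLi` | `[Analysis point]` |
| `# [AI-DECISION] Switch to time-based blind injection` | `[Analysis point]` |

The `^\s*` anchor allows only whitespace before the marker. The `#`-prefixed line in the second row therefore matches only if the implementation also accepts an optional comment prefix (for example `^\s*#?\s*\[AI-DECISION\].*$`).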
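The anchoring behavior is easy to check by running Pattern 1 against the two example lines. This is a minimal sketch. It compiles the documented pattern with `re.MULTILINE`, the flag named under Bypass Resistance, and compares it with `AI_DECISION_COMMENTED`, a hypothetical variant that tolerates an optional `#` prefix. Neither variable name comes from `output_sanitizer.py`.

```python
import re

# Pattern 1 as documented; MULTILINE lets ^ and $ match at every line break.
AI_DECISION = re.compile(r"^\s*\[AI-DECISION\].*$", re.MULTILINE)

# Hypothetical variant (not from the source) that also accepts a leading "#".
AI_DECISION_COMMENTED = re.compile(r"^\s*#?\s*\[AI-DECISION\].*$", re.MULTILINE)

log = (
    "[AI-DECISION] Confidence 72% - response differs by 3 bytes, likely SQLi\n"
    "# [AI-DECISION] Switch to time-based blind injection\n"
    "Sending payload 4/12"
)

strict = AI_DECISION.sub("[Analysis point]", log)
lenient = AI_DECISION_COMMENTED.sub("[Analysis point]", log)

print(strict)   # second line keeps its "# [AI-DECISION] ..." text
print(lenient)  # both marker lines become "[Analysis point]"
```

The third line passes through unchanged under both patterns, because neither matches text that lacks the marker.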

Pattern 2: Knowledge Pack References

Pattern:  knowledge-[\w-]+\.md|helpers/knowledge-[\w-]+\.md
Replaces: [internal-ref]

Knowledge packs contain curated attack techniques, bypass payloads, and vulnerability patterns. Their filenames alone reveal the categorization system.

Example:

| Before | After |
|---|---|
| `Reading knowledge-sqli.md for union-based techniques` | `Reading [internal-ref] for union-based techniques` |
| `Loaded helpers/knowledge-xss-bypass.md` | `Loaded [internal-ref]` |
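The second alternative matters because of the order in which the regex engine tries positions. Scanning left to right, it reaches the `h` of `helpers/` before the `k` of `knowledge-`. At that position only the second alternative can match, so the directory prefix is consumed along with the filename. The sketch below assumes `re.IGNORECASE` is applied to this pattern, and `FIRST_ALT_ONLY` exists purely for comparison.

```python
import re

KNOWLEDGE_REF = re.compile(
    r"knowledge-[\w-]+\.md|helpers/knowledge-[\w-]+\.md", re.IGNORECASE
)
# For comparison only: the first alternative on its own.
FIRST_ALT_ONLY = re.compile(r"knowledge-[\w-]+\.md", re.IGNORECASE)

line = "Loaded helpers/knowledge-xss-bypass.md"
full = KNOWLEDGE_REF.sub("[internal-ref]", line)
partial = FIRST_ALT_ONLY.sub("[internal-ref]", line)
sql_line = KNOWLEDGE_REF.sub(
    "[internal-ref]", "Reading knowledge-sqli.md for union-based techniques"
)

print(full)     # Loaded [internal-ref]
print(partial)  # Loaded helpers/[internal-ref]
```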

Pattern 3: SKILL.md File Paths

Pattern:  \.claude/skills/[\w-]+/(?:SKILL\.md|helpers/[\w.-]+)
Replaces: [internal-ref]

Skill file paths reveal the internal organization, naming conventions, and the existence of specific testing modules.
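The pattern replaces text starting at `.claude/`, and only for `SKILL.md` or for files under `helpers/`. In the sketch below, the sample paths are hypothetical. Any directory above `.claude/` stays in the output. The third sample shows that other files in a skill directory are not matched by this pattern alone.

```python
import re

SKILL_PATH = re.compile(r"\.claude/skills/[\w-]+/(?:SKILL\.md|helpers/[\w.-]+)")

# Hypothetical paths; only the first two fit the pattern's two alternatives.
samples = [
    "Loading /home/runner/.claude/skills/test-injection/SKILL.md",
    "Reading .claude/skills/recon/helpers/payloads.txt",
    "Reading .claude/skills/recon/notes.md",
]
cleaned = [SKILL_PATH.sub("[internal-ref]", s) for s in samples]
for before, after in zip(samples, cleaned):
    print(f"{before!r} -> {after!r}")
```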

Pattern 4: Internal Boilerplate References

Pattern:  skill-boilerplate\.md|agent-dispatch\.md|stealth-config\.md
          |hacker-heuristics\.md|kill-signals\.md|research-escalation\.md
          |safety-validator\.md|codex-dispatch\.md|codex-schemas/
Replaces: [internal-ref]

These shared infrastructure files contain the core operational logic: how agents are dispatched, how stealth is maintained, how research is escalated mid-test.
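One way to keep this list maintainable is to build the alternation from the file names instead of escaping each dot by hand. This is a sketch, not necessarily how `output_sanitizer.py` constructs Pattern 4. `re.escape` yields a pattern that matches the same strings as the documented one, and it is case-sensitive here because no flags are passed.

```python
import re

# The nine names listed for Pattern 4; re.escape() escapes the "." in each name.
INTERNAL_FILES = [
    "skill-boilerplate.md", "agent-dispatch.md", "stealth-config.md",
    "hacker-heuristics.md", "kill-signals.md", "research-escalation.md",
    "safety-validator.md", "codex-dispatch.md", "codex-schemas/",
]
BOILERPLATE_REF = re.compile("|".join(re.escape(name) for name in INTERNAL_FILES))

result = BOILERPLATE_REF.sub(
    "[internal-ref]", "Dispatching via agent-dispatch.md (stealth-config.md loaded)"
)
print(result)  # Dispatching via [internal-ref] ([internal-ref] loaded)
```

Escaping matters: without it, the `.` in each name would match any character.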

Pattern 5: Nuclei Template Paths

Pattern:  (?:/opt/nuclei-templates|~/nuclei-templates|nuclei-templates)/[\w/-]+\.yaml
Replaces: [template]

Template paths, whether absolute (`/opt/...`), home-relative (`~/...`), or relative, reveal the directory structure and which specific vulnerability checks are being run.
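The last alternative, bare `nuclei-templates/`, has no leading directory. It therefore catches templates under any root, but whatever sits above `nuclei-templates/` stays in the output. The log lines and template names below are hypothetical.

```python
import re

NUCLEI_TEMPLATE = re.compile(
    r"(?:/opt/nuclei-templates|~/nuclei-templates|nuclei-templates)/[\w/-]+\.yaml"
)

# Hypothetical log lines, one per alternative of the pattern.
lines = [
    "Running /opt/nuclei-templates/http/cves/2021/CVE-2021-44228.yaml",
    "Running ~/nuclei-templates/http/misconfiguration/git-config.yaml",
    "Running /srv/scans/nuclei-templates/dns/azure-takeover.yaml",
]
cleaned = [NUCLEI_TEMPLATE.sub("[template]", line) for line in lines]
print(cleaned)
```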

Pattern 6: Model Routing Configuration

Pattern:  Opus\s+(?:high|medium|low)\s+\d+K\s+thinking
          |Haiku\s+\d+K\s+thinking
          |thinking_budget[=:]\s*\d+
Replaces: [config]

Model routing details (which model handles which skill, thinking token budgets) are proprietary architectural decisions.
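With `re.VERBOSE`, the three alternatives can stay on separate lines exactly as they appear above, because the pattern's whitespace and newlines are ignored. The `re.IGNORECASE` flag and the sample strings are assumptions for illustration, and the token budgets in them are made up.

```python
import re

# Pattern 6 with re.VERBOSE: whitespace and newlines in the pattern are ignored,
# so the three alternatives can sit on separate lines.
MODEL_ROUTING = re.compile(
    r"""
    Opus\s+(?:high|medium|low)\s+\d+K\s+thinking
    | Haiku\s+\d+K\s+thinking
    | thinking_budget[=:]\s*\d+
    """,
    re.IGNORECASE | re.VERBOSE,
)

# Hypothetical log fragments; the budgets are made up.
samples = [
    "Routing: Opus high 32K thinking",
    "Haiku 8k thinking for recon",
    "thinking_budget: 16000",
]
cleaned = [MODEL_ROUTING.sub("[config]", s) for s in samples]
print(cleaned)
```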

Pattern 7: Research Escalation Internals

Pattern:  \[RRE\]\s+(?:budget|escalation|trigger|gate|remaining)
Replaces: [research]

Runtime Research Escalation markers reveal the dynamic web search strategy, trigger thresholds, and budget management.
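Only the `[RRE]` marker and its keyword are replaced. Whatever follows the keyword stays in the line, and markers followed by any other keyword are left alone. The sample lines in this sketch are hypothetical.

```python
import re

RRE_MARKER = re.compile(r"\[RRE\]\s+(?:budget|escalation|trigger|gate|remaining)")

# Hypothetical runner lines.
samples = [
    "[RRE] escalation fired after 3 dead ends",
    "[RRE] budget remaining=2",
    "[RRE] status idle",
]
cleaned = [RRE_MARKER.sub("[research]", s) for s in samples]
print(cleaned)
```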

Pattern 8: Codex Integration Markers

Pattern:  \[CODEX\]\s+(?:dispatch|budget|P[0-7]|tiebreaker|wave.review|coverage.audit)
Replaces: [analysis]

Codex integration points reveal the dual-engine architecture and cross-model verification strategy.
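Two details of this pattern are easy to miss. First, the unescaped `.` in `wave.review` and `coverage.audit` matches any single character, such as a hyphen, underscore, or space. Second, `P[0-7]` covers only `P0` through `P7`. The sample lines below are hypothetical.

```python
import re

CODEX_MARKER = re.compile(
    r"\[CODEX\]\s+(?:dispatch|budget|P[0-7]|tiebreaker|wave.review|coverage.audit)"
)

# Hypothetical lines: two separators for wave.review, one in-range and one
# out-of-range P level.
samples = [
    "[CODEX] wave-review started",
    "[CODEX] wave_review started",
    "[CODEX] P3 finding confirmed",
    "[CODEX] P8 finding confirmed",
]
cleaned = [CODEX_MARKER.sub("[analysis]", s) for s in samples]
print(cleaned)
```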

Pattern 9: Stealth Internals

Pattern:  JITTER_MULT=\d+|stealth_curl\(|count_request\(|check_timeout\(
Replaces: [config]

Internal function calls and stealth parameters reveal the rate-limiting and evasion implementation.
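Each function alternative matches only the name and its opening parenthesis, and the `JITTER_MULT` alternative matches only the assignment, so call arguments remain visible after substitution. The shell-style sample line in this sketch is hypothetical.

```python
import re

STEALTH = re.compile(r"JITTER_MULT=\d+|stealth_curl\(|count_request\(|check_timeout\(")

# Hypothetical shell-style line from a skill run.
line = 'export JITTER_MULT=3; stealth_curl("$TARGET/login")'
result = STEALTH.sub("[config]", line)
print(result)  # export [config]; [config]"$TARGET/login")
```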

Pattern 10: Kill Signal Logs

Pattern:  \[KILL-SIGNAL\]\s+reason=\S+\s+endpoint=\S+\s+action=\S+
Replaces: [signal]

Kill signals reveal the time-waste prevention logic: what triggers a test to abort and how the system reallocates resources.
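The pattern expects the three fields in a fixed order: `reason`, `endpoint`, `action`. A line carrying the same fields in another order is not matched. The field values in this sketch are hypothetical.

```python
import re

KILL_SIGNAL = re.compile(r"\[KILL-SIGNAL\]\s+reason=\S+\s+endpoint=\S+\s+action=\S+")

# Hypothetical log lines: documented field order, then a reordered variant.
samples = [
    "[KILL-SIGNAL] reason=no_progress endpoint=/api/login action=skip",
    "[KILL-SIGNAL] action=skip reason=no_progress endpoint=/api/login",
]
cleaned = [KILL_SIGNAL.sub("[signal]", s) for s in samples]
print(cleaned)
```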

Pattern 11: Wave Coordinator Internals

Pattern:  Wave\s+\d+/\d+\s*[:.].*?agents?\s*[:.]|agent-\w+\.json
Replaces: [phase]

Wave scheduling details reveal the parallel execution strategy and agent naming conventions.
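The first alternative uses a lazy `.*?`, so it stops at the first `agent` or `agents` followed by `:` or `.` after the wave counter. Anything after that delimiter remains. The second alternative replaces `agent-<name>.json` filenames, where `<name>` must consist of word characters. The sample lines in this sketch are hypothetical.

```python
import re

WAVE_INTERNALS = re.compile(r"Wave\s+\d+/\d+\s*[:.].*?agents?\s*[:.]|agent-\w+\.json")

# Hypothetical coordinator output.
samples = [
    "Wave 2/4: dispatching 3 agents: injection, auth, xss",
    "Writing results to agent-recon.json",
]
cleaned = [WAVE_INTERNALS.sub("[phase]", s) for s in samples]
print(cleaned)
```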

Implementation

Source File

`dashboard/backend/app/services/output_sanitizer.py`

Functions

| Function | Input | Output | Used By |
|---|---|---|---|
| `sanitize_output(text)` | Raw output string | Sanitized string | Terminal PTY, runner WS |
| `sanitize_runner_line(line)` | Output line dict (`{content, parsed}`) | Sanitized dict | Runner HTTP output |
| `sanitize_finding_markdown(md)` | Finding `raw_markdown` | Sanitized markdown | Finding detail endpoint |
| `sanitize_timeline_skill(name)` | Skill name (e.g. `test-injection`) | Generic label (e.g. `Security Testing`) | Timeline endpoint (viewer role) |
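One plausible shape for `sanitize_runner_line` is to run `sanitize_output` over every string in the `content` and `parsed` fields and return a copy. This is a sketch under assumptions. Here `sanitize_output` applies only Pattern 2. The `_sanitize_value` helper is hypothetical, and the structure of `parsed` is invented for the example.

```python
import re
from typing import Any

# Stand-in for the full eleven-pattern pass: only Pattern 2 is applied here.
_KNOWLEDGE_REF = re.compile(
    r"knowledge-[\w-]+\.md|helpers/knowledge-[\w-]+\.md", re.IGNORECASE
)


def sanitize_output(text: str) -> str:
    return _KNOWLEDGE_REF.sub("[internal-ref]", text)


def _sanitize_value(value: Any) -> Any:
    """Sanitize strings, recursing into dicts and lists; leave other types alone."""
    if isinstance(value, str):
        return sanitize_output(value)
    if isinstance(value, dict):
        return {key: _sanitize_value(item) for key, item in value.items()}
    if isinstance(value, list):
        return [_sanitize_value(item) for item in value]
    return value


def sanitize_runner_line(line: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a runner line with its content and parsed fields sanitized."""
    return {
        key: _sanitize_value(value) if key in ("content", "parsed") else value
        for key, value in line.items()
    }


# Hypothetical line; the real shape of "parsed" is not documented here.
raw = {
    "id": 7,
    "content": "Reading knowledge-sqli.md",
    "parsed": {"files": ["helpers/knowledge-xss.md"], "step": 2},
}
clean = sanitize_runner_line(raw)
print(clean)
```

Returning a new dict leaves the stored raw line untouched. This matters if other internal consumers still need the unsanitized text.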

Data Flow

```mermaid
graph LR
    Runner["Runner Service<br/>(raw output)"] --> Sanitizer["output_sanitizer.py<br/>11 regex patterns"]
    Terminal["Terminal PTY<br/>(raw stream)"] --> Sanitizer
    Sanitizer --> HTTP["HTTP Response<br/>(sanitized)"]
    Sanitizer --> WS["WebSocket<br/>(sanitized)"]
    HTTP --> Browser["Browser"]
    WS --> Browser

    style Sanitizer fill:#1b5e20,color:#fff
    style Runner fill:#b71c1c,color:#fff
    style Terminal fill:#b71c1c,color:#fff
```
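The diagram's key property is that the sanitizer sits between every raw source and every client-facing send. The sketch below shows that shape with plain `asyncio`. The names `relay`, `fake_pty`, and `send` are hypothetical stand-ins for the runner/PTY source and the WebSocket send call. `sanitize_output` applies only Pattern 3 here.

```python
import asyncio
import re
from collections.abc import AsyncIterator, Awaitable, Callable

_SKILL_PATH = re.compile(r"\.claude/skills/[\w-]+/(?:SKILL\.md|helpers/[\w.-]+)")


def sanitize_output(text: str) -> str:
    # Stand-in for the full pattern pass.
    return _SKILL_PATH.sub("[internal-ref]", text)


async def relay(source: AsyncIterator[str], send: Callable[[str], Awaitable[None]]) -> None:
    """Forward each raw chunk to the client only after sanitizing it."""
    async for chunk in source:
        await send(sanitize_output(chunk))


async def demo() -> list[str]:
    sent: list[str] = []

    async def fake_pty():
        # Hypothetical raw PTY output.
        yield "$ cat .claude/skills/recon/SKILL.md\n"
        yield "done\n"

    async def send(frame: str) -> None:
        sent.append(frame)

    await relay(fake_pty(), send)
    return sent


frames = asyncio.run(demo())
print(frames)
```

Because each chunk is sanitized on its own, this sketch would miss a reference split across two chunks. It makes no assumption about how the real service frames PTY output.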

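Bypass Resistance

The sanitizer is designed to resist common bypass techniques:

- Case variation: patterns are compiled with `re.IGNORECASE` where applicable, so upper- and lower-case variants of a keyword match alike.
- Multiline content: the `re.MULTILINE` flag makes `^` and `$` match at every line boundary, so line-anchored patterns such as Pattern 1 apply to each line of a multi-line chunk. The flag does not rejoin a reference that is split across two lines.
- Partial matches: apart from the line-anchored Pattern 1, patterns match substrings anywhere in a line, not just full lines.
- Nested references: multiple patterns can match the same line. They are applied sequentially, so each pattern sees the output of the previous one.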
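The flag and ordering behavior described in this list can be checked directly. In this sketch, three of the documented patterns are compiled with `re.IGNORECASE | re.MULTILINE` and applied one after another to a single multi-line chunk. Using both flags on all three patterns is an assumption, since the source says `IGNORECASE` is used only where applicable. The chunk itself is hypothetical.

```python
import re

FLAGS = re.IGNORECASE | re.MULTILINE

AI_DECISION = re.compile(r"^\s*\[AI-DECISION\].*$", FLAGS)
NUCLEI = re.compile(
    r"(?:/opt/nuclei-templates|~/nuclei-templates|nuclei-templates)/[\w/-]+\.yaml", FLAGS
)
STEALTH = re.compile(r"JITTER_MULT=\d+|stealth_curl\(|count_request\(|check_timeout\(", FLAGS)

# Hypothetical chunk: lower-case marker on an indented middle line, lower-case
# stealth parameter, and a relative template path.
chunk = (
    "scan started\n"
    "  [ai-decision] escalate to blind mode\n"
    "jitter_mult=4 nuclei -t nuclei-templates/http/cves/test.yaml\n"
)

# Applied sequentially: each substitution runs on the previous result.
for pattern, placeholder in [
    (AI_DECISION, "[Analysis point]"),
    (NUCLEI, "[template]"),
    (STEALTH, "[config]"),
]:
    chunk = pattern.sub(placeholder, chunk)

print(chunk)
```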

Known Limitations

- Semantic inference: if the output describes a technique without using pattern-matched keywords, it passes through. This is by design: generic technique descriptions are not IP.
- Timing analysis: an observer can infer phase duration and request count from WebSocket message frequency. WebSocket authentication mitigates this but does not eliminate it, since only authorized users see the stream.
- Error messages: stack traces from internal tools may reference file paths. Skill paths and the named boilerplate files are caught by Patterns 3 and 4, but paths that match none of the patterns pass through unchanged.