
Output Sanitization

The runner service and terminal PTY stream are the channels most exposed to intellectual property (IP) leakage. They carry the raw execution output of the Claude CLI, which includes references to internal knowledge packs, AI reasoning markers, tool configurations, and skill file paths.

The output sanitizer strips these references before any data reaches the frontend.


Sanitized Channels

Channel                    Endpoint                          What It Carries
Runner output (HTTP)       GET /runner/sessions/{id}/output  Paginated CLI output lines
Runner stream (WebSocket)  WS /runner/sessions/{id}/stream   Real-time CLI output
Terminal PTY (WebSocket)   WS /terminal/ws                   Live PTY character stream

All three channels pass through output_sanitizer.py before reaching the client.


Regex Patterns

The sanitizer applies 11 regex patterns in priority order. Each pattern replaces matched content with a generic placeholder.
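
A minimal sketch of ordered, sequential substitution, assuming the patterns are held as a list of (compiled pattern, placeholder) pairs. The names PATTERNS and apply_patterns are hypothetical, only three of the eleven patterns are included, and the use of re.MULTILINE on Pattern 1 is an assumption rather than a confirmed detail of output_sanitizer.py.

```python
import re

# Illustrative subset of the documented patterns, in priority order.
# The list name and layout are assumptions, not the real module structure.
PATTERNS = [
    (re.compile(r"^\s*\[AI-DECISION\].*$", re.MULTILINE), "[Analysis point]"),
    (re.compile(r"knowledge-[\w-]+\.md|helpers/knowledge-[\w-]+\.md"), "[internal-ref]"),
    (re.compile(r"thinking_budget[=:]\s*\d+"), "[config]"),
]


def apply_patterns(text: str) -> str:
    """Run every pattern over the text in list order, replacing each match."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


raw = (
    "[AI-DECISION] Switch to time-based blind injection\n"
    "Reading knowledge-sqli.md with thinking_budget=8000"
)
print(apply_patterns(raw))
```

Because each substitution runs on the result of the previous one, a line that contains several kinds of internal reference ends up with several placeholders.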

Pattern 1: AI Decision Markers

Pattern:  ^\s*\[AI-DECISION\].*$
Replaces: [Analysis point]

Strips internal reasoning markers that reveal how the AI evaluates findings, selects attack paths, and eliminates false positives. These markers are placed at 2-3 critical decision points per skill.

Example:

Before                                                                   After
[AI-DECISION] Confidence 72% — response differs by 3 bytes, likely SQLi  [Analysis point]
# [AI-DECISION] Switch to time-based blind injection                     [Analysis point]

Note: the pattern as written permits only whitespace before the marker, so the second row is redacted only if the deployed pattern also tolerates a leading comment character such as #.

Pattern 2: Knowledge Pack References

Pattern:  knowledge-[\w-]+\.md|helpers/knowledge-[\w-]+\.md
Replaces: [internal-ref]

Knowledge packs contain curated attack techniques, bypass payloads, and vulnerability patterns. Their filenames alone reveal the categorization system.

Example:

Before                                                After
Reading knowledge-sqli.md for union-based techniques  Reading [internal-ref] for union-based techniques
Loaded helpers/knowledge-xss-bypass.md                Loaded [internal-ref]
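
The two rows above can be reproduced with Python's re module. This is a sketch for checking the pattern in isolation; the constant name KNOWLEDGE_REF is hypothetical.

```python
import re

# Pattern 2 exactly as documented above.
KNOWLEDGE_REF = re.compile(r"knowledge-[\w-]+\.md|helpers/knowledge-[\w-]+\.md")

lines = [
    "Reading knowledge-sqli.md for union-based techniques",
    "Loaded helpers/knowledge-xss-bypass.md",
]
redacted = [KNOWLEDGE_REF.sub("[internal-ref]", line) for line in lines]
for line in redacted:
    print(line)
```

The helpers/ prefix is consumed even though the shorter alternative is listed first: the regex engine reports the leftmost match, and at the position where helpers/ begins only the second alternative can succeed.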

Pattern 3: SKILL.md File Paths

Pattern:  \.claude/skills/[\w-]+/(?:SKILL\.md|helpers/[\w.-]+)
Replaces: [internal-ref]

Skill file paths reveal the internal organization, naming conventions, and the existence of specific testing modules.
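
A short sketch of Pattern 3 applied to two sample lines. The log lines and the helper filename payloads.txt are hypothetical; test-injection is the skill name used as an example elsewhere on this page.

```python
import re

# Pattern 3 exactly as documented above.
SKILL_PATH = re.compile(r"\.claude/skills/[\w-]+/(?:SKILL\.md|helpers/[\w.-]+)")

# Hypothetical log lines, written only to exercise both branches of the pattern.
lines = [
    "Read ~/.claude/skills/test-injection/SKILL.md",
    "Sourcing .claude/skills/test-injection/helpers/payloads.txt",
]
redacted = [SKILL_PATH.sub("[internal-ref]", line) for line in lines]
for line in redacted:
    print(line)
```

The match starts at .claude, so any directory prefix in front of it (here ~/) stays in the output while the skill name and file are replaced.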


Pattern 4: Internal Boilerplate References

Pattern:  skill-boilerplate\.md|agent-dispatch\.md|stealth-config\.md
          |hacker-heuristics\.md|kill-signals\.md|research-escalation\.md
          |safety-validator\.md|codex-dispatch\.md|codex-schemas/
Replaces: [internal-ref]

These shared infrastructure files contain the core operational logic: how agents are dispatched, how stealth is maintained, how research is escalated mid-test.


Pattern 5: Nuclei Template Paths

Pattern:  (?:/opt/nuclei-templates|~/nuclei-templates|nuclei-templates)/[\w/-]+\.yaml
Replaces: [template]

Template paths, whether absolute or relative, reveal the directory structure and which specific vulnerability checks are being run.
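
A sketch of Pattern 5 on one absolute and one relative path. Both log lines are hypothetical and exist only to exercise the pattern.

```python
import re

# Pattern 5 exactly as documented above.
NUCLEI_TEMPLATE = re.compile(
    r"(?:/opt/nuclei-templates|~/nuclei-templates|nuclei-templates)/[\w/-]+\.yaml"
)

# Hypothetical log lines.
lines = [
    "Running /opt/nuclei-templates/http/cves/2021/CVE-2021-44228.yaml",
    "Queued nuclei-templates/dns/example-check.yaml",
]
redacted = [NUCLEI_TEMPLATE.sub("[template]", line) for line in lines]
for line in redacted:
    print(line)
```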


Pattern 6: Model Routing Configuration

Pattern:  Opus\s+(?:high|medium|low)\s+\d+K\s+thinking
          |Haiku\s+\d+K\s+thinking
          |thinking_budget[=:]\s*\d+
Replaces: [config]

Model routing details (which model handles which skill, thinking token budgets) are proprietary architectural decisions.
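
The three alternatives of Pattern 6 can be exercised one at a time. In this sketch the sample strings are hypothetical and the constant name MODEL_ROUTING is an assumption.

```python
import re

# The three alternatives of Pattern 6, joined in the documented order.
MODEL_ROUTING = re.compile(
    r"Opus\s+(?:high|medium|low)\s+\d+K\s+thinking"
    r"|Haiku\s+\d+K\s+thinking"
    r"|thinking_budget[=:]\s*\d+"
)

samples = [
    "Dispatching with Opus high 32K thinking",  # hypothetical log line
    "Fallback: Haiku 4K thinking",              # hypothetical log line
    "thinking_budget: 16000",                   # hypothetical config echo
]
redacted = [MODEL_ROUTING.sub("[config]", sample) for sample in samples]
for line in redacted:
    print(line)
```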


Pattern 7: Research Escalation Internals

Pattern:  \[RRE\]\s+(?:budget|escalation|trigger|gate|remaining)
Replaces: [research]

Runtime Research Escalation markers reveal the dynamic web search strategy, trigger thresholds, and budget management.


Pattern 8: Codex Integration Markers

Pattern:  \[CODEX\]\s+(?:dispatch|budget|P[0-7]|tiebreaker|wave.review|coverage.audit)
Replaces: [analysis]

Codex integration points reveal the dual-engine architecture and cross-model verification strategy.
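
A sketch of Patterns 7 and 8 side by side, using hypothetical log lines. As the patterns are written, only the marker and its keyword are replaced; the rest of the line is left in place. The unescaped dot in wave.review and coverage.audit matches any single character, so hyphenated and underscored spellings are both covered.

```python
import re

# Patterns 7 and 8 exactly as documented above.
RRE = re.compile(r"\[RRE\]\s+(?:budget|escalation|trigger|gate|remaining)")
CODEX = re.compile(
    r"\[CODEX\]\s+(?:dispatch|budget|P[0-7]|tiebreaker|wave.review|coverage.audit)"
)


def redact_markers(line: str) -> str:
    """Apply Pattern 7, then Pattern 8, to a single line."""
    return CODEX.sub("[analysis]", RRE.sub("[research]", line))


# Hypothetical log lines.
samples = [
    "[RRE] trigger fired",
    "[CODEX] wave-review started",
    "[CODEX] wave_review started",
    "[CODEX] P3 queued",
]
redacted = [redact_markers(sample) for sample in samples]
for line in redacted:
    print(line)
```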


Pattern 9: Stealth Internals

Pattern:  JITTER_MULT=\d+|stealth_curl\(|count_request\(|check_timeout\(
Replaces: [config]

Internal function calls and stealth parameters reveal the rate-limiting and evasion implementation.


Pattern 10: Kill Signal Logs

Pattern:  \[KILL-SIGNAL\]\s+reason=\S+\s+endpoint=\S+\s+action=\S+
Replaces: [signal]

Kill signals reveal the time-waste prevention logic: what triggers a test to abort and how the system reallocates resources.
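
A sketch of Pattern 10 using hypothetical field values. The pattern expects reason, endpoint, and action in exactly that order, so a line with the fields in a different order is not matched by the pattern as written.

```python
import re

# Pattern 10 exactly as documented above.
KILL_SIGNAL = re.compile(r"\[KILL-SIGNAL\]\s+reason=\S+\s+endpoint=\S+\s+action=\S+")

# Hypothetical log lines; the field values are invented for illustration.
ordered = "[KILL-SIGNAL] reason=timeout endpoint=/api/login action=skip"
reordered = "[KILL-SIGNAL] endpoint=/api/login reason=timeout action=skip"

print(KILL_SIGNAL.sub("[signal]", ordered))
print(KILL_SIGNAL.sub("[signal]", reordered))
```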


Pattern 11: Wave Coordinator Internals

Pattern:  Wave\s+\d+/\d+\s*[:.].*?agents?\s*[:.]|agent-\w+\.json
Replaces: [phase]

Wave scheduling details reveal the parallel execution strategy and agent naming conventions.
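
A sketch of Pattern 11 covering both alternatives. The log lines and the agent name recon are hypothetical.

```python
import re

# Pattern 11 exactly as documented above.
WAVE = re.compile(r"Wave\s+\d+/\d+\s*[:.].*?agents?\s*[:.]|agent-\w+\.json")

# Hypothetical log lines.
samples = [
    "Wave 2/5: launching 4 agents: recon, sqli",
    "Wrote agent-recon.json",
]
redacted = [WAVE.sub("[phase]", sample) for sample in samples]
for line in redacted:
    print(line)
```

The lazy .*? stops at the first occurrence of agent or agents followed by a colon or period, so text after that delimiter remains in the output.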


Implementation

Source File

dashboard/backend/app/services/output_sanitizer.py

Functions

Function                       Input                                 Output                                 Used By
sanitize_output(text)          Raw output string                     Sanitized string                       Terminal PTY, runner WS
sanitize_runner_line(line)     Output line dict ({content, parsed})  Sanitized dict                         Runner HTTP output
sanitize_finding_markdown(md)  Finding raw_markdown                  Sanitized markdown                     Finding detail endpoint
sanitize_timeline_skill(name)  Skill name (e.g. test-injection)      Generic label (e.g. Security Testing)  Timeline endpoint (viewer role)
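
A hedged sketch of how two of these functions might be shaped. The function names and the {content, parsed} keys come from the table; everything else is an assumption: the recursive walk over parsed, the single stand-in pattern, the _GENERIC_LABELS mapping, and the fallback label are all hypothetical.

```python
import re
from typing import Any

# Stand-in for the full pattern list; only Pattern 2 is included here.
_KNOWLEDGE_REF = re.compile(r"knowledge-[\w-]+\.md|helpers/knowledge-[\w-]+\.md")


def sanitize_output(text: str) -> str:
    return _KNOWLEDGE_REF.sub("[internal-ref]", text)


def _sanitize_value(value: Any) -> Any:
    """Recursively sanitize every string inside a JSON-like value."""
    if isinstance(value, str):
        return sanitize_output(value)
    if isinstance(value, dict):
        return {key: _sanitize_value(item) for key, item in value.items()}
    if isinstance(value, list):
        return [_sanitize_value(item) for item in value]
    return value


def sanitize_runner_line(line: dict) -> dict:
    # Builds a new dict, so the raw line is not mutated in place.
    return _sanitize_value(line)


# Hypothetical mapping; only the test-injection example comes from the table.
_GENERIC_LABELS = {"test-injection": "Security Testing"}


def sanitize_timeline_skill(name: str) -> str:
    # Assumed fallback: unknown skills get the same generic label.
    return _GENERIC_LABELS.get(name, "Security Testing")


line = {
    "content": "Loaded helpers/knowledge-xss-bypass.md",
    "parsed": {"text": "Reading knowledge-sqli.md"},
}
print(sanitize_runner_line(line))
print(sanitize_timeline_skill("test-injection"))
```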

Data Flow

graph LR
    Runner["Runner Service<br/>(raw output)"] --> Sanitizer["output_sanitizer.py<br/>11 regex patterns"]
    Terminal["Terminal PTY<br/>(raw stream)"] --> Sanitizer
    Sanitizer --> HTTP["HTTP Response<br/>(sanitized)"]
    Sanitizer --> WS["WebSocket<br/>(sanitized)"]
    HTTP --> Browser["Browser"]
    WS --> Browser

    style Sanitizer fill:#1b5e20,color:#fff
    style Runner fill:#b71c1c,color:#fff
    style Terminal fill:#b71c1c,color:#fff

Bypass Resistance

The sanitizer is designed to be resilient against common bypass techniques:

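  • Case variation -- Patterns are compiled with re.IGNORECASE wherever the casing of a marker can vary
  • Multiline content -- re.MULTILINE makes ^ and $ match at every line boundary, so line-anchored patterns such as Pattern 1 still fire inside a multi-line chunk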
  • Partial matches -- Patterns match substrings, not just full lines
  • Nested references -- Multiple patterns can match the same line (applied sequentially)
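
The four properties above can be seen in a small sketch. The flag choices for each pattern are assumptions made for illustration, and the input chunk is hypothetical.

```python
import re

# Assumed flag choices; the real per-pattern flags are not documented here.
AI_DECISION = re.compile(r"^\s*\[AI-DECISION\].*$", re.MULTILINE | re.IGNORECASE)
STEALTH = re.compile(
    r"JITTER_MULT=\d+|stealth_curl\(|count_request\(|check_timeout\(",
    re.IGNORECASE,
)

# Hypothetical chunk: lowercase marker on an inner line, two stealth hits on one line.
chunk = "start\n  [ai-decision] pick path B\ncalling stealth_curl( with jitter_mult=3"

step1 = AI_DECISION.sub("[Analysis point]", chunk)  # multiline + case variation
step2 = STEALTH.sub("[config]", step1)              # substring matches, applied next
print(step2)
```

The marker on the second line is caught although it is lowercase and not at the start of the chunk, and the last line receives two placeholders because sub replaces every non-overlapping match.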

Known Limitations

  • Semantic inference -- If the output describes a technique without using pattern-matched keywords, it passes through. This is by design: generic technique descriptions are not IP.
  • Timing analysis -- An observer can infer phase duration and request count from WebSocket message frequency. This is mitigated by WebSocket authentication (only authorized users see the stream).
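  • Error messages -- Stack traces from internal tools may reference file paths. Paths that match Patterns 3 and 4 (skill paths and shared infrastructure filenames) are redacted; a path that matches none of the patterns passes through.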
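
A sketch of the first and third limitations, applying only Patterns 3 and 4 to a hypothetical trace. All file paths and messages here are invented for illustration, and redact_paths is a hypothetical helper.

```python
import re

# Pattern 3 as documented above.
SKILL_PATH = re.compile(r"\.claude/skills/[\w-]+/(?:SKILL\.md|helpers/[\w.-]+)")
# Pattern 4 as documented above, joined in the listed order.
BOILERPLATE = re.compile(
    r"skill-boilerplate\.md|agent-dispatch\.md|stealth-config\.md"
    r"|hacker-heuristics\.md|kill-signals\.md|research-escalation\.md"
    r"|safety-validator\.md|codex-dispatch\.md|codex-schemas/"
)


def redact_paths(text: str) -> str:
    """Apply Pattern 3, then Pattern 4."""
    return BOILERPLATE.sub("[internal-ref]", SKILL_PATH.sub("[internal-ref]", text))


# Hypothetical trace lines.
trace = [
    'File "/srv/app/.claude/skills/test-injection/helpers/run.py", line 12',
    "FileNotFoundError: stealth-config.md",
    'File "/srv/app/tools/scanner.py", line 40',
    "Trying a time-based blind technique",
]
redacted = [redact_paths(line) for line in trace]
for line in redacted:
    print(line)
```

The first two lines are redacted; the third path and the generic technique description match neither pattern and are returned unchanged.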