Watermarking (Levels B + E)

A multi-layer steganographic watermarking system that embeds invisible fingerprints in knowledge packs. Each layer uses a different encoding technique, so an attacker must defeat every layer to remove the fingerprint.

Not applied to client reports

B1 zero-width watermarks are not injected into pentest reports. Reports flow through the BeDefended report engine to generate the final DOCX deliverable for clients, so any invisible characters would propagate into client-facing documents. B1 is available only as a library (watermark_service.py) for internal use, e.g., watermarking internal documents or proposals.

Level B1: Zero-Width Characters (library only)

Invisible Unicode characters encode company_id:engagement_id:timestamp as a 96-bit sequence. Available in watermark_service.py for internal documents; not used in the pentest report pipeline.

Characters Used

| Character | Unicode | Represents |
|---|---|---|
| Zero Width Space | U+200B | Bit 0 |
| Zero Width Non-Joiner | U+200C | Bit 1 |
| Zero Width Joiner | U+200D | Field separator |
| Zero Width No-Break Space | U+FEFF | Start/end marker |

Encoding

  1. Pack: company_id (32-bit) + engagement_id (32-bit) + timestamp (32-bit) = 96 bits
  2. Convert each bit: 0 -> U+200B, 1 -> U+200C
  3. Wrap with U+FEFF markers, separate fields with U+200D
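
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code in watermark_service.py: the function names `encode_b1` and `decode_b1` are hypothetical, and the most-significant-bit-first order within each field is an assumption the source does not specify.

```python
ZERO, ONE = "\u200b", "\u200c"   # bit 0, bit 1
SEP, MARK = "\u200d", "\ufeff"   # field separator, start/end marker


def encode_b1(company_id: int, engagement_id: int, timestamp: int) -> str:
    """Return the invisible sequence for three 32-bit fields (hypothetical helper)."""
    fields = []
    for value in (company_id, engagement_id, timestamp):
        if not 0 <= value < 2**32:
            raise ValueError("each field must fit in 32 bits")
        bits = format(value, "032b")  # assumption: most significant bit first
        fields.append("".join(ONE if b == "1" else ZERO for b in bits))
    return MARK + SEP.join(fields) + MARK


def decode_b1(text: str) -> tuple:
    """Recover (company_id, engagement_id, timestamp) from text containing a mark."""
    start = text.index(MARK)            # raises ValueError if no marker is present
    end = text.index(MARK, start + 1)
    fields = text[start + 1:end].split(SEP)
    if len(fields) != 3 or any(len(f) != 32 or set(f) - {ZERO, ONE} for f in fields):
        raise ValueError("malformed watermark")
    return tuple(int("".join("1" if ch == ONE else "0" for ch in f), 2) for f in fields)
```

Under this layout the full mark is 100 characters long: 96 bit characters, two separators, and two markers. Calling `decode_b1("Hello" + encode_b1(42, 7, 1700000000))` returns the original three values.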

Level B2: Homoglyph Substitution

Visually identical Unicode characters replace ASCII characters; whether a given position is substituted is decided by HMAC(installation_id, filename:line).

Homoglyph Mapping

| ASCII | Unicode Replacement | Script |
|---|---|---|
| a (U+0061) | а (U+0430) | Cyrillic |
| c (U+0063) | с (U+0441) | Cyrillic |
| e (U+0065) | е (U+0435) | Cyrillic |
| o (U+006F) | о (U+043E) | Cyrillic |
| p (U+0070) | р (U+0440) | Cyrillic |
| x (U+0078) | х (U+0445) | Cyrillic |
| y (U+0079) | у (U+0443) | Cyrillic |

Capacity

  • ~500 target words per file × 32 files ≈ 16,000 bits of fingerprint (one bit per target word)
  • Deterministic: the same installation always produces the same pattern
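
A rough sketch of the keyed substitution follows. It is illustrative only and does not reproduce scripts/watermark-knowledge.py: the names `line_bit` and `watermark_line` are hypothetical, SHA-256 is assumed as the HMAC hash, and the sketch carries one bit per line (swapping only the first eligible character), whereas the real tool appears to work per target word.

```python
import hashlib
import hmac

# ASCII -> Cyrillic look-alikes from the mapping table
HOMOGLYPHS = {
    "a": "\u0430", "c": "\u0441", "e": "\u0435", "o": "\u043e",
    "p": "\u0440", "x": "\u0445", "y": "\u0443",
}


def line_bit(installation_id: str, filename: str, line_no: int) -> int:
    """One keyed bit per (filename, line); the hash choice is an assumption."""
    msg = f"{filename}:{line_no}".encode()
    digest = hmac.new(installation_id.encode(), msg, hashlib.sha256).digest()
    return digest[0] & 1


def watermark_line(line: str, installation_id: str, filename: str, line_no: int) -> str:
    """Swap the first eligible character when the keyed bit is 1."""
    if line_bit(installation_id, filename, line_no) == 0:
        return line
    for i, ch in enumerate(line):
        if ch in HOMOGLYPHS:
            return line[:i] + HOMOGLYPHS[ch] + line[i + 1:]
    return line  # no eligible character on this line
```

Because the bit depends only on the installation ID and the position, re-running the sketch for a known installation reproduces the same pattern, which is what makes later verification possible. A real implementation would also need to skip code blocks, URLs, and identifiers, where a swapped character would break functionality.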

Level B3: Trailing Whitespace in Findings

Each of the first 20 lines of every FINDING-*.md file carries 0, 1, or 2 trailing spaces, encoding one base-3 digit per line.

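  • 3^20 ≈ 3.5 billion unique combinations
  • Invisible in most editors and renderers by default; note that Markdown treats two trailing spaces as a hard line break, and tools that trim trailing whitespace erase the pattern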
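
The trailing-space scheme amounts to writing a number in base 3, one digit per line. The sketch below illustrates that idea under stated assumptions: the helper names are hypothetical, the least significant digit is placed on the first line, and how the value is derived from an installation is not covered because the source does not say.

```python
def encode_trailing(lines: list, value: int, n: int = 20) -> list:
    """Write `value` in base 3 as 0-2 trailing spaces on the first n lines."""
    if not 0 <= value < 3**n:
        raise ValueError("value does not fit in n base-3 digits")
    if len(lines) < n:
        raise ValueError("not enough lines to carry the mark")
    out = list(lines)
    for i in range(n):
        value, digit = divmod(value, 3)   # assumption: least significant digit first
        out[i] = out[i].rstrip(" ") + " " * digit
    return out


def decode_trailing(lines: list, n: int = 20) -> int:
    """Read the base-3 value back from the trailing-space counts."""
    value = 0
    for i in reversed(range(n)):
        digit = len(lines[i]) - len(lines[i].rstrip(" "))
        if digit > 2:
            raise ValueError(f"line {i} has more than 2 trailing spaces")
        value = value * 3 + digit
    return value
```

The lines are expected without their newline terminators, for example as returned by `str.splitlines()`.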

Level E1: Payload Ordering

Numbered lists of 5+ items are permuted deterministically per installation.

  • Seed: HMAC-SHA256(installation_id, filename:list:index)
  • A 10-item list has 10! = 3,628,800 possible orderings (a minimal 5-item list has 5! = 120)
  • With 5+ lists per file, the per-list counts multiply, giving an enormous combinatorial space
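
One way to realize a seeded permutation is to feed the HMAC digest into a pseudo-random generator and shuffle. The sketch below assumes the seed message is the literal string `filename:list:index` and uses Python's `random.Random` for the shuffle; the function name `permute_list` is hypothetical, and the actual script may derive the permutation differently.

```python
import hashlib
import hmac
import random


def permute_list(items: list, installation_id: str, filename: str, list_index: int) -> list:
    """Return a deterministic, installation-specific ordering of `items`."""
    if len(items) < 5:
        return list(items)  # only lists of 5+ items are permuted
    msg = f"{filename}:list:{list_index}".encode()   # assumed message format
    seed = hmac.new(installation_id.encode(), msg, hashlib.sha256).digest()
    rng = random.Random(int.from_bytes(seed, "big"))
    out = list(items)
    rng.shuffle(out)
    return out
```

One caveat with this design: Python only guarantees reproducibility of `random.random()` across versions, not of `shuffle()`, so a production tool would likely implement its own Fisher-Yates shuffle over the digest to keep decoding stable.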

Level E2: Synonym Substitution

50 technical synonym pairs are chosen deterministically per installation.

  • For each pair, bit 0 of HMAC(installation_id, pair_index) selects the variant
  • 2^50 > 10^15 unique combinations
  • Example: "endpoint" vs "URL path", "vulnerability" vs "security flaw"
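
The per-pair selection can be sketched as follows. The two pairs come from the example above; everything else is an assumption for illustration: the name `choose_variant` is hypothetical, SHA-256 is assumed as the hash, the pair index is encoded as a decimal string, and "bit 0" is read as the lowest bit of the first digest byte.

```python
import hashlib
import hmac

# Two of the 50 pairs, taken from the example above
SYNONYM_PAIRS = [
    ("endpoint", "URL path"),
    ("vulnerability", "security flaw"),
]


def choose_variant(installation_id: str, pair_index: int) -> str:
    """Pick one word of the pair from a keyed bit (bit position is an assumption)."""
    msg = str(pair_index).encode()
    digest = hmac.new(installation_id.encode(), msg, hashlib.sha256).digest()
    bit = digest[0] & 1
    return SYNONYM_PAIRS[pair_index][bit]


def fingerprint(installation_id: str) -> list:
    """The sequence of chosen variants that identifies an installation."""
    return [choose_variant(installation_id, i) for i in range(len(SYNONYM_PAIRS))]
```

To attribute a leak, the variants observed in the leaked text are compared against `fingerprint(...)` for each known installation ID.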

Forensic Decoding

# Decode zero-width watermark (B1) from a leaked document
python scripts/decode-watermark.py --file leaked-report.md

# Verify homoglyph pattern against known installation
python scripts/decode-watermark.py --file leaked-knowledge.md --known-id a1b2c3d4

# Decode from pasted text
echo "suspicious text" | python scripts/decode-watermark.py

Files

| File | Purpose |
|---|---|
| dashboard/backend/app/services/watermark_service.py | Zero-width encode/decode (B1) |
| scripts/watermark-knowledge.py | Homoglyphs (B2) + ordering (E1) + synonyms (E2) |
| scripts/decode-watermark.py | Forensic extraction tool |