Watermarking (Levels B + E)

A multi-layer steganographic watermarking system that embeds invisible fingerprints in knowledge packs. Each layer uses a different encoding technique, so removing the fingerprint requires finding and stripping every layer.

Not applied to client reports

B1 zero-width watermarks are not injected into pentest reports. Reports flow through the BeDefended report engine to generate the final DOCX deliverable for clients, so any invisible characters would propagate into client-facing documents. B1 is available only as a library (watermark_service.py) for internal use, e.g., watermarking internal documents or proposals.

Level B1: Zero-Width Characters (library only)

Invisible Unicode characters encode company_id:engagement_id:timestamp as a 96-bit sequence. The encoder is available in watermark_service.py for internal documents and is not used in the pentest report pipeline.

Characters Used

Character	Unicode	Represents
Zero Width Space	U+200B	Bit 0
Zero Width Non-Joiner	U+200C	Bit 1
Zero Width Joiner	U+200D	Field separator
Zero Width No-Break Space	U+FEFF	Start/end marker

Encoding

Pack: company_id (32-bit) + engagement_id (32-bit) + timestamp (32-bit) = 96 bits
Convert each bit: 0 -> U+200B, 1 -> U+200C
Wrap with U+FEFF markers, separate fields with U+200D
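The three steps above can be sketched as follows. This is a minimal illustration, not the code in watermark_service.py: the function names `encode_b1` and `decode_b1`, the most-significant-bit-first ordering, and the sample values are assumptions.

```python
ZERO = "\u200b"   # Zero Width Space: bit 0
ONE = "\u200c"    # Zero Width Non-Joiner: bit 1
SEP = "\u200d"    # Zero Width Joiner: field separator
MARK = "\ufeff"   # Zero Width No-Break Space: start/end marker


def encode_b1(company_id: int, engagement_id: int, timestamp: int) -> str:
    """Return the invisible payload: three 32-bit fields wrapped in markers."""
    fields = []
    for value in (company_id, engagement_id, timestamp):
        if not 0 <= value < 2**32:
            raise ValueError("each field must fit in 32 bits")
        bits = format(value, "032b")  # fixed width, most significant bit first (assumed)
        fields.append("".join(ONE if bit == "1" else ZERO for bit in bits))
    return MARK + SEP.join(fields) + MARK


def decode_b1(text: str) -> tuple:
    """Find the first marker-delimited payload in text and unpack the fields."""
    start = text.find(MARK)
    end = text.find(MARK, start + 1)
    if start == -1 or end == -1:
        raise ValueError("no watermark markers found")
    fields = text[start + 1:end].split(SEP)
    if len(fields) != 3 or any(len(f) != 32 or set(f) - {ZERO, ONE} for f in fields):
        raise ValueError("malformed watermark payload")
    return tuple(int("".join("1" if ch == ONE else "0" for ch in f), 2) for f in fields)


# Hypothetical values, for illustration only.
mark = encode_b1(7, 42, 1700000000)
print(len(mark), decode_b1("Internal " + mark + "proposal"))
```

Under these assumptions the full sequence is 100 characters long: 96 bit characters, 2 field separators, and 2 markers.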

Level B2: Homoglyph Substitution

Selected ASCII characters are replaced with visually identical Unicode characters. Which characters are replaced is determined by HMAC(installation_id, filename:line).

Homoglyph Mapping

ASCII	Unicode Replacement	Script
a (U+0061)	а (U+0430)	Cyrillic
c (U+0063)	с (U+0441)	Cyrillic
e (U+0065)	е (U+0435)	Cyrillic
o (U+006F)	о (U+043E)	Cyrillic
p (U+0070)	р (U+0440)	Cyrillic
x (U+0078)	х (U+0445)	Cyrillic
y (U+0079)	у (U+0443)	Cyrillic
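A rough sketch of how the mapping above could be driven by a per-line HMAC. The source only states that the choice depends on HMAC(installation_id, filename:line); the use of SHA-256, the one-bit-per-candidate-character rule, and the names `line_bits`, `watermark_line`, and `strip_homoglyphs` are assumptions made for illustration.

```python
import hashlib
import hmac

# ASCII letter -> visually identical Cyrillic letter (from the mapping table).
HOMOGLYPHS = {
    "a": "\u0430", "c": "\u0441", "e": "\u0435", "o": "\u043e",
    "p": "\u0440", "x": "\u0445", "y": "\u0443",
}


def line_bits(installation_id: str, filename: str, line_no: int) -> list:
    """Derive 256 pseudo-random bits for one line (SHA-256 is an assumption)."""
    message = f"{filename}:{line_no}".encode()
    digest = hmac.new(installation_id.encode(), message, hashlib.sha256).digest()
    return [(byte >> shift) & 1 for byte in digest for shift in range(8)]


def watermark_line(line: str, installation_id: str, filename: str, line_no: int) -> str:
    """Swap each candidate character for its homoglyph when its bit is 1."""
    bits = line_bits(installation_id, filename, line_no)
    out = []
    used = 0
    for ch in line:
        if ch in HOMOGLYPHS:
            # Reuse the bit stream if a line has more than 256 candidates.
            out.append(HOMOGLYPHS[ch] if bits[used % len(bits)] else ch)
            used += 1
        else:
            out.append(ch)
    return "".join(out)


def strip_homoglyphs(line: str) -> str:
    """Map the Cyrillic look-alikes back to ASCII (would also alter real Cyrillic text)."""
    reverse = {glyph: ascii_ch for ascii_ch, glyph in HOMOGLYPHS.items()}
    return "".join(reverse.get(ch, ch) for ch in line)


# Hypothetical file name and line number.
marked = watermark_line("accept any proxy request payload", "a1b2c3d4", "recon.md", 12)
print(marked == "accept any proxy request payload", strip_homoglyphs(marked))
```

Because the bits depend only on the installation ID, file name, and line number, re-running the function reproduces the same pattern, which is what lets a verifier compare a leaked file against a known installation.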

Capacity

About 500 target words per file × 32 files ≈ 16,000 bits of fingerprint (one bit per target word)
Deterministic: the same installation always produces the same pattern

Level B3: Trailing Whitespace in Findings

Each of the first 20 lines of every FINDING-*.md file ends with 0, 1, or 2 trailing spaces, forming a 20-digit base-3 pattern.

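3^20 = 3,486,784,401 (~3.5 billion) unique combinations
Not visible in most editors by default. Editors or linters configured to show or strip trailing whitespace will reveal or remove it, and in Markdown two trailing spaces inside a paragraph render as a hard line break.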
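A minimal sketch of the trailing-space scheme. The source does not say how the 20 digits are derived, so keying an HMAC-SHA256 with the installation ID over the file name is an assumption, as are the function names `trits_for`, `embed`, and `extract`.

```python
import hashlib
import hmac


def trits_for(installation_id: str, filename: str, count: int = 20) -> list:
    """Derive `count` base-3 digits (0, 1, or 2) for one findings file."""
    digest = hmac.new(installation_id.encode(), filename.encode(), hashlib.sha256).digest()
    number = int.from_bytes(digest, "big") % 3**count
    trits = []
    for _ in range(count):
        number, trit = divmod(number, 3)
        trits.append(trit)
    return trits


def embed(text: str, trits: list) -> str:
    """Set the number of trailing spaces on the first len(trits) lines."""
    lines = text.split("\n")
    for index, trit in enumerate(trits):
        if index >= len(lines):
            break
        lines[index] = lines[index].rstrip(" ") + " " * trit
    return "\n".join(lines)


def extract(text: str, count: int = 20) -> list:
    """Count trailing spaces on the first `count` lines."""
    return [len(line) - len(line.rstrip(" ")) for line in text.split("\n")[:count]]


# Hypothetical findings file with 25 lines.
sample = "\n".join(f"line {i}" for i in range(25))
pattern = trits_for("a1b2c3d4", "FINDING-001.md")
print(extract(embed(sample, pattern)) == pattern)
```

Only the first 20 lines are touched; any existing trailing spaces on those lines are normalized before the pattern is written, so extraction reads back exactly the embedded digits.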

Level E1: Payload Ordering

Numbered lists of 5+ items are permuted deterministically per installation.

Seed: HMAC-SHA256(installation_id, filename:list:index)
A 10-item list has 10! = 3,628,800 possible orderings (a 5-item list has 5! = 120)
With 5 or more such lists per file, the per-list counts multiply into an enormous combinatorial space
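One way to turn the seed into a permutation is a Fisher-Yates shuffle whose swap indices are peeled off the HMAC digest. This is a sketch under assumptions: the exact message layout (`filename:list:index` with a literal `list`), the shuffle method, and the name `permute_list` are illustrative rather than taken from watermark-knowledge.py.

```python
import hashlib
import hmac


def permute_list(items: list, installation_id: str, filename: str, list_index: int) -> list:
    """Return a deterministic, installation-specific ordering of a list."""
    if len(items) < 5:
        return list(items)  # only lists of 5+ items are permuted
    message = f"{filename}:list:{list_index}".encode()
    digest = hmac.new(installation_id.encode(), message, hashlib.sha256).digest()
    seed = int.from_bytes(digest, "big")
    result = list(items)
    # Fisher-Yates: walk from the end, taking each swap index from the seed.
    for i in range(len(result) - 1, 0, -1):
        seed, j = divmod(seed, i + 1)
        result[i], result[j] = result[j], result[i]
    return result


# Hypothetical payload list and file name.
payloads = [f"payload-{n}" for n in range(1, 11)]
print(permute_list(payloads, "a1b2c3d4", "xss.md", 0))
```

Deriving the swap indices directly from the digest avoids depending on a particular random number generator, so the same inputs give the same order on any machine. A 256-bit digest comfortably covers the 10! orderings of a 10-item list; very long lists would need more seed material.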

Level E2: Synonym Substitution

For each of 50 technical synonym pairs, one variant is chosen deterministically per installation.

For each pair, bit 0 of HMAC(installation_id, pair_index) selects the variant
2^50 ≈ 1.1 × 10^15 unique combinations
Example: "endpoint" vs "URL path", "vulnerability" vs "security flaw"
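A minimal sketch of the per-pair selection, using only the two example pairs given above. Treating "bit 0" as the low bit of the first digest byte, using SHA-256, and the names `variant_bit` and `apply_synonyms` are assumptions.

```python
import hashlib
import hmac
import re

# Only the two pairs named in the text; the real list has 50.
SYNONYM_PAIRS = [
    ("endpoint", "URL path"),
    ("vulnerability", "security flaw"),
]


def variant_bit(installation_id: str, pair_index: int) -> int:
    """Return 0 or 1: which variant of the pair this installation uses."""
    digest = hmac.new(installation_id.encode(), str(pair_index).encode(), hashlib.sha256).digest()
    return digest[0] & 1  # "bit 0" read as the low bit of the first byte (assumed)


def apply_synonyms(text: str, installation_id: str) -> str:
    """Rewrite text so every pair appears only in its selected variant."""
    for index, pair in enumerate(SYNONYM_PAIRS):
        bit = variant_bit(installation_id, index)
        chosen, other = pair[bit], pair[1 - bit]
        pattern = rf"\b{re.escape(other)}\b"
        text = re.sub(pattern, lambda _match, value=chosen: value, text)
    return text


sample = "The endpoint has a vulnerability; the URL path exposes a security flaw."
print(apply_synonyms(sample, "a1b2c3d4"))
```

The sketch matches whole words case-sensitively and ignores plurals and capitalization at sentence starts, which a production substitution pass would need to handle.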

Forensic Decoding

# Decode a zero-width watermark (B1) from a leaked file
python scripts/decode-watermark.py --file leaked-report.md

# Verify homoglyph pattern against known installation
python scripts/decode-watermark.py --file leaked-knowledge.md --known-id a1b2c3d4

# Decode from pasted text
echo "suspicious text" | python scripts/decode-watermark.py
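Before running a full decode, it can help to check which layers a suspicious text might carry. The following is a hypothetical triage helper, not a description of what decode-watermark.py does; the name `scan_layers` and the returned keys are assumptions.

```python
# Characters used by the B1 and B2 layers, as listed in this document.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}
CYRILLIC_HOMOGLYPHS = set("\u0430\u0441\u0435\u043e\u0440\u0445\u0443")


def scan_layers(text: str) -> dict:
    """Count indicators of the B1, B2, and B3 layers in a piece of text."""
    lines = text.split("\n")
    return {
        "zero_width": sum(ch in ZERO_WIDTH for ch in text),              # B1
        "homoglyphs": sum(ch in CYRILLIC_HOMOGLYPHS for ch in text),     # B2
        "trailing_space_lines": sum(line.endswith(" ") for line in lines),  # B3
    }


# "p\u0430ck" contains a Cyrillic "a"; the first line ends with a space.
print(scan_layers("p\u0430ck \nplain\u200b"))
```

Non-zero counts are only a hint: legitimate Cyrillic text or ordinary trailing whitespace will also register, so a positive result still needs verification against a known installation ID.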

Files

File	Purpose
`dashboard/backend/app/services/watermark_service.py`	Zero-width encode/decode (B1)
`scripts/watermark-knowledge.py`	Homoglyphs (B2) + ordering (E1) + synonyms (E2)
`scripts/decode-watermark.py`	Forensic extraction tool