Lab Catalog¶
10 registered lab targets covering different tech stacks, vulnerability profiles, and scoring modes.
Summary¶
| Lab | Tech Stack | Vulns | Auth | Scoring | Difficulty |
|---|---|---|---|---|---|
| VulnHR | Laravel/PHP, MySQL, LDAP | 81 | Sanctum + LDAP + Form | Answer key | ~30 easy / ~30 medium / ~20 hard |
| Juice Shop | Express.js, Angular, SQLite | 55 | JWT REST API | Answer key | Mixed (6 difficulty tiers) |
| SuperSecureBank | .NET 8, SQL Server | 37 | MVC Form Login | Answer key | Mixed |
| AltoroMutual | Spring Boot, React, PostgreSQL | 29 | JWT API | Answer key | Mixed |
| DVWA | PHP, MariaDB | 28 | PHP Session Form | Answer key | 4 security levels |
| DVRA | FastAPI, MongoDB | 12 | OAuth2 Token | Answer key | Mixed |
| XBOW | Mixed (104 Docker containers) | 104 | None | Flag-based (FLAG{...}) | L1: 45, L2: 51, L3: 8 |
| HackBench | Mixed (16 real-world CVE stacks) | 16 | Per-challenge | Flag-based (ev{...}) + XSS alert | 9 easy, 2 medium, 5 hard |
| Gandalf (Lakera) | LLM Prompt Injection | 8 | None | Level completion | Progressive (8 levels) |
| HackMerlin | LLM Prompt Injection | 7 | None | Level completion | Progressive (7 levels) |
| PortSwigger Academy | Web Security (all categories) | 250+ | None | Lab completion | Apprentice / Practitioner / Expert |
| VibeApps (Neo) | 3 AI-generated apps (see below) | 74 | JWT / NextAuth / Custom Session | Answer key + Neo baseline | Mixed (8C, 13H, 16M, 25L, 12I) |
Total: 701+ distinct vulnerabilities/challenges (sum of the Vulns column above) across 14 targets, counting the three VibeApps applications individually.
VulnHR¶
The largest and most comprehensive lab target: an HR portal for a fictional company (Meridian Solutions). Covers all 10 OWASP Top 10 categories plus business logic and extra vulnerability classes, 12 categories in total.
| Property | Value |
|---|---|
| Target | http://vulnhr.test:7331/ |
| Tech Stack | Laravel/PHP, MySQL, Redis, Nginx, OpenLDAP |
| Vulnerabilities | 81 |
| Roles | 6 (admin, hr_manager, hr_specialist, manager, employee, observer) |
| Auth Methods | REST API (Sanctum), LDAP form, Web form |
| Containers | hrportal-nginx, hrportal-php, hrportal-postgres, hrportal-redis, hrportal-ldap |
| Pentest Flags | --fast |
OWASP Category Breakdown¶
| Category | Count |
|---|---|
| A01 - Broken Access Control | 12 |
| A02 - Cryptographic Failures | 3 |
| A03 - Injection | 13 |
| A04 - Insecure Design | 5 |
| A05 - Security Misconfiguration | 9 |
| A06 - Vulnerable Components | 2 |
| A07 - Auth Failures | 5 |
| A08 - Data Integrity | 3 |
| A09 - Logging Failures | 2 |
| A10 - SSRF | 2 |
| X - Extra | 13 |
| BL - Business Logic | 12 |
Difficulty Distribution¶
- Easy (~30 vulns): DAST-detectable, standard scanner coverage
- Medium (~30 vulns): Needs specific configuration or partial manual analysis
- Hard (~20 vulns): Requires deep manual analysis or exploit chains
Juice Shop¶
OWASP's flagship vulnerable web application. Express.js + Angular SPA with 111 built-in challenges; 55 are testable via automated pentest (excludes coding challenges, tutorial-only, and UI-puzzle challenges).
| Property | Value |
|---|---|
| Target | http://juiceshop.test:3000/ |
| Tech Stack | Express.js, Angular, SQLite |
| Vulnerabilities | 55 (of 111 total challenges) |
| Roles | 4 (admin, customer, demo, accountant) |
| Auth Method | REST API (JWT) |
| Container | juice-shop |
| Pentest Flags | --fast |
OWASP Category Breakdown¶
| Category | Count |
|---|---|
| A01 - Broken Access Control | 12 |
| A02 - Cryptographic Failures | 4 |
| A03 - Injection | 12 |
| A04 - Insecure Design | 4 |
| A05 - Security Misconfiguration | 6 |
| A06 - Vulnerable Components | 3 |
| A07 - Auth Failures | 5 |
| A08 - Data Integrity Failures | 3 |
| A09 - Logging Failures | 2 |
| A10 - SSRF | 2 |
| XSS - Cross-Site Scripting | 2 |
Challenge status API
Use /api/v1/challenges to check solve status for all 111 challenges.
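A minimal sketch of checking solve status from a script. It assumes the endpoint returns JSON shaped like `{"data": [{"name": ..., "solved": ...}]}`; that response shape, the helper names, and the default base URL are illustrative assumptions rather than something this catalog confirms.

```python
import json
import urllib.request


def summarize_challenges(payload: dict) -> tuple[int, int]:
    """Return (solved, total) from a challenge-status payload.

    Assumes the payload looks like {"data": [{"name": str, "solved": bool}, ...]}.
    """
    challenges = payload.get("data", [])
    solved = sum(1 for c in challenges if c.get("solved"))
    return solved, len(challenges)


def fetch_challenge_status(base_url: str = "http://juiceshop.test:3000") -> tuple[int, int]:
    """Fetch /api/v1/challenges and summarize it (requires the lab to be running)."""
    with urllib.request.urlopen(f"{base_url}/api/v1/challenges", timeout=10) as resp:
        return summarize_challenges(json.load(resp))
```

Calling `fetch_challenge_status()` before and after a pentest run gives a quick solved-count delta without opening the scoreboard UI.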
SuperSecureBank¶
A .NET 8 banking application focused on financial security patterns. Tests cover transaction manipulation, authentication bypass, and .NET-specific vulnerabilities.
| Property | Value |
|---|---|
| Target | http://supersecurebank.test:45127/ |
| Tech Stack | .NET 8, SQL Server |
| Vulnerabilities | 37 |
| Roles | 5 (admin + 4 users) |
| Auth Method | MVC Form Login |
| Containers | supersecure-db, supersecure-be, supersecure-fe |
| Pentest Flags | --fast |
AltoroMutual¶
A Spring Boot + React banking application. The React SPA frontend exercises the suite's JavaScript analysis and SPA crawling capabilities. Uses TLS with a proxy configuration.
| Property | Value |
|---|---|
| Target | https://altoromutual.test:8443/ |
| Tech Stack | Spring Boot, React, PostgreSQL |
| Vulnerabilities | 29 |
| Roles | 5 (admin + 4 customers) |
| Auth Method | JWT API |
| Containers | altoro-postgres, altoro-app |
| Pentest Flags | --fast --proxy 127.0.0.1:9000 |
TLS configuration
AltoroMutual uses HTTPS with a self-signed certificate. The --proxy flag routes traffic through a local proxy that handles TLS verification.
DVWA¶
Classic Damn Vulnerable Web Application. PHP + MariaDB with 4 configurable security levels (low, medium, high, impossible). Evals run at the low level.
| Property | Value |
|---|---|
| Target | http://dvwa.test:4280/ |
| Tech Stack | PHP, MariaDB |
| Vulnerabilities | 28 |
| Roles | 5 (admin + 4 users) |
| Auth Method | PHP Session Form |
| Containers | dvwa-dvwa-1, dvwa-db-1 |
| Pentest Flags | --fast |
| Security Level | low (configurable via DEFAULT_SECURITY_LEVEL) |
OWASP Category Breakdown¶
| Category | Count |
|---|---|
| A01 - Broken Access Control | 3 |
| A02 - Cryptographic Failures | 2 |
| A03 - Injection | 10 |
| A04 - Insecure Design | 2 |
| A05 - Security Misconfiguration | 4 |
| A07 - Auth Failures | 3 |
| A08 - Data Integrity Failures | 2 |
| A09 - Logging Failures | 1 |
| X - Client-Side | 1 |
First-run setup required
DVWA requires /setup.php to create the database before first use. The registry setup_commands handle this automatically via /labs-up.
DVRA¶
Damn Vulnerable RESTaurant API Game. A pure REST API target (no web UI) built with FastAPI + MongoDB. Tests API-specific vulnerabilities including authentication, injection, and access control.
| Property | Value |
|---|---|
| Target | http://dvra.test:8091/ |
| Tech Stack | FastAPI (Python), MongoDB |
| Vulnerabilities | 12 |
| Roles | 2 types (3 customers + 2 employees) |
| Auth Method | OAuth2 Token (FastAPI) |
| Containers | web, db |
| Pentest Flags | --fast |
OWASP Category Breakdown¶
| Category | Count |
|---|---|
| A01 - Broken Access Control | 3 |
| A02 - Cryptographic Failures | 1 |
| A03 - Injection | 1 |
| A04 - Insecure Design | 2 |
| A05 - Security Misconfiguration | 2 |
| A07 - Auth Failures | 2 |
| A10 - SSRF | 1 |
XBOW¶
XBOW Validation Benchmarks: 104 independent CTF challenges, each running as a separate Docker Compose stack. Flag-based scoring (FLAG{...}) instead of answer-key matching.
| Property | Value |
|---|---|
| Type | CTF collection (104 challenges) |
| Port Range | 10001-10104 (one per benchmark) |
| Flag Format | FLAG{...} (SHA256-based, generated by common.mk) |
| Auth | None (per-challenge) |
| Pentest Flags | --fast |
Difficulty Distribution¶
| Level | Count | Description |
|---|---|---|
| Level 1 | 45 | Entry-level |
| Level 2 | 51 | Intermediate |
| Level 3 | 8 | Advanced |
Running XBOW benchmarks¶
XBOW uses dedicated tooling instead of the standard /labs-up + /pentest --eval flow:
```
# Single benchmark
/pentest-xbow benchmark-name

# Batch by level
/pentest-xbow --level 1

# Batch by tag
/pentest-xbow --tag sqli

# All benchmarks
/pentest-xbow --all
```
Each benchmark is launched individually via xbow-launcher.py, pentested with /pentest, and scored by xbow-scorer.py using flag comparison.
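Flag comparison can be sketched roughly as follows. This is an illustration of the idea only, not the logic of xbow-scorer.py; the `find_flags` and `is_solved` helpers and the exact pattern are assumptions.

```python
import re

# Matches FLAG{...} with a non-empty body; the body format is not constrained here.
FLAG_PATTERN = re.compile(r"FLAG\{[^}]+\}")


def find_flags(text: str) -> set[str]:
    """Collect every FLAG{...} token that appears in the pentest output."""
    return set(FLAG_PATTERN.findall(text))


def is_solved(pentest_output: str, expected_flag: str) -> bool:
    """A benchmark counts as solved only if the exact expected flag was captured."""
    return expected_flag in find_flags(pentest_output)
```

Matching whole tokens rather than substrings avoids crediting a truncated or partially guessed flag.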
Lab Management¶
Registry¶
All labs are defined in evals/labs/registry.json. Each entry specifies:
- Docker configuration (profile, build, setup commands)
- URLs and health check parameters
- Authentication methods and credentials
- Container names for monitoring
- Path to lab-config.json for eval scoring
Adding a New Lab¶
The /labs-add skill auto-detects Docker config, credentials, auth methods, port assignments, and updates the registry + hosts file.
Starting Labs¶
Startup process: read registry, docker compose up -d, run setup commands, poll health endpoints, verify ALL credentials against each auth method, report Docker performance.
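The "poll health endpoints" step could look something like the sketch below. The function name, parameters, and defaults are hypothetical; the actual startup tooling may poll differently.

```python
import time
from typing import Callable


def poll_until_healthy(
    check: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if check():
                return True
        except OSError:
            # Connection refused/reset while the container is still starting.
            pass
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

A `check` callable would typically wrap an HTTP request to the lab's health URL; `urllib.error.URLError` is a subclass of `OSError`, so connection failures during container startup are retried rather than raised.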
HackBench¶
Real-world CVE-based CTF challenges from ElectrovoltSec. 16 independent web exploitation challenges, each running as a separate Docker Compose stack. Tests exploit discovery across SQLi, XSS, auth bypass, IDOR, RCE, and n-day vulnerability identification.
| Property | Value |
|---|---|
| Type | CTF collection (16 challenges) |
| Port Range | 10201-10216 (one per challenge) |
| Flag Format | ev{hex} (runtime-generated unique flags) + alert() for XSS challenges |
| Auth | Per-challenge (varies: form login, API, onboarding wizard) |
| Source | ElectrovoltSec/HackBench |
| Launcher | evals/labs/hackbench/hackbench-launcher.py |
| Scorer | evals/labs/hackbench/hackbench-scorer.py |
Challenge Overview¶
| Difficulty | Count | Points Each | Total Points |
|---|---|---|---|
| Easy | 9 | 100 | 900 |
| Medium | 2 | 300 | 600 |
| Hard | 5 | 500 | 2500 |
| Total | 16 | | 4000 |
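The totals follow from count times points per challenge. A small sketch, assuming a score is simply the sum of points for the challenges counted (this catalog does not say whether hackbench-scorer.py applies any other weighting); the names below are illustrative.

```python
POINTS_PER_CHALLENGE = {"easy": 100, "medium": 300, "hard": 500}
CHALLENGE_COUNTS = {"easy": 9, "medium": 2, "hard": 5}


def total_points(counts: dict[str, int]) -> int:
    """Sum points for the given number of challenges per difficulty."""
    return sum(POINTS_PER_CHALLENGE[difficulty] * n for difficulty, n in counts.items())


# Maximum achievable score across all 16 challenges.
MAX_POINTS = total_points(CHALLENGE_COUNTS)
```

Passing the number of solved challenges per difficulty instead of `CHALLENGE_COUNTS` gives the points earned in a run.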
Exploit Categories Covered¶
| Category | Challenges | Skills Tested |
|---|---|---|
| SQL Injection (UNION, Blind) | 2 | test-injection |
| NoSQL Injection | 1 | test-injection |
| Stored / DOM XSS | 4 | test-injection |
| JWT / Auth Bypass | 3 | test-auth |
| IDOR / BOLA | 2 | test-access |
| Command Injection / RCE | 2 | test-injection |
| Known CVE / N-day | 2 | CVE search + test-injection |
Running HackBench¶
```
# Single challenge
/pentest-hackbench EV-01

# By difficulty
/pentest-hackbench --difficulty easy

# All challenges
/pentest-hackbench --all
```
Each challenge is launched via hackbench-launcher.py, pentested with /pentest --eval, and scored by hackbench-scorer.py using flag comparison (string match for flag challenges, alert() detection for XSS challenges).
Infrastructure notes
- Some challenges require onboarding/setup via browser before API testing is possible
- XSS challenges (EV-09, EV-10, EV-11) use alert(document.domain) as win condition, not string flags
- Runtime flags are stored in runtime-flags.json; the scorer reads this for validation
- Docker Compose port merging requires patched + override file pattern (handled by launcher)
HackMerlin¶
HackMerlin: 7-level progressive LLM prompt injection challenge. Each level adds stronger defenses (input filter, output filter, LLM-as-judge, active deception). Hosted target at hackmerlin.io (no Docker required).
| Property | Value |
|---|---|
| Type | LLM prompt injection (7 levels) |
| Target | https://hackmerlin.io/ |
| Docker | Not required (hosted) |
| Scoring | Level completion (password extraction) |
| Skills Tested | test-llm, prompt engineering |
| Best Score | 7/7 (100%) — 2026-03-20 |
Defense Layers¶
| Level | Defenses |
|---|---|
| L1-L3 | None → persona → basic output filter |
| L4-L5 | Input filter + output filter |
| L6 | Complex output filter (reversed + case-insensitive) |
| L7 | Input filter + output filter + LLM-as-judge + active deception layer |
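To make the L6 row concrete, here is a sketch of the kind of check a "reversed + case-insensitive" output filter implies. It is an illustration under that reading, not HackMerlin's actual implementation; the function name and example password are hypothetical.

```python
def output_leaks_password(output: str, password: str) -> bool:
    """True if output contains the password forwards or reversed, ignoring case."""
    haystack = output.casefold()
    needle = password.casefold()
    return needle in haystack or needle[::-1] in haystack
```

A filter of this shape blocks direct and reversed disclosure but not, for example, a character-separated spelling such as `s-e-c-r-e-t`, which is why later levels add further layers.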
Running HackMerlin¶
Gandalf¶
Gandalf by Lakera: 8-level progressive LLM prompt injection challenge. Each level adds stronger input/output filters. Hosted target (no Docker needed). Tests prompt injection, system prompt extraction, and filter bypass techniques.
| Property | Value |
|---|---|
| Type | LLM prompt injection (8 levels) |
| Target | External hosted (Lakera) |
| Docker | Not required |
| Scoring | Level completion (password extraction) |
| Skills Tested | test-llm, prompt engineering |
Running Gandalf¶
PortSwigger Academy¶
PortSwigger Web Security Academy: 250+ labs across all web security categories. Playwright-driven lab launcher creates ephemeral instances on PortSwigger's infrastructure. Each lab has a specific vulnerability to exploit and a "solved" status.
| Property | Value |
|---|---|
| Type | Web security labs (250+ labs) |
| Target | External hosted (PortSwigger) |
| Docker | Not required |
| Scoring | Lab completion (auto-detected by PortSwigger) |
| Categories | SQLi, XSS, CSRF, CORS, clickjacking, DOM-based, SSRF, XXE, OS command injection, directory traversal, access control, auth, business logic, HTTP request smuggling, WebSockets, deserialization, information disclosure, race conditions, prototype pollution, GraphQL, JWT, OAuth, SSTI, web cache poisoning |
Difficulty Tiers¶
| Tier | Description |
|---|---|
| Apprentice | Guided, single-step exploits |
| Practitioner | Multi-step, real-world scenarios |
| Expert | Advanced, chained exploits |
Running PortSwigger Labs¶
```
/pentest-portswigger                            # All labs
/pentest-portswigger --category sql-injection   # Single category
/pentest-portswigger --difficulty practitioner  # By difficulty
/pentest-portswigger --batch 10                 # First 10
```
VibeApps (Neo Benchmark)¶
Three AI-generated web applications from ProjectDiscovery's Vibe-Coding research. Benchmark for comparing AI security scanners against Neo (ProjectDiscovery's AI scanner). 74 confirmed vulnerabilities across 3 apps, each built with a different AI coding tool and tech stack.
| App | Domain | Stack | Built with | LOC | Vulns | Port |
|---|---|---|---|---|---|---|
| VaultBank | Banking | React 18, FastAPI, SQLAlchemy, JWT, PostgreSQL | Claude Code (Sonnet 4.6) | 10,470 | 30 | 8101 |
| MedPortal | Healthcare | Next.js 14, Prisma, PostgreSQL, NextAuth.js | Codex (gpt-5-codex) | 4,528 | 20 | 8102 |
| ClaimFlow | Insurance | SvelteKit, Drizzle ORM, SQLite, Custom Auth | Cursor | 12,368 | 24 | 8103 |
Roles per App¶
| App | Roles (5 each) |
|---|---|
| VaultBank | Admin, Branch Manager, Compliance Officer, Teller, Customer |
| MedPortal | Admin, Doctor, Nurse, Lab Technician, Patient |
| ClaimFlow | Admin, Underwriter, Adjuster, Agent/Broker, Policyholder |
Vulnerability Distribution¶
| Severity | VaultBank | MedPortal | ClaimFlow | Total |
|---|---|---|---|---|
| Critical | 6 | 0 | 2 | 8 |
| High | 3 | 6 | 4 | 13 |
| Medium | 6 | 1 | 9 | 16 |
| Low | 13 | 7 | 5 | 25 |
| Info | 2 | 6 | 4 | 12 |
| Total | 30 | 20 | 24 | 74 |
Neo Baseline (ProjectDiscovery)¶
| Metric | Neo | Claude (PD) | Snyk | Invicti |
|---|---|---|---|---|
| True Positives | 66/74 | 41/74 | 0/74 | 10/74 |
| False Positives | 5 | 24 | 5 | 10 |
| Precision | 93% | 63% | 0% | 50% |
| Critical+High | 21/21 | 13/21 | 0/21 | 0/21 |
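The Precision row can be reproduced from the True Positives and False Positives rows as TP / (TP + FP). A short sketch; the `precision` helper and `BASELINE` mapping are illustrative names, and the numbers are copied from the table above.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of reported findings that were real: TP / (TP + FP)."""
    reported = true_positives + false_positives
    return true_positives / reported if reported else 0.0


# (true positives, false positives) per scanner, from the baseline table.
BASELINE = {"Neo": (66, 5), "Claude (PD)": (41, 24), "Snyk": (0, 5), "Invicti": (10, 10)}

for scanner, (tp, fp) in BASELINE.items():
    print(f"{scanner}: {precision(tp, fp):.0%}")
```

Note that precision says nothing about coverage; the True Positives row (out of 74) carries that information.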
BeDefended Results (2026-03-24)¶
| Metric | BeDefended (blind) | Neo | Delta |
|---|---|---|---|
| True Positives | 61/74 | 66/74 | -5 |
| False Positives | 8 | 5 | +3 |
| Precision | 88.4% | 93.0% | -4.6pp |
| Extra vulns (outside 74) | 8 | 0 | +8 |
| Total real vulns found | 69 | 66 | +3 |
Per-app breakdown (first blind run):
| App | BeDefended | Neo | Delta |
|---|---|---|---|
| VaultBank | 23/30 | 27/30 | -4 |
| MedPortal | 17/20 | 17/20 | 0 (tied) |
| ClaimFlow | 21/24 | 22/24 | -1 |
Key Vulnerability Categories Found¶
| Category | Examples |
|---|---|
| Business Logic | Self-deposit money creation, dispute refund bypass, race condition double-spend, unlimited loan amounts |
| Broken Access Control | IDOR on patient records, cross-user dispute filing, manager cross-branch freeze, body-param IDOR |
| Authentication | Hardcoded JWT secret, JWT reuse after logout, weak password policy, no account lockout |
| Mass Assignment | Prisma/Drizzle raw body to ORM update, role escalation via user update |
| Information Disclosure | Password hash exposure via ORM, staff user IDs in responses, server version |
| Cryptographic Failures | SHA-256 with hardcoded salt, missing HSTS |
| File Upload | Unrestricted MIME types on dispute evidence and message attachments |
Running the Benchmark¶
```
/pentest-neo --all                     # All 3 apps sequentially
/pentest-neo vaultbank                 # Single app
/pentest-neo vaultbank --code-only     # White-box only
/pentest-neo vaultbank --dynamic-only  # Black-box only
```
Scoring¶
Scorer features: per-app filtering by the App: tag, global optimal matching over a score matrix, CWE family matching, and stem-aware keyword matching. Results are compared against the Neo baseline automatically.
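One way to read "global optimal matching" is a one-to-one assignment between reported findings and answer-key entries that maximizes the summed match score, instead of greedily taking the best pair first. The sketch below illustrates that reading with a brute-force search; the scorer's actual algorithm is not described here, and the function name, threshold, and matrix layout are assumptions.

```python
from itertools import permutations


def best_matching(scores: list[list[float]], threshold: float = 0.5) -> list[tuple[int, int]]:
    """Return (finding, answer_key) index pairs maximizing the total score.

    scores[r][c] is the match score between finding r and answer-key entry c.
    Pairs scoring below threshold are treated as unmatched. Brute force over
    permutations: fine for a handful of rows, exponential beyond that.
    Assumes there are no more findings than answer-key entries.
    """
    n_findings = len(scores)
    n_keys = len(scores[0]) if scores else 0
    best_total, best_pairs = -1.0, []
    for cols in permutations(range(n_keys), n_findings):
        pairs = [(r, c) for r, c in enumerate(cols) if scores[r][c] >= threshold]
        total = sum(scores[r][c] for r, c in pairs)
        if total > best_total:
            best_total, best_pairs = total, pairs
    return best_pairs
```

With `[[0.9, 0.8], [0.85, 0.1]]`, a greedy matcher would pair finding 0 with key 0 and leave finding 1 unmatched, while the global search pairs both findings. At realistic sizes (74 answer-key entries) an assignment solver such as the Hungarian algorithm would be needed in place of the permutation loop.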