Skip to content

Lab Catalog

10 registered lab targets covering different tech stacks, vulnerability profiles, and scoring modes.


Summary

Lab Tech Stack Vulns Auth Scoring Difficulty
VulnHR Laravel/PHP, MySQL, LDAP 81 Sanctum + LDAP + Form Answer key ~30 easy / ~30 medium / ~20 hard
Juice Shop Express.js, Angular, SQLite 55 JWT REST API Answer key Mixed (6 difficulty tiers)
SuperSecureBank .NET 8, SQL Server 37 MVC Form Login Answer key Mixed
AltoroMutual Spring Boot, React, PostgreSQL 29 JWT API Answer key Mixed
DVWA PHP, MariaDB 28 PHP Session Form Answer key 4 security levels
DVRA FastAPI, MongoDB 12 OAuth2 Token Answer key Mixed
XBOW Mixed (104 Docker containers) 104 None Flag-based (FLAG{...}) L1: 45, L2: 51, L3: 8
HackBench Mixed (16 real-world CVE stacks) 16 Per-challenge Flag-based (ev{...}) + XSS alert 9 easy, 2 medium, 5 hard
Gandalf (Lakera) LLM Prompt Injection 8 None Level completion Progressive (8 levels)
HackMerlin LLM Prompt Injection 7 None Level completion Progressive (7 levels)
PortSwigger Academy Web Security (all categories) 250+ None Lab completion Apprentice / Practitioner / Expert
VibeApps (Neo) 3 AI-generated apps (see below) 74 JWT / NextAuth / Custom Session Answer key + Neo baseline Mixed (8C, 13H, 16M, 25L, 12I)

Total: 676+ distinct vulnerabilities/challenges across 14 targets.


VulnHR

The largest and most comprehensive lab target. HR Portal for a fictional company (Meridian Solutions). Covers 12 OWASP categories plus business logic and extra vulnerability classes.

Property Value
Target http://vulnhr.test:7331/
Tech Stack Laravel/PHP, MySQL, Redis, Nginx, OpenLDAP
Vulnerabilities 81
Roles 6 (admin, hr_manager, hr_specialist, manager, employee, observer)
Auth Methods REST API (Sanctum), LDAP form, Web form
Containers hrportal-nginx, hrportal-php, hrportal-postgres, hrportal-redis, hrportal-ldap
Pentest Flags --fast

OWASP Category Breakdown

Category Count
A01 - Broken Access Control 12
A02 - Cryptographic Failures 3
A03 - Injection 13
A04 - Insecure Design 5
A05 - Security Misconfiguration 9
A06 - Vulnerable Components 2
A07 - Auth Failures 5
A08 - Data Integrity 3
A09 - Logging Failures 2
A10 - SSRF 2
X - Extra 13
BL - Business Logic 12

Difficulty Distribution

  • Easy (~30 vulns): DAST-detectable, standard scanner coverage
  • Medium (~30 vulns): Needs specific configuration or partial manual analysis
  • Hard (~20 vulns): Requires deep manual analysis or exploit chains

Juice Shop

OWASP's flagship vulnerable web application. Express.js + Angular SPA with 111 built-in challenges; 55 are testable via automated pentest (excludes coding challenges, tutorial-only, and UI-puzzle challenges).

Property Value
Target http://juiceshop.test:3000/
Tech Stack Express.js, Angular, SQLite
Vulnerabilities 55 (of 111 total challenges)
Roles 4 (admin, customer, demo, accountant)
Auth Method REST API (JWT)
Container juice-shop
Pentest Flags --fast

OWASP Category Breakdown

Category Count
A01 - Broken Access Control 12
A02 - Cryptographic Failures 4
A03 - Injection 12
A04 - Insecure Design 4
A05 - Security Misconfiguration 6
A06 - Vulnerable Components 3
A07 - Auth Failures 5
A08 - Data Integrity Failures 3
A09 - Logging Failures 2
A10 - SSRF 2
XSS - Cross-Site Scripting 2

Challenge status API

Use /api/v1/challenges to check solve status for all 111 challenges.


SuperSecureBank

A .NET 8 banking application focused on financial security patterns. Tests cover transaction manipulation, authentication bypass, and .NET-specific vulnerabilities.

Property Value
Target http://supersecurebank.test:45127/
Tech Stack .NET 8, SQL Server
Vulnerabilities 37
Roles 5 (admin + 4 users)
Auth Method MVC Form Login
Containers supersecure-db, supersecure-be, supersecure-fe
Pentest Flags --fast

AltoroMutual

A Spring Boot + React banking application. The React SPA frontend exercises the suite's JavaScript analysis and SPA crawling capabilities. Uses TLS with a proxy configuration.

Property Value
Target https://altoromutual.test:8443/
Tech Stack Spring Boot, React, PostgreSQL
Vulnerabilities 29
Roles 5 (admin + 4 customers)
Auth Method JWT API
Containers altoro-postgres, altoro-app
Pentest Flags --fast --proxy 127.0.0.1:9000

TLS configuration

AltoroMutual uses HTTPS with a self-signed certificate. The --proxy flag routes traffic through a local proxy that handles TLS verification.


DVWA

Classic Damn Vulnerable Web Application. PHP + MariaDB with 4 configurable security levels (low, medium, high, impossible). Eval runs on low level.

Property Value
Target http://dvwa.test:4280/
Tech Stack PHP, MariaDB
Vulnerabilities 28
Roles 5 (admin + 4 users)
Auth Method PHP Session Form
Containers dvwa-dvwa-1, dvwa-db-1
Pentest Flags --fast
Security Level low (configurable via DEFAULT_SECURITY_LEVEL)

OWASP Category Breakdown

Category Count
A01 - Broken Access Control 3
A02 - Cryptographic Failures 2
A03 - Injection 10
A04 - Insecure Design 2
A05 - Security Misconfiguration 4
A07 - Auth Failures 3
A08 - Data Integrity Failures 2
A09 - Logging Failures 1
X - Client-Side 1

First-run setup required

DVWA requires /setup.php to create the database before first use. The registry setup_commands handle this automatically via /labs-up.


DVRA

Damn Vulnerable RESTaurant API Game. A pure REST API target (no web UI) built with FastAPI + MongoDB. Tests API-specific vulnerabilities including authentication, injection, and access control.

Property Value
Target http://dvra.test:8091/
Tech Stack FastAPI (Python), MongoDB
Vulnerabilities 12
Roles 2 types (3 customers + 2 employees)
Auth Method OAuth2 Token (FastAPI)
Containers web, db
Pentest Flags --fast

OWASP Category Breakdown

Category Count
A01 - Broken Access Control 3
A02 - Cryptographic Failures 1
A03 - Injection 1
A04 - Insecure Design 2
A05 - Security Misconfiguration 2
A07 - Auth Failures 2
A10 - SSRF 1

XBOW

XBOW Validation Benchmarks: 104 independent CTF challenges, each running as a separate Docker Compose stack. Flag-based scoring (FLAG{...}) instead of answer-key matching.

Property Value
Type CTF collection (104 challenges)
Port Range 10001-10104 (one per benchmark)
Flag Format FLAG{...} (SHA256-based, generated by common.mk)
Auth None (per-challenge)
Pentest Flags --fast

Difficulty Distribution

Level Count Description
Level 1 45 Entry-level
Level 2 51 Intermediate
Level 3 8 Advanced

Running XBOW benchmarks

XBOW uses dedicated tooling instead of the standard /labs-up + /pentest --eval flow:

# Single benchmark
/pentest-xbow benchmark-name

# Batch by level
/pentest-xbow --level 1

# Batch by tag
/pentest-xbow --tag sqli

# All benchmarks
/pentest-xbow --all

Each benchmark is launched individually via xbow-launcher.py, pentested with /pentest, and scored by xbow-scorer.py using flag comparison.


Lab Management

Registry

All labs are defined in evals/labs/registry.json. Each entry specifies:

  • Docker configuration (profile, build, setup commands)
  • URLs and health check parameters
  • Authentication methods and credentials
  • Container names for monitoring
  • Path to lab-config.json for eval scoring

Adding a New Lab

/labs-add https://github.com/org/vuln-app
/labs-add https://github.com/org/vuln-app custom-id

The /labs-add skill auto-detects Docker config, credentials, auth methods, port assignments, and updates the registry + hosts file.

Starting Labs

/labs-up                    # Start all
/labs-up --only vulnhr      # Single lab
/labs-up --rebuild          # Force rebuild

Startup process: read registry, docker compose up -d, run setup commands, poll health endpoints, verify ALL credentials against each auth method, report Docker performance.


HackBench

Real-world CVE-based CTF challenges from ElectrovoltSec. 16 independent web exploitation challenges, each running as a separate Docker Compose stack. Tests exploit discovery across SQLi, XSS, auth bypass, IDOR, RCE, and n-day vulnerability identification.

Property Value
Type CTF collection (16 challenges)
Port Range 10201-10216 (one per challenge)
Flag Format ev{hex} (runtime-generated unique flags) + alert() for XSS challenges
Auth Per-challenge (varies: form login, API, onboarding wizard)
Source ElectrovoltSec/HackBench
Launcher evals/labs/hackbench/hackbench-launcher.py
Scorer evals/labs/hackbench/hackbench-scorer.py

Challenge Overview

Difficulty Count Points Each Total Points
Easy 9 100 900
Medium 2 300 600
Hard 5 500 2500
Total 16 4000

Exploit Categories Covered

Category Challenges Skills Tested
SQL Injection (UNION, Blind) 2 test-injection
NoSQL Injection 1 test-injection
Stored / DOM XSS 4 test-injection
JWT / Auth Bypass 3 test-auth
IDOR / BOLA 2 test-access
Command Injection / RCE 2 test-injection
Known CVE / N-day 2 CVE search + test-injection

Running HackBench

# Single challenge
/pentest-hackbench EV-01

# By difficulty
/pentest-hackbench --difficulty easy

# All challenges
/pentest-hackbench --all

Each challenge is launched via hackbench-launcher.py, pentested with /pentest --eval, and scored by hackbench-scorer.py using flag comparison (string match for flag challenges, alert() detection for XSS challenges).

Infrastructure notes

  • Some challenges require onboarding/setup via browser before API testing is possible
  • XSS challenges (EV-09, EV-10, EV-11) use alert(document.domain) as win condition, not string flags
  • Runtime flags are stored in runtime-flags.json — scorer reads this for validation
  • Docker Compose port merging requires patched + override file pattern (handled by launcher)

HackMerlin

HackMerlin: 7-level progressive LLM prompt injection challenge. Each level adds stronger defenses (input filter, output filter, LLM-as-judge, active deception). Hosted target at hackmerlin.io (no Docker required).

Property Value
Type LLM prompt injection (7 levels)
Target https://hackmerlin.io/
Docker Not required (hosted)
Scoring Level completion (password extraction)
Skills Tested test-llm, prompt engineering
Best Score 7/7 (100%) — 2026-03-20

Defense Layers

Level Defenses
L1-L3 None → persona → basic output filter
L4-L5 Input filter + output filter
L6 Complex output filter (reversed + case-insensitive)
L7 Input filter + output filter + LLM-as-judge + active deception layer

Running HackMerlin

/pentest-hackmerlin                # All 7 levels
/pentest-hackmerlin --level 7      # Start from level 7

Gandalf

Gandalf by Lakera: 8-level progressive LLM prompt injection challenge. Each level adds stronger input/output filters. Hosted target (no Docker needed). Tests prompt injection, system prompt extraction, and filter bypass techniques.

Property Value
Type LLM prompt injection (8 levels)
Target External hosted (Lakera)
Docker Not required
Scoring Level completion (password extraction)
Skills Tested test-llm, prompt engineering

Running Gandalf

/pentest-gandalf                # All 8 levels
/pentest-gandalf --level 5      # Start from level 5

PortSwigger Academy

PortSwigger Web Security Academy: 250+ labs across all web security categories. Playwright-driven lab launcher creates ephemeral instances on PortSwigger's infrastructure. Each lab has a specific vulnerability to exploit and a "solved" status.

Property Value
Type Web security labs (250+ labs)
Target External hosted (PortSwigger)
Docker Not required
Scoring Lab completion (auto-detected by PortSwigger)
Categories SQLi, XSS, CSRF, CORS, clickjacking, DOM-based, SSRF, XXE, OS command injection, directory traversal, access control, auth, business logic, HTTP request smuggling, WebSockets, deserialization, information disclosure, race conditions, prototype pollution, GraphQL, JWT, OAuth, SSTI, web cache poisoning

Difficulty Tiers

Tier Description
Apprentice Guided, single-step exploits
Practitioner Multi-step, real-world scenarios
Expert Advanced, chained exploits

Running PortSwigger Labs

/pentest-portswigger                           # All labs
/pentest-portswigger --category sql-injection   # Single category
/pentest-portswigger --difficulty practitioner   # By difficulty
/pentest-portswigger --batch 10                 # First 10

VibeApps (Neo Benchmark)

Three AI-generated web applications from ProjectDiscovery's Vibe-Coding research. Benchmark for comparing AI security scanners against Neo (ProjectDiscovery's AI scanner). 74 confirmed vulnerabilities across 3 apps, each built with a different AI coding tool and tech stack.

App Domain Stack Built with LOC Vulns Port
VaultBank Banking React 18, FastAPI, SQLAlchemy, JWT, PostgreSQL Claude Code (Sonnet 4.6) 10,470 30 8101
MedPortal Healthcare Next.js 14, Prisma, PostgreSQL, NextAuth.js Codex (gpt-5-codex) 4,528 20 8102
ClaimFlow Insurance SvelteKit, Drizzle ORM, SQLite, Custom Auth Cursor 12,368 24 8103

Roles per App

App Roles (5 each)
VaultBank Admin, Branch Manager, Compliance Officer, Teller, Customer
MedPortal Admin, Doctor, Nurse, Lab Technician, Patient
ClaimFlow Admin, Underwriter, Adjuster, Agent/Broker, Policyholder

Vulnerability Distribution

Severity VaultBank MedPortal ClaimFlow Total
Critical 6 0 2 8
High 3 6 4 13
Medium 6 1 9 16
Low 13 7 5 25
Info 2 6 4 12
Total 30 20 24 74

Neo Baseline (ProjectDiscovery)

Metric Neo Claude (PD) Snyk Invicti
True Positives 66/74 41/74 0/74 10/74
False Positives 5 24 5 10
Precision 93% 63% 0% 50%
Critical+High 21/21 13/21 0/21 0/21

BeDefended Results (2026-03-24)

Metric BeDefended (blind) Neo Delta
True Positives 61/74 66/74 -5
False Positives 8 5 +3
Precision 88.4% 93.0% -4.6pp
Extra vulns (outside 74) 8 0 +8
Total real vulns found 69 66 +3

Per-app breakdown (first blind run):

App BeDefended Neo Delta
VaultBank 23/30 27/30 -4
MedPortal 17/20 17/20 0 (tied)
ClaimFlow 21/24 22/24 -1

Key Vulnerability Categories Found

Category Examples
Business Logic Self-deposit money creation, dispute refund bypass, race condition double-spend, unlimited loan amounts
Broken Access Control IDOR on patient records, cross-user dispute filing, manager cross-branch freeze, body-param IDOR
Authentication Hardcoded JWT secret, JWT reuse after logout, weak password policy, no account lockout
Mass Assignment Prisma/Drizzle raw body to ORM update, role escalation via user update
Information Disclosure Password hash exposure via ORM, staff user IDs in responses, server version
Cryptographic Failures SHA-256 with hardcoded salt, missing HSTS
File Upload Unrestricted MIME types on dispute evidence and message attachments

Running the Benchmark

/pentest-neo --all                    # All 3 apps sequentially
/pentest-neo vaultbank                # Single app
/pentest-neo vaultbank --code-only    # White-box only
/pentest-neo vaultbank --dynamic-only # Black-box only

Scoring

python evals/labs/vibeapps-scorer.py engagements/<dir> --app all --html --save

Scorer features: App filtering by App: tag, global optimal matching (score matrix), CWE family matching, stem-aware keyword matching. Compares vs Neo baseline automatically.