
Latest Scores

Most recent results for each evaluation lab. This page is regenerated automatically by evals/generate-scores-page.py after every scoring session.

Status: 7/13 labs with results

Page last updated: 2026-03-25 00:00 UTC


Scoreboard

| Lab | Tech Stack | Vulns | Score | Found | Partial | Missed | Duration | Date |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| VulnHR | Laravel/PHP, MySQL | 87 | 100.0% (87/87) | 87 | 0 | 0 | --- | 2026-03-23 11:34 |
| Juice Shop | Express.js, Angular | 55 | --- | --- | --- | --- | --- | No runs |
| SuperSecureBank | .NET 8, SQL Server | 47 | 91.9% (34/37) | 31 | 6 | 0 | --- | 2026-03-23 15:07 |
| AltoroMutual | Spring Boot, React | 60 | 40.5% (23.5/58) | 19 | 9 | 30 | --- | 2026-03-23 16:20 |
| DVWA | PHP, MariaDB | 28 | --- | --- | --- | --- | --- | No runs |
| DVRA | FastAPI, MongoDB | 12 | --- | --- | --- | --- | --- | No runs |
| VaultBank (Neo) | React 18 + FastAPI | 30 | 76.7% (23/30) | 23 | 0 | 7 | --- | 2026-03-24 |
| MedPortal (Neo) | Next.js 14 + Prisma | 20 | 85.0% (17/20) | 17 | 0 | 3 | --- | 2026-03-24 |
| ClaimFlow (Neo) | SvelteKit + Drizzle | 24 | 87.5% (21/24) | 21 | 0 | 3 | --- | 2026-03-24 |
| XBOW CTF | Mixed (104 containers) | 104 | 0.0% (0/104) | 0 | 0 | 104 | --- | 2026-03-16 11:00 |
| Gandalf (Lakera) | LLM Prompt Injection | 8 | --- | --- | --- | --- | --- | No runs |
| PortSwigger Academy | Web Security (all categories) | 250 | --- | --- | --- | --- | --- | No runs |
| Root-Me | Web-Server + Web-Client | 0 | --- | --- | --- | --- | --- | No runs |
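
The Score column appears to weight each partial finding as half a full finding. This is inferred from the rows themselves (for example AltoroMutual: 19 found and 9 partial give 23.5 out of 58), not from the scorer's source, so the function name and the `partial_weight` parameter below are assumptions made for illustration. A minimal sketch of that arithmetic:

```python
def weighted_score(found: int, partial: int, missed: int,
                   partial_weight: float = 0.5) -> float:
    """Return a score percentage rounded to one decimal place.

    Assumption: a partial finding counts as `partial_weight` of a full
    finding. The 0.5 default is inferred from the scoreboard rows.
    """
    total = found + partial + missed
    if total == 0:
        # Nothing was scored (for example a lab with no runs).
        return 0.0
    return round(100 * (found + partial_weight * partial) / total, 1)


# Values copied from the scoreboard rows.
altoro = weighted_score(found=19, partial=9, missed=30)
vulnhr = weighted_score(found=87, partial=0, missed=0)
```

Under this reading the denominator is found + partial + missed, which can differ from the Vulns column (58 versus 60 for AltoroMutual).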

XBOW CTF

Score: 0/104 = 0.0% | Found: 0 | Partial: 0 | Missed: 104 | Extra: 0 | Total findings: 0

By skill (gap analysis)

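| Skill | Total | Found | Partial | Missed | Coverage |
| --- | --- | --- | --- | --- | --- |
| test-access | 21 | 0 | 0 | 21 | 0% |
| test-api | 1 | 0 | 0 | 1 | 0% |
| test-auth | 4 | 0 | 0 | 4 | 0% |
| test-crypto | 1 | 0 | 0 | 1 | 0% |
| test-deser | 5 | 0 | 0 | 5 | 0% |
| test-exceptions | 2 | 0 | 0 | 2 | 0% |
| test-infra | 4 | 0 | 0 | 4 | 0% |
| test-injection | 61 | 0 | 0 | 61 | 0% |
| test-logic | 2 | 0 | 0 | 2 | 0% |
| test-ssrf | 3 | 0 | 0 | 3 | 0% |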
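
A per-skill breakdown like the one above can be produced by grouping scored entries by skill. The sketch below is hypothetical: the `(skill, status)` input shape and the `coverage_by_skill` name are not taken from the scorer, and Coverage is assumed to be found / total with partials excluded, which this table cannot confirm because every row is zero.

```python
from collections import defaultdict


def coverage_by_skill(entries):
    """Group (skill, status) pairs into per-skill counts.

    `status` is one of "found", "partial", "missed".
    Assumption: coverage = found / total, ignoring partial findings.
    """
    rows = defaultdict(lambda: {"total": 0, "found": 0, "partial": 0, "missed": 0})
    for skill, status in entries:
        rows[skill]["total"] += 1
        rows[skill][status] += 1
    for counts in rows.values():
        counts["coverage"] = round(100 * counts["found"] / counts["total"], 1)
    return dict(rows)


# Illustrative input only, not real run data.
sample = [("test-ssrf", "missed")] * 3 + [("test-auth", "found"), ("test-auth", "missed")]
report = coverage_by_skill(sample)
```

Sorting the result by ascending coverage and descending total would surface the largest gaps first, such as test-injection with 61 missed entries.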

History

All runs saved with --save. Each row represents one complete execution of /pentest --eval.

VulnHR

| Date | Score | Found | Partial | Missed | Duration | Tokens |
| --- | --- | --- | --- | --- | --- | --- |
| 2026-03-23 11:34 | 100.0% (87/87) | 87 | 0 | 0 | --- | --- |

SuperSecureBank (.NET 8)

| Date | Score | Found | Partial | Missed | Duration | Tokens |
| --- | --- | --- | --- | --- | --- | --- |
| 2026-03-23 15:07 | 91.9% (34/37) | 31 | 6 | 0 | --- | --- |

AltoroMutual

| Date | Score | Found | Partial | Missed | Duration | Tokens |
| --- | --- | --- | --- | --- | --- | --- |
| 2026-03-23 16:20 | 40.5% (23.5/58) | 19 | 9 | 30 | --- | --- |

VibeApps — Neo Benchmark (3 apps, 74 vulns)

Combined benchmark against ProjectDiscovery's Neo scanner. Scored per-app against 74-entry ground truth.

| Date | App | Score | Found | Missed | FP | vs Neo |
| --- | --- | --- | --- | --- | --- | --- |
| 2026-03-24 | VaultBank | 76.7% (23/30) | 23 | 7 | 2 | Neo: 27/30 |
| 2026-03-24 | MedPortal | 85.0% (17/20) | 17 | 3 | 3 | Neo: 17/20 (tied) |
| 2026-03-24 | ClaimFlow | 87.5% (21/24) | 21 | 3 | 3 | Neo: 22/24 |
| 2026-03-24 | TOTAL | 82.4% (61/74) | 61 | 13 | 8 | Neo: 66/74 |

Extra findings (real vulns outside the 74 ground truth): 8 additional vulnerabilities found by BeDefended but not in the benchmark. Total real vulns found: 69 (vs Neo: 66). Precision: 88.4% (vs Neo: 93%).
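
The TOTAL row and the precision figure can be reproduced from the per-app counts. The sketch below assumes recall = found / ground truth and precision = found / (found + FP). The FP column sums to 8, the same number as the extra findings, so precision appears to count findings outside the ground truth as false positives; that reading is inferred from the arithmetic, not confirmed by the source. The function name is hypothetical.

```python
def benchmark_totals(apps):
    """Aggregate per-app (found, missed, false_positives) tuples."""
    found = sum(f for f, _, _ in apps.values())
    missed = sum(m for _, m, _ in apps.values())
    false_positives = sum(fp for _, _, fp in apps.values())
    ground_truth = found + missed
    return {
        "found": found,
        "ground_truth": ground_truth,
        "false_positives": false_positives,
        # Assumed definitions, see the note above.
        "recall": round(100 * found / ground_truth, 1),
        "precision": round(100 * found / (found + false_positives), 1),
    }


# (found, missed, FP) copied from the benchmark table.
totals = benchmark_totals({
    "VaultBank": (23, 7, 2),
    "MedPortal": (17, 3, 3),
    "ClaimFlow": (21, 3, 3),
})
# totals["recall"] == 82.4 and totals["precision"] == 88.4
```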

XBOW CTF

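| Date | Score | Found | Partial | Missed | Duration | Tokens |
| --- | --- | --- | --- | --- | --- | --- |
| 2026-03-16 11:00 | 0.0% (0/104) | 0 | 0 | 104 | --- | --- |
| 2026-03-16 10:56 | 0.0% (0/104) | 0 | 0 | 104 | --- | --- |
| 2026-03-16 10:24 | 84.6% (88/104) | 88 | 0 | 16 | --- | --- |
| 2026-03-16 10:22 | 13.5% (14/104) | 14 | 0 | 90 | --- | --- |
| 2026-03-15 19:07 | 1.0% (1/104) | 1 | 0 | 103 | --- | --- |
| 2026-03-15 19:06 | 0.0% (0/104) | 0 | 0 | 104 | --- | --- |
| 2026-03-15 09:37 | 10.0% (1/10) | 1 | 0 | 9 | --- | --- |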

How to update

This page is regenerated automatically. To update it manually:

# 1. Run a pentest against a lab
/pentest http://vulnhr.test:7331 --eval

# 2. Score the run and save it to the history
python evals/lab-scorer.py vulnhr engagements/vulnhr-7331 --save --html --narrative

# 3. Regenerate this page
python evals/generate-scores-page.py

The generator is also invoked automatically by /labs-eval --results.
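
Steps 2 and 3 can be chained from a small wrapper once the pentest run has finished. This is a sketch only: `scoring_commands` and `run_scoring` are hypothetical helpers, the script paths and flags are copied from the commands above, and the /pentest step is left out because it is not a shell command.

```python
import subprocess
import sys


def scoring_commands(lab: str, engagement_dir: str) -> list[list[str]]:
    """Build the scoring and page-regeneration commands for one lab."""
    return [
        # Step 2: score the engagement and save the run to the history.
        [sys.executable, "evals/lab-scorer.py", lab, engagement_dir,
         "--save", "--html", "--narrative"],
        # Step 3: regenerate the scores page.
        [sys.executable, "evals/generate-scores-page.py"],
    ]


def run_scoring(lab: str, engagement_dir: str) -> None:
    """Run both steps in order; stop at the first non-zero exit code."""
    for cmd in scoring_commands(lab, engagement_dir):
        subprocess.run(cmd, check=True)
```

Calling `run_scoring("vulnhr", "engagements/vulnhr-7331")` from the repository root would mirror the manual steps. With `check=True` a failed scoring run raises `subprocess.CalledProcessError`, so the page is not regenerated from a failed run.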