# Latest Scores
Most recent results for each evaluation lab. This page is regenerated automatically by evals/generate-scores-page.py after every scoring session.
Status: 7/13 labs with results
Page last updated: 2026-03-25 00:00 UTC
## Scoreboard
| Lab | Tech Stack | Vulns | Score | Found | Partial | Missed | Duration | Date |
|---|---|---|---|---|---|---|---|---|
| VulnHR | Laravel/PHP, MySQL | 87 | 100.0% (87/87) | 87 | 0 | 0 | --- | 2026-03-23 11:34 |
| Juice Shop | Express.js, Angular | 55 | --- | --- | --- | --- | --- | No runs yet |
| SuperSecureBank | .NET 8, SQL Server | 47 | 91.9% (34/37) | 31 | 6 | 0 | --- | 2026-03-23 15:07 |
| AltoroMutual | Spring Boot, React | 60 | 40.5% (23.5/58) | 19 | 9 | 30 | --- | 2026-03-23 16:20 |
| DVWA | PHP, MariaDB | 28 | --- | --- | --- | --- | --- | No runs yet |
| DVRA | FastAPI, MongoDB | 12 | --- | --- | --- | --- | --- | No runs yet |
| VaultBank (Neo) | React 18 + FastAPI | 30 | 76.7% (23/30) | 23 | 0 | 7 | --- | 2026-03-24 |
| MedPortal (Neo) | Next.js 14 + Prisma | 20 | 85.0% (17/20) | 17 | 0 | 3 | --- | 2026-03-24 |
| ClaimFlow (Neo) | SvelteKit + Drizzle | 24 | 87.5% (21/24) | 21 | 0 | 3 | --- | 2026-03-24 |
| XBOW CTF | Mixed (104 containers) | 104 | 0.0% (0/104) | 0 | 0 | 104 | --- | 2026-03-16 11:00 |
| Gandalf (Lakera) | LLM Prompt Injection | 8 | --- | --- | --- | --- | --- | No runs yet |
| PortSwigger Academy | Web Security (all categories) | 250 | --- | --- | --- | --- | --- | No runs yet |
| Root-Me | Web-Server + Web-Client | 0 | --- | --- | --- | --- | --- | No runs yet |
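The Score column appears to weight partial findings at half credit: AltoroMutual's 40.5% matches (19 + 0.5 × 9) / 58. A minimal sketch of that inferred formula follows; the function name is illustrative, not the actual lab-scorer.py API.

```python
def lab_score(found: int, partial: int, total: int) -> float:
    """Percentage score with partial findings weighted at 0.5 (inferred formula)."""
    if total == 0:
        return 0.0
    return round(100 * (found + 0.5 * partial) / total, 1)

# Cross-checks against scoreboard rows above:
print(lab_score(19, 9, 58))   # AltoroMutual → 40.5
print(lab_score(31, 6, 37))   # SuperSecureBank → 91.9
print(lab_score(23, 0, 30))   # VaultBank → 76.7
```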
## XBOW CTF
Score: 0/104 = 0.0% | Found: 0 | Partial: 0 | Missed: 104 | Extra: 0 | Total findings: 0
### By skill (gap analysis)
| Skill | Total | Found | Partial | Missed | Coverage |
|---|---|---|---|---|---|
| test-access | 21 | 0 | 0 | 21 | 0% |
| test-api | 1 | 0 | 0 | 1 | 0% |
| test-auth | 4 | 0 | 0 | 4 | 0% |
| test-crypto | 1 | 0 | 0 | 1 | 0% |
| test-deser | 5 | 0 | 0 | 5 | 0% |
| test-exceptions | 2 | 0 | 0 | 2 | 0% |
| test-infra | 4 | 0 | 0 | 4 | 0% |
| test-injection | 61 | 0 | 0 | 61 | 0% |
| test-logic | 2 | 0 | 0 | 2 | 0% |
| test-ssrf | 3 | 0 | 0 | 3 | 0% |
## History
All runs saved with --save. Each row represents one complete /pentest --eval execution.
### VulnHR
| Date | Score | Found | Partial | Missed | Duration | Tokens |
|---|---|---|---|---|---|---|
| 2026-03-23 11:34 | 100.0% (87/87) | 87 | 0 | 0 | --- | --- |
### SuperSecureBank (.NET 8)
| Date | Score | Found | Partial | Missed | Duration | Tokens |
|---|---|---|---|---|---|---|
| 2026-03-23 15:07 | 91.9% (34/37) | 31 | 6 | 0 | --- | --- |
### AltoroMutual
| Date | Score | Found | Partial | Missed | Duration | Tokens |
|---|---|---|---|---|---|---|
| 2026-03-23 16:20 | 40.5% (23.5/58) | 19 | 9 | 30 | --- | --- |
### VibeApps (Neo Benchmark: 3 apps, 74 vulns)
Combined benchmark against ProjectDiscovery's Neo scanner. Each app is scored against its slice of a 74-entry ground truth.
| Date | App | Score | Found | Missed | FP | vs Neo |
|---|---|---|---|---|---|---|
| 2026-03-24 | VaultBank | 76.7% (23/30) | 23 | 7 | 2 | Neo: 27/30 |
| 2026-03-24 | MedPortal | 85.0% (17/20) | 17 | 3 | 3 | Neo: 17/20 (tied) |
| 2026-03-24 | ClaimFlow | 87.5% (21/24) | 21 | 3 | 3 | Neo: 22/24 |
| 2026-03-24 | TOTAL | 82.4% (61/74) | 61 | 13 | 8 | Neo: 66/74 |
Extra findings: 8 real vulnerabilities found by BeDefended outside the 74-entry ground truth. Total real vulns found: 69 (vs Neo: 66). Precision: 88.4% (vs Neo: 93%).
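The TOTAL row above is plain summation over the three per-app runs, with the combined score recomputed over all 74 ground-truth entries. A quick sketch using the numbers from the table:

```python
# Per-app (found, ground-truth entries) pairs from the table above.
apps = {
    "VaultBank": (23, 30),
    "MedPortal": (17, 20),
    "ClaimFlow": (21, 24),
}
found = sum(f for f, _ in apps.values())
total = sum(t for _, t in apps.values())
print(f"TOTAL: {100 * found / total:.1f}% ({found}/{total})")  # → TOTAL: 82.4% (61/74)
```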
### XBOW CTF
| Date | Score | Found | Partial | Missed | Duration | Tokens |
|---|---|---|---|---|---|---|
| 2026-03-16 11:00 | 0.0% (0/104) | 0 | 0 | 104 | --- | --- |
| 2026-03-16 10:56 | 0.0% (0/104) | 0 | 0 | 104 | --- | --- |
| 2026-03-16 10:24 | 84.6% (88/104) | 88 | 0 | 16 | --- | --- |
| 2026-03-16 10:22 | 13.5% (14/104) | 14 | 0 | 90 | --- | --- |
| 2026-03-15 19:07 | 1.0% (1/104) | 1 | 0 | 103 | --- | --- |
| 2026-03-15 19:06 | 0.0% (0/104) | 0 | 0 | 104 | --- | --- |
| 2026-03-15 09:37 | 10.0% (1/10) | 1 | 0 | 9 | --- | --- |
## How to update
This page is regenerated automatically. To update it manually:
```shell
# 1. Run a pentest against a lab
/pentest http://vulnhr.test:7331 --eval

# 2. Score the run and save it to the history
python evals/lab-scorer.py vulnhr engagements/vulnhr-7331 --save --html --narrative

# 3. Regenerate this page
python evals/generate-scores-page.py
```
The generator is also invoked automatically by /labs-eval --results.
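For orientation, here is a hypothetical sketch of what a generator like evals/generate-scores-page.py might do: load each saved run (assumed here to be one JSON file per run) and render the scoreboard as a markdown table. The directory layout and field names are illustrative assumptions, not the actual script's format.

```python
import json
from pathlib import Path

def render_scoreboard(runs_dir: str) -> str:
    """Render one markdown table row per saved run file (assumed JSON layout)."""
    rows = [
        "| Lab | Score | Found | Partial | Missed | Date |",
        "|---|---|---|---|---|---|",
    ]
    for path in sorted(Path(runs_dir).glob("*.json")):
        run = json.loads(path.read_text())
        rows.append(
            "| {lab} | {score}% ({found}/{total}) | {found} "
            "| {partial} | {missed} | {date} |".format(**run)
        )
    return "\n".join(rows)
```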