# Latest Scores
Most recent results for each evaluation lab. This page is regenerated automatically by evals/generate-scores-page.py after every scoring session.
Status: 7/13 labs with results
Page last updated: 2026-03-25 00:00 UTC
## Scoreboard
| Lab | Tech Stack | Vulns | Score | Found | Partial | Missed | Duration | Date |
|---|---|---|---|---|---|---|---|---|
| VulnHR | Laravel/PHP, MySQL | 87 | 100.0% (87/87) | 87 | 0 | 0 | --- | 2026-03-23 11:34 |
| Juice Shop | Express.js, Angular | 55 | --- | --- | --- | --- | --- | No runs yet |
| SuperSecureBank | .NET 8, SQL Server | 47 | 91.9% (34/37) | 31 | 6 | 0 | --- | 2026-03-23 15:07 |
| AltoroMutual | Spring Boot, React | 60 | 40.5% (23.5/58) | 19 | 9 | 30 | --- | 2026-03-23 16:20 |
| DVWA | PHP, MariaDB | 28 | --- | --- | --- | --- | --- | No runs yet |
| DVRA | FastAPI, MongoDB | 12 | --- | --- | --- | --- | --- | No runs yet |
| VaultBank (Neo) | React 18 + FastAPI | 30 | 76.7% (23/30) | 23 | 0 | 7 | --- | 2026-03-24 |
| MedPortal (Neo) | Next.js 14 + Prisma | 20 | 85.0% (17/20) | 17 | 0 | 3 | --- | 2026-03-24 |
| ClaimFlow (Neo) | SvelteKit + Drizzle | 24 | 87.5% (21/24) | 21 | 0 | 3 | --- | 2026-03-24 |
| XBOW CTF | Mixed (104 containers) | 104 | 0.0% (0/104) | 0 | 0 | 104 | --- | 2026-03-16 11:00 |
| Gandalf (Lakera) | LLM Prompt Injection | 8 | --- | --- | --- | --- | --- | No runs yet |
| PortSwigger Academy | Web Security (all categories) | 250 | --- | --- | --- | --- | --- | No runs yet |
| Root-Me | Web-Server + Web-Client | 0 | --- | --- | --- | --- | --- | No runs yet |
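The Score column appears to weight partial findings at half credit: AltoroMutual's 40.5% matches (19 + 0.5 × 9) / 58. A minimal sketch of that inferred formula follows; the function name is illustrative, not the actual lab-scorer.py API.

```python
def lab_score(found: int, partial: int, total: int) -> float:
    """Percentage score with partial findings weighted at 0.5 (inferred formula)."""
    if total == 0:
        return 0.0
    return round(100 * (found + 0.5 * partial) / total, 1)

# Cross-checks against scoreboard rows above:
print(lab_score(19, 9, 58))   # AltoroMutual → 40.5
print(lab_score(31, 6, 37))   # SuperSecureBank → 91.9
print(lab_score(23, 0, 30))   # VaultBank → 76.7
```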
## XBOW CTF
Score: 0/104 = 0.0% | Found: 0 | Partial: 0 | Missed: 104 | Extra: 0 | Total findings: 0
### By skill (gap analysis)
| Skill | Total | Found | Partial | Missed | Coverage |
|---|---|---|---|---|---|
| test-access | 21 | 0 | 0 | 21 | 0% |
| test-api | 1 | 0 | 0 | 1 | 0% |
| test-auth | 4 | 0 | 0 | 4 | 0% |
| test-crypto | 1 | 0 | 0 | 1 | 0% |
| test-deser | 5 | 0 | 0 | 5 | 0% |
| test-exceptions | 2 | 0 | 0 | 2 | 0% |
| test-infra | 4 | 0 | 0 | 4 | 0% |
| test-injection | 61 | 0 | 0 | 61 | 0% |
| test-logic | 2 | 0 | 0 | 2 | 0% |
| test-ssrf | 3 | 0 | 0 | 3 | 0% |
## History
All runs saved with --save. Each row represents one complete /pentest --eval execution.
### VulnHR
| Date | Score | Found | Partial | Missed | Duration | Tokens |
|---|---|---|---|---|---|---|
| 2026-03-23 11:34 | 100.0% (87/87) | 87 | 0 | 0 | --- | --- |
### SuperSecureBank (.NET 8)
| Date | Score | Found | Partial | Missed | Duration | Tokens |
|---|---|---|---|---|---|---|
| 2026-03-23 15:07 | 91.9% (34/37) | 31 | 6 | 0 | --- | --- |
### AltoroMutual
| Date | Score | Found | Partial | Missed | Duration | Tokens |
|---|---|---|---|---|---|---|
| 2026-03-23 16:20 | 40.5% (23.5/58) | 19 | 9 | 30 | --- | --- |
### VibeApps (Neo Benchmark: 3 apps, 74 vulns)
Combined benchmark against ProjectDiscovery's Neo scanner. Each app is scored against its slice of a 74-entry ground truth.
| Date | App | Score | Found | Missed | FP | vs Neo |
|---|---|---|---|---|---|---|
| 2026-03-24 | VaultBank | 76.7% (23/30) | 23 | 7 | 2 | Neo: 27/30 |
| 2026-03-24 | MedPortal | 85.0% (17/20) | 17 | 3 | 3 | Neo: 17/20 (tied) |
| 2026-03-24 | ClaimFlow | 87.5% (21/24) | 21 | 3 | 3 | Neo: 22/24 |
| 2026-03-24 | TOTAL | 82.4% (61/74) | 61 | 13 | 8 | Neo: 66/74 |
Extra findings: 8 real vulnerabilities found by BeDefended outside the 74-entry ground truth. Total real vulns found: 69 (vs Neo: 66). Precision: 88.4% (vs Neo: 93%).
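The TOTAL row above is plain summation over the three per-app runs, with the combined score recomputed over all 74 ground-truth entries. A quick sketch using the numbers from the table:

```python
# Per-app (found, ground-truth entries) pairs from the table above.
apps = {
    "VaultBank": (23, 30),
    "MedPortal": (17, 20),
    "ClaimFlow": (21, 24),
}
found = sum(f for f, _ in apps.values())
total = sum(t for _, t in apps.values())
print(f"TOTAL: {100 * found / total:.1f}% ({found}/{total})")  # → TOTAL: 82.4% (61/74)
```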
### XBOW CTF
| Date | Score | Found | Partial | Missed | Duration | Tokens |
|---|---|---|---|---|---|---|
| 2026-03-16 11:00 | 0.0% (0/104) | 0 | 0 | 104 | --- | --- |
| 2026-03-16 10:56 | 0.0% (0/104) | 0 | 0 | 104 | --- | --- |
| 2026-03-16 10:24 | 84.6% (88/104) | 88 | 0 | 16 | --- | --- |
| 2026-03-16 10:22 | 13.5% (14/104) | 14 | 0 | 90 | --- | --- |
| 2026-03-15 19:07 | 1.0% (1/104) | 1 | 0 | 103 | --- | --- |
| 2026-03-15 19:06 | 0.0% (0/104) | 0 | 0 | 104 | --- | --- |
| 2026-03-15 09:37 | 10.0% (1/10) | 1 | 0 | 9 | --- | --- |
## How to update
This page is regenerated automatically. To update it manually:
```shell
# 1. Run a pentest against a lab
/pentest http://vulnhr.test:7331 --eval

# 2. Score the run and save it to the history
python evals/lab-scorer.py vulnhr engagements/vulnhr-7331 --save --html --narrative

# 3. Regenerate this page
python evals/generate-scores-page.py
```
The generator is also invoked automatically by /labs-eval --results.
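For orientation, here is a hypothetical sketch of what a generator like evals/generate-scores-page.py might do: load each saved run (assumed here to be one JSON file per run) and render the scoreboard as a markdown table. The directory layout and field names are illustrative assumptions, not the actual script's format.

```python
import json
from pathlib import Path

def render_scoreboard(runs_dir: str) -> str:
    """Render one markdown table row per saved run file (assumed JSON layout)."""
    rows = [
        "| Lab | Score | Found | Partial | Missed | Date |",
        "|---|---|---|---|---|---|",
    ]
    for path in sorted(Path(runs_dir).glob("*.json")):
        run = json.loads(path.read_text())
        rows.append(
            "| {lab} | {score}% ({found}/{total}) | {found} "
            "| {partial} | {missed} | {date} |".format(**run)
        )
    return "\n".join(rows)
```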