Phase 0.5: Walkthrough (Authenticated Discovery)¶
Overview¶
Phase 0.5 uses a headless browser (Playwright with Chromium) to navigate the application as each user role. This automated crawling discovers pages, endpoints, functionality, and API calls that static tools would miss. It's particularly effective for Single Page Applications (SPAs) that render content dynamically.
Purpose: Build a comprehensive map of every user-accessible page and endpoint, with full authentication context for each role.
Why Walkthrough Matters¶
Many reconnaissance tools analyze static HTML and miss critical paths because:
- Dynamic JavaScript Rendering: SPAs render content after loading via JavaScript—static crawlers see only an empty <div id="app"></div>
- Hidden Navigation: Menus that appear only after login, modals, collapsible sections
- Lazy Loading: Content loaded on scroll or button click
- API Calls: Navigation endpoints may not appear in HTML links
- Role-Based Content: Admin dashboards, user profiles, moderator tools only visible to specific roles
- WebSocket APIs: Real-time features missed by HTTP-only crawlers
Execution Flow¶
graph TB
A["Start Phase 0.5<br/>Walkthrough"] --> B["Load credentials.json<br/>All user roles"]
B --> C{Multiple Roles?}
C -->|Yes| D["Parallel Crawl<br/>Max 3 users/batch"]
C -->|No| E["Sequential Crawl<br/>Single role"]
D --> F["Authenticate<br/>as each user"]
E --> F
F --> G["Navigate App<br/>BFS crawling"]
G --> H["Record URLs<br/>& API Endpoints"]
H --> I["Detect Errors<br/>403, 500, timeouts"]
I --> J["Merge Results<br/>app-map.json"]
J --> K["Continue to<br/>Phase 1: Recon"]
style A fill:#4a148c,color:#fff
style K fill:#4a148c,color:#fff
style J fill:#ab47bc,color:#fff
How It Works¶
1. Credential Setup¶
The walkthrough requires a credentials.json file at the project root:
{
"login": {
"url": "https://app.example.com/login",
"method": "form",
"wait_for": ".dashboard"
},
"users": {
"admin": {
"email": "admin@example.com",
"password": "SecurePass123!",
"role": "Administrator"
},
"user": {
"email": "user@example.com",
"password": "UserPass456!",
"role": "Regular User"
},
"moderator": {
"email": "mod@example.com",
"password": "ModPass789!",
"role": "Moderator"
}
}
}
CRITICAL: The login section MUST be present. Without it, the crawler runs unauthenticated and misses all protected pages.
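A sanity check on credentials.json before launching the crawl avoids wasting a full run on a malformed file. The sketch below is illustrative: the key names match the example file above, but the function name and return convention are assumptions, not the actual crawler.py API.

```python
import json

# Keys the example credentials.json above requires (assumed minimal schema)
REQUIRED_LOGIN_KEYS = {"url", "method", "wait_for"}
REQUIRED_USER_KEYS = {"email", "password", "role"}

def validate_credentials(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the file looks usable."""
    problems = []
    login = config.get("login")
    if not isinstance(login, dict):
        # Without a login section the crawl runs unauthenticated
        problems.append("missing 'login' section - crawl would run unauthenticated")
    else:
        for key in sorted(REQUIRED_LOGIN_KEYS - login.keys()):
            problems.append(f"login section missing '{key}'")
    users = config.get("users")
    if not users:
        problems.append("no users defined")
    else:
        for name, user in users.items():
            for key in sorted(REQUIRED_USER_KEYS - user.keys()):
                problems.append(f"user '{name}' missing '{key}'")
    return problems

if __name__ == "__main__":
    with open("credentials.json") as fh:
        for issue in validate_credentials(json.load(fh)):
            print("WARN:", issue)
```

Running this before the Docker command below surfaces schema mistakes (a missing wait_for selector, a user without a password) in seconds rather than after a failed crawl.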
2. Parallel Crawling Strategy¶
For efficiency with multiple users:

- 1-3 users: Sequential crawling in a single process
- 4-6 users: Batch 1 (users 1-3) → Batch 2 (users 4-6), each batch parallel
- 7+ users: Multiple batches of at most 3 users each
This prevents timeouts and internal errors from too many concurrent Playwright instances.
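The batching rule above reduces to a simple split. This helper is a minimal sketch, not part of crawler.py; the function name is hypothetical.

```python
def make_batches(users: list[str], batch_size: int = 3) -> list[list[str]]:
    """Split the role list into batches of at most `batch_size` users,
    so no more than `batch_size` Playwright instances run concurrently."""
    return [users[i:i + batch_size] for i in range(0, len(users), batch_size)]

# Example: 5 roles -> two batches, crawled one batch at a time
# make_batches(["admin", "user", "moderator", "auditor", "guest"])
```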
Command example:
docker run --rm -v $(pwd):/work pentest-tools \
/opt/pentest-venv/bin/python3 browser/crawler.py \
--role admin --output app-map-admin.json
docker run --rm -v $(pwd):/work pentest-tools \
/opt/pentest-venv/bin/python3 browser/crawler.py \
--merge --admin app-map-admin.json --user app-map-user.json \
--output app-map.json
3. BFS Crawling Algorithm¶
The Playwright crawler uses breadth-first search to systematically discover pages:
- Start at the app root (e.g., https://app.example.com/dashboard)
- Extract all links: <a href="...">, <button onclick="...">, form actions
- Visit each link and record the response code (200, 302, 403, 500, etc.)
- Extract any API calls made (via DevTools network monitoring)
- Queue new URLs for visiting
- Continue until no new URLs remain or a timeout/depth limit is reached
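The steps above can be sketched as a BFS loop. To keep the sketch self-contained, page loading is injected as a `fetch(url) -> (status, links)` callable; in the real crawler that callable is a Playwright page load plus link extraction. The function and its shape are illustrative assumptions, not crawler.py's actual code.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

def bfs_crawl(root: str, fetch, max_depth: int = 5) -> dict[str, int]:
    """Breadth-first crawl starting at `root`.
    `fetch(url)` must return (status_code, [links found on the page]).
    Returns {url: status}. Stays on the root's host and honors a depth limit."""
    host = urlparse(root).netloc
    seen = {root: None}
    queue = deque([(root, 0)])
    while queue:
        url, depth = queue.popleft()
        status, links = fetch(url)
        seen[url] = status
        if depth >= max_depth:
            continue
        for link in links:
            absolute = urljoin(url, link)  # resolve relative hrefs
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen[absolute] = None      # mark queued before visiting
                queue.append((absolute, depth + 1))
    return seen
```

BFS (rather than DFS) is the natural choice here: pages close to the dashboard are discovered first, so even a crawl cut short by a timeout covers the most prominent parts of the app.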
4. API Discovery During Crawl¶
Modern apps make API calls as you browse. The crawler:
- Monitors Network tab for all HTTP/HTTPS requests
- Flags requests carrying the X-Requested-With: XMLHttpRequest header (a common API-call indicator)
- Captures Content-Type: application/json endpoints
- Records request/response structure for later testing
- Identifies GraphQL, REST, and SOAP endpoints
Example output:
{
"api_endpoints": [
{
"method": "GET",
"path": "/api/v1/users/me",
"status": 200,
"content_type": "application/json",
"auth": "Bearer JWT"
},
{
"method": "POST",
"path": "/api/v1/tickets",
"status": 201,
"request_body_sample": {"title": "...", "description": "..."}
}
]
}
Output Files¶
| File | Content | Purpose |
|---|---|---|
| app-map.json | Complete crawl results | Master reference of all discovered URLs, status codes, roles |
| crawled-urls.txt | Plain text list | One URL per line for use by other tools |
| api-endpoints.txt | API-specific URLs | Only /api/* endpoints for Phase 2 parameter discovery |
| walkthrough-report.md | Human-readable summary | Markdown report with statistics, errors, warnings |
Example app-map.json¶
{
"target": "https://app.example.com",
"roles_tested": ["admin", "user", "moderator"],
"crawl_stats": {
"total_urls": 47,
"successful_200": 42,
"forbidden_403": 3,
"errors_500": 2,
"crawl_duration_seconds": 145
},
"urls_by_role": {
"admin": {
"/dashboard": {
"status": 200,
"title": "Admin Dashboard",
"methods": ["GET"],
"forms": [{"action": "/api/v1/users", "method": "POST"}]
},
"/settings": {"status": 200, "title": "Settings"},
"/admin/users": {"status": 200, "title": "User Management"},
"/reports": {"status": 403, "reason": "Forbidden in this role"}
},
"user": {
"/dashboard": {"status": 200, "title": "User Dashboard"},
"/profile": {"status": 200, "title": "My Profile"},
"/admin/users": {"status": 403, "reason": "Admin only"}
}
},
"api_endpoints_discovered": [
"/api/v1/auth/login",
"/api/v1/auth/logout",
"/api/v1/users",
"/api/v1/users/{id}",
"/api/v1/tickets"
]
}
Error Handling¶
The crawler diagnoses why pages fail and documents issues:
| Status | Meaning | Action |
|---|---|---|
| 200 | Successful page load | Continue crawling links on page |
| 302/301 | Redirect | Follow the redirect chain |
| 403 | Forbidden | Log as inaccessible to current role; expected for access control testing |
| 404 | Not found | Continue (bad link) |
| 500 | Server error | Log and skip; may indicate application crash |
| Timeout | Page takes >30s to load | Log and skip; may indicate performance issues |
Critical: If a role's login FAILS, the entire crawl for that role is invalid. The login_success flag is checked before proceeding.
Walkthrough Completeness Rules¶
❌ NOT COMPLETE if:
- Any user/role failed to authenticate (check login_success: false)
- Fewer URLs crawled than expected (e.g., admin crawled 5 URLs when there should be 20+)
- Warnings about incomplete navigation
- Errors in walkthrough-report.md
✅ COMPLETE if:
- All users in credentials.json successfully logged in
- roles_tested count matches number of users in credentials.json
- At least one URL discovered per role
- No critical errors in the crawl
If incomplete: Go back and fix credentials or target issues, then re-run crawl before proceeding to Phase 1.
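The completeness rules above can be automated as a gate before Phase 1. The sketch assumes app-map.json carries a per-role `login_results` section with a `login_success` flag; that field name is an assumption based on the flag mentioned earlier, not a confirmed schema.

```python
def walkthrough_complete(app_map: dict, credentials: dict) -> list[str]:
    """Check the completeness rules; an empty list means COMPLETE."""
    failures = []
    expected_roles = set(credentials.get("users", {}))
    tested = set(app_map.get("roles_tested", []))
    # roles_tested must match the users defined in credentials.json
    if tested != expected_roles:
        failures.append(
            f"roles_tested {sorted(tested)} != users {sorted(expected_roles)}")
    # At least one URL discovered per tested role
    for role in sorted(tested):
        if not app_map.get("urls_by_role", {}).get(role):
            failures.append(f"role '{role}' discovered no URLs")
    # Every role must have authenticated successfully
    for role, info in app_map.get("login_results", {}).items():
        if not info.get("login_success", False):
            failures.append(f"role '{role}' failed to authenticate")
    return failures
```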
Integration with Phases 1-2¶
The walkthrough output is used by:
- Phase 1 (Recon): URLs from crawled-urls.txt used for Wayback Machine searches
- Phase 2 (Discovery): API endpoints piped into ffuf and arjun for parameter discovery
- Phase 3 (Scan): All discovered URLs scanned with nuclei templates
- Phase 4 (Testing): URLs become the endpoints tested by manual skills
Special Cases¶
Single Page Applications (SPAs)¶
SPAs with heavy JavaScript rendering are where the walkthrough excels: it loads each page and executes its JavaScript, so dynamically created links are discovered.
Multi-Step Forms¶
Some apps have wizards or multi-step forms. The crawler:

- Fills form fields (if heuristics can guess field types)
- Submits forms
- Records resulting URLs
WebSocket APIs¶
Real-time apps using WebSockets are partially captured. The crawler logs WebSocket connections but doesn't fully participate in WebSocket conversations.
CAPTCHA / Rate Limiting¶
If the app requires CAPTCHA or rate-limits login attempts:
- Configure credentials.json with a pre-solved CAPTCHA token, or
- Provide cookies/tokens manually instead of a username/password, or
- Adjust wait_for selectors to skip CAPTCHA verification
Next Phase¶
After Phase 0.5 completes successfully with all roles crawled, continue to Phase 1: Recon to perform passive and active reconnaissance on the discovered attack surface.