
Phase 0.5: Walkthrough (Authenticated Discovery)

Overview

Phase 0.5 uses a headless browser (Playwright with Chromium) to navigate the application as each user role. This automated crawling discovers all pages, endpoints, functionality, and API calls that would be missed by static tools. It's particularly effective for Single Page Applications (SPAs) that render content dynamically.

Purpose: Build a comprehensive map of every user-accessible page and endpoint, with full authentication context for each role.

Why Walkthrough Matters

Many reconnaissance tools analyze static HTML and miss critical paths because:

  • Dynamic JavaScript Rendering: SPAs render content after loading via JavaScript—static crawlers see empty <div id="app"></div>
  • Hidden Navigation: Menus that appear only after login, modals, collapsible sections
  • Lazy Loading: Content loaded on scroll or button click
  • API Calls: Navigation endpoints may not appear in HTML links
  • Role-Based Content: Admin dashboards, user profiles, moderator tools only visible to specific roles
  • WebSocket APIs: Real-time features missed by HTTP-only crawlers
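The first point can be seen with a few lines of stdlib Python: a static parser finds no links in a typical SPA shell, because the navigation only exists after JavaScript renders it. This is an illustrative sketch, not part of the crawler; the sample HTML is invented.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags, the way a static crawler would."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

# An SPA shell: the markup a static tool downloads before JavaScript runs.
spa_shell = '<html><body><div id="app"></div><script src="/bundle.js"></script></body></html>'
parser = LinkExtractor()
parser.feed(spa_shell)
print(parser.links)  # → []  (the real navigation appears only after rendering)
```

A headless browser sees the post-render DOM instead, which is why Playwright discovers what this parser cannot.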

Execution Flow

graph TB
    A["Start Phase 0.5<br/>Walkthrough"] --> B["Load credentials.json<br/>All user roles"]
    B --> C{Multiple Roles?}
    C -->|Yes| D["Parallel Crawl<br/>Max 3 users/batch"]
    C -->|No| E["Sequential Crawl<br/>Single role"]
    D --> F["Authenticate<br/>as each user"]
    E --> F
    F --> G["Navigate App<br/>BFS crawling"]
    G --> H["Record URLs<br/>& API Endpoints"]
    H --> I["Detect Errors<br/>403, 500, timeouts"]
    I --> J["Merge Results<br/>app-map.json"]
    J --> K["Continue to<br/>Phase 1: Recon"]

    style A fill:#4a148c,color:#fff
    style K fill:#4a148c,color:#fff
    style J fill:#ab47bc,color:#fff

How It Works

1. Credential Setup

The walkthrough requires a credentials.json file at the project root:

{
  "login": {
    "url": "https://app.example.com/login",
    "method": "form",
    "wait_for": ".dashboard"
  },
  "users": {
    "admin": {
      "email": "admin@example.com",
      "password": "SecurePass123!",
      "role": "Administrator"
    },
    "user": {
      "email": "user@example.com",
      "password": "UserPass456!",
      "role": "Regular User"
    },
    "moderator": {
      "email": "mod@example.com",
      "password": "ModPass789!",
      "role": "Moderator"
    }
  }
}

CRITICAL: The login section MUST be present. Without it, the crawler runs unauthenticated and misses all protected pages.
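Because a missing login section silently degrades the whole crawl, a pre-flight sanity check is worth running first. The sketch below is hypothetical (it is not part of crawler.py) and assumes the field names shown in the example above:

```python
import json

def validate_credentials(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks usable."""
    problems = []
    cfg = json.loads(raw)
    if "login" not in cfg:
        problems.append("missing 'login' section: crawl would run unauthenticated")
    elif "url" not in cfg["login"]:
        problems.append("login section has no 'url'")
    if not cfg.get("users"):
        problems.append("no users defined")
    for name, user in cfg.get("users", {}).items():
        for field in ("email", "password"):
            if field not in user:
                problems.append(f"user '{name}' missing '{field}'")
    return problems

print(validate_credentials('{"users": {"admin": {"email": "a@example.com"}}}'))
```

Run this against credentials.json before launching the crawl; any non-empty result means the walkthrough would be incomplete from the start.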

2. Parallel Crawling Strategy

For efficiency with multiple users:

  • 1-3 users: Sequential crawling in a single process
  • 4-6 users: Batch 1 (users 1-3) → Batch 2 (users 4-6), each batch parallel
  • 7+ users: Multiple batches of max 3 users each

This prevents timeouts and internal errors from too many concurrent Playwright instances.
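The batching rule above is a simple chunking of the role list. A minimal sketch (hypothetical helper, not part of crawler.py):

```python
def batch_users(users: list[str], batch_size: int = 3) -> list[list[str]]:
    # Split roles into batches of at most `batch_size` to cap the number of
    # concurrent Playwright instances, per the strategy described above.
    return [users[i:i + batch_size] for i in range(0, len(users), batch_size)]

print(batch_users(["u1", "u2", "u3", "u4", "u5", "u6", "u7"]))
# → [['u1', 'u2', 'u3'], ['u4', 'u5', 'u6'], ['u7']]
```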

Command example:

docker run --rm -v $(pwd):/work pentest-tools \
  /opt/pentest-venv/bin/python3 browser/crawler.py \
  --role admin --output app-map-admin.json

docker run --rm -v $(pwd):/work pentest-tools \
  /opt/pentest-venv/bin/python3 browser/crawler.py \
  --merge --admin app-map-admin.json --user app-map-user.json \
  --output app-map.json
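The --merge step combines per-role maps into one app-map. Its real behavior is internal to crawler.py; the sketch below only illustrates the likely shape of that merge, using the field names from the examples in this document (`roles_tested`, `urls_by_role`), and is an assumption rather than the actual implementation:

```python
def merge_maps(maps: dict[str, dict]) -> dict:
    """Combine {role: per-role crawl result} dicts into a single app map."""
    merged = {"roles_tested": sorted(maps), "urls_by_role": {}, "api_endpoints_discovered": set()}
    for role, result in maps.items():
        merged["urls_by_role"][role] = result.get("urls", {})
        # Endpoints are deduplicated across roles via the set.
        merged["api_endpoints_discovered"].update(result.get("api_endpoints", []))
    merged["api_endpoints_discovered"] = sorted(merged["api_endpoints_discovered"])
    return merged

maps = {
    "admin": {"urls": {"/admin/users": 200}, "api_endpoints": ["/api/v1/users"]},
    "user": {"urls": {"/profile": 200}, "api_endpoints": ["/api/v1/users", "/api/v1/users/me"]},
}
print(merge_maps(maps)["api_endpoints_discovered"])
# → ['/api/v1/users', '/api/v1/users/me']
```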

3. BFS Crawling Algorithm

The Playwright crawler uses breadth-first search to systematically discover pages:

  1. Start at app root (e.g., https://app.example.com/dashboard)
  2. Extract all links: <a href="...">, <button onclick="...">, form actions
  3. Visit each link, record response code (200, 302, 403, 500, etc.)
  4. Extract any API calls made (via DevTools network monitoring)
  5. Queue new URLs for visiting
  6. Continue until no new URLs remain or timeout/depth limit reached
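The steps above can be sketched as a plain BFS. In this hedged sketch the `fetch` function is injectable: in the real crawler it would be a Playwright page load plus link extraction, while here a toy site map stands in for it so the traversal logic is shown on its own.

```python
from collections import deque

def bfs_crawl(start: str, fetch, max_depth: int = 3) -> dict[str, int]:
    """fetch(url) -> (status_code, [links]); returns {url: status} for all visits."""
    visited: dict[str, int] = {}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if url in visited or depth > max_depth:
            continue
        status, links = fetch(url)
        visited[url] = status          # step 3: record the response code
        if status == 200:              # only expand pages that actually loaded
            for link in links:
                if link not in visited:
                    queue.append((link, depth + 1))  # step 5: queue new URLs
    return visited

# Toy stand-in for a Playwright page load over a three-page site.
site = {
    "/": (200, ["/a", "/admin"]),
    "/a": (200, []),
    "/admin": (403, []),
}
print(bfs_crawl("/", lambda u: site.get(u, (404, []))))
# → {'/': 200, '/a': 200, '/admin': 403}
```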

4. API Discovery During Crawl

Modern apps make API calls as you browse. The crawler:

  • Monitors the Network tab for all HTTP/HTTPS requests
  • Extracts X-Requested-With: XMLHttpRequest headers (indicates an API call)
  • Captures Content-Type: application/json endpoints
  • Records request/response structure for later testing
  • Identifies GraphQL, REST, and SOAP endpoints

Example output:

{
  "api_endpoints": [
    {
      "method": "GET",
      "path": "/api/v1/users/me",
      "status": 200,
      "content_type": "application/json",
      "auth": "Bearer JWT"
    },
    {
      "method": "POST",
      "path": "/api/v1/tickets",
      "status": 201,
      "request_body_sample": {"title": "...", "description": "..."}
    }
  ]
}
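The classification step behind this output reduces to a header check. This is an illustrative sketch of that decision (not the crawler's actual code), and it assumes header names have already been lowercased, as browser tooling commonly reports them:

```python
def looks_like_api(request_headers: dict, response_headers: dict) -> bool:
    """Flag a captured request as an API call using the two signals named above."""
    content_type = response_headers.get("content-type", "")
    requested_with = request_headers.get("x-requested-with", "")
    return content_type.startswith("application/json") or requested_with == "XMLHttpRequest"

print(looks_like_api({}, {"content-type": "application/json; charset=utf-8"}))  # → True
print(looks_like_api({}, {"content-type": "text/html"}))                        # → False
```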

Output Files

| File | Content | Purpose |
| --- | --- | --- |
| app-map.json | Complete crawl results | Master reference of all discovered URLs, status codes, roles |
| crawled-urls.txt | Plain text list | One URL per line for use by other tools |
| api-endpoints.txt | API-specific URLs | Only /api/* endpoints for Phase 2 parameter discovery |
| walkthrough-report.md | Human-readable summary | Markdown report with statistics, errors, warnings |

Example app-map.json

{
  "target": "https://app.example.com",
  "roles_tested": ["admin", "user", "moderator"],
  "crawl_stats": {
    "total_urls": 47,
    "successful_200": 42,
    "forbidden_403": 3,
    "errors_500": 2,
    "crawl_duration_seconds": 145
  },
  "urls_by_role": {
    "admin": {
      "/dashboard": {
        "status": 200,
        "title": "Admin Dashboard",
        "methods": ["GET"],
        "forms": [{"action": "/api/v1/users", "method": "POST"}]
      },
      "/settings": {"status": 200, "title": "Settings"},
      "/admin/users": {"status": 200, "title": "User Management"},
      "/reports": {"status": 403, "reason": "Forbidden in this role"}
    },
    "user": {
      "/dashboard": {"status": 200, "title": "User Dashboard"},
      "/profile": {"status": 200, "title": "My Profile"},
      "/admin/users": {"status": 403, "reason": "Admin only"}
    }
  },
  "api_endpoints_discovered": [
    "/api/v1/auth/login",
    "/api/v1/auth/logout",
    "/api/v1/users",
    "/api/v1/users/{id}",
    "/api/v1/tickets"
  ]
}

Error Handling

The crawler diagnoses why pages fail and documents issues:

| Status | Meaning | Action |
| --- | --- | --- |
| 200 | Successful page load | Continue crawling links on page |
| 302/301 | Redirect | Follow the redirect chain |
| 403 | Forbidden | Log as inaccessible to current role; expected for access control testing |
| 404 | Not found | Continue (bad link) |
| 500 | Server error | Log and skip; may indicate application crash |
| Timeout | Page takes >30s to load | Log and skip; may indicate performance issues |

Critical: If a role's login FAILS, the entire crawl for that role is invalid. The login_success flag is checked before proceeding.
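The per-status decisions above amount to a small dispatch. A hedged sketch (the action names are invented labels, not the crawler's real API):

```python
def handle_status(status) -> str:
    """Map a response status (or None for a timeout) to the crawler's next action."""
    if status is None:
        return "log-and-skip"       # timeout: the page never finished loading
    if status == 200:
        return "crawl-links"        # expand this page's links
    if status in (301, 302):
        return "follow-redirect"
    if status == 403:
        return "log-forbidden"      # expected signal for access-control testing
    if status >= 500:
        return "log-and-skip"       # possible application crash
    return "continue"               # 404 and other client errors: bad link, move on

print(handle_status(403))  # → log-forbidden
```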

Walkthrough Completeness Rules

NOT COMPLETE if:

  • Any user/role failed to authenticate (check login_success: false)
  • Fewer URLs crawled than expected (e.g., admin crawled 5 URLs when there should be 20+)
  • Warnings about incomplete navigation
  • Errors in walkthrough-report.md

COMPLETE if:

  • All users in credentials.json successfully logged in
  • roles_tested count matches number of users in credentials.json
  • At least one URL discovered per role
  • No critical errors in the crawl

If incomplete: Go back and fix credentials or target issues, then re-run crawl before proceeding to Phase 1.
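The machine-checkable parts of these rules can be automated. The sketch below is an illustrative gate, assuming the credentials.json and app-map.json shapes shown earlier in this document; the warning and error checks against walkthrough-report.md would still be manual:

```python
def walkthrough_complete(credentials: dict, app_map: dict) -> bool:
    """True only if every configured role was tested and discovered at least one URL."""
    expected_roles = set(credentials.get("users", {}))
    if set(app_map.get("roles_tested", [])) != expected_roles:
        return False                     # a role failed to log in or was skipped
    for role in expected_roles:
        if not app_map.get("urls_by_role", {}).get(role):
            return False                 # role crawled zero URLs
    return True

creds = {"users": {"admin": {}, "user": {}}}
good = {"roles_tested": ["admin", "user"],
        "urls_by_role": {"admin": {"/dashboard": 200}, "user": {"/profile": 200}}}
print(walkthrough_complete(creds, good))  # → True
```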

Integration with Phases 1-2

The walkthrough output is used by:

  1. Phase 1 (Recon): URLs from crawled-urls.txt used for wayback machine searches
  2. Phase 2 (Discovery): API endpoints piped into ffuf and arjun for parameter discovery
  3. Phase 3 (Scan): All discovered URLs scanned with nuclei templates
  4. Phase 4 (Testing): URLs become the endpoints tested by manual skills

Special Cases

Single Page Applications (SPAs)

SPAs with heavy JavaScript rendering are where the walkthrough excels: it renders each page and executes its JavaScript, so dynamically created links are discovered.

Multi-Step Forms

Some apps have wizards or multi-step forms. The crawler:

  • Fills form fields (if heuristics can guess field types)
  • Submits forms
  • Records resulting URLs

WebSocket APIs

Real-time apps using WebSockets are partially captured. The crawler logs WebSocket connections but doesn't fully participate in WebSocket conversations.

CAPTCHA / Rate Limiting

If the app requires CAPTCHA or rate-limits login attempts:

  • Configure credentials.json with a pre-solved CAPTCHA token, or
  • Manually provide cookies/tokens instead of username/password
  • Adjust wait_for selectors to skip CAPTCHA verification

Next Phase

After Phase 0.5 completes successfully with all roles crawled, continue to Phase 1: Recon to perform passive and active reconnaissance on the discovered attack surface.