RED TEAM

Autonomous Web Agent documentation — for operators running the behavioral simulation engine against Detection Lab challenges.

Agent Documentation

Complete reference for the browser_engine.agent autonomous task execution engine.

Quick Start

The agent runs tasks autonomously from the command line. No human interaction required after launch.

    # Run a task against a challenge page (dry-run: no actions taken)
    python -m browser_engine.agent \
      --url http://localhost:5555/challenge/forms \
      --desc "Fill the registration form and submit" \
      --dry-run

    # Live run (launches browser, executes actions)
    python -m browser_engine.agent \
      --url http://localhost:5555/challenge/wizard \
      --desc "Complete the 3-step wizard" \
      --launch

    # Agent-mode iteration loop (red team vs blue team)
    python detection_lab/iterate.py --target 95 --max-iterations 5 --agent-mode
    python detection_lab/iterate.py --target 95 --agent-mode --headless

Bridge API

    from browser_engine.bridge import get_engine

    engine = get_engine()

    # Plan session timing, check health, record results
    plan = engine.plan_session(platform="twitter", action_count=5)
    ok = engine.should_proceed(platform="twitter", username="user")

Agent Modules

All 18 modules live in src/browser_engine/agent/. Each is independently testable with mocked dependencies.

| Module | Class | Purpose | Status |
| --- | --- | --- | --- |
| agent.py | WebAgent | Core runner: two-phase planner, zero-human autonomy, pre-auth detection | Active |
| planner.py | Planner | Phase A: LLM field→vault mapping. Phase B: deterministic action generation | Active |
| safety.py | | classify_field / classify_action / detect_page_hazards / is_honeypot | Active |
| page_reader.py | | JS-based DOM extraction into PageSnapshot | Active |
| data_vault.py | DataVault | TOML-based credential vault with personal, company, and account sections | Active |
| mfa_resolver.py | MFAResolver | TOTP via pyotp; email polling + SMS via Twilio (needs credentials) | Built / needs creds |
| auth_manager.py | AuthManager | storageState persistence + autonomous login flow | Active |
| challenge_resolver.py | ChallengeResolver | CAPTCHA solving: test keys, math CAPTCHAs, CapSolver API | Active (test_keys mode) |
| quiescence.py | QuiescenceObserver | Idle/settled detection; waits for the page to stabilize before acting | Active |
| vlm.py | LocalVLM | Qwen2.5-VL visual page reading (needs local model endpoint) | Built / needs endpoint |
| widget_interactor.py | WidgetInteractor | Custom widget strategies: dropdowns, sliders, date pickers | Active |
| trace_cache.py | TraceCache | Action trace caching for replay and debugging | Active |
| document_vault.py | DocumentVault | PDF/image/CSV file generation for upload tasks | Active |
| payment_policy.py | PaymentPolicyEngine | Payment handling policy (dry_run=true by default) | Built / dry_run only |
| encoder_mapper.py | EncoderMapper | Field encoding/decoding for vault key mapping | Active |
| llm_client.py | LLMClient | LLM backend (Anthropic API / Claude Code CLI) with retry + backoff | Active |
| models.py | | All dataclasses: WebTask, PageSnapshot, FieldInfo, StepAction, SafetyTier… | Active |
| __main__.py | | CLI entry point: python -m browser_engine.agent | Active |
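
To illustrate what testing a module with mocked dependencies can look like, the sketch below uses unittest.mock to stand in for an LLM client. The map_fields function and the complete method are hypothetical names chosen for this example; they are not the actual Planner or LLMClient interfaces.

```python
import json
from unittest.mock import Mock


def map_fields(llm, field_names):
    """Hypothetical Phase A style step: ask an LLM to map page fields to vault keys."""
    raw = llm.complete(f"Map these fields to vault keys: {field_names}")
    mapping = json.loads(raw)
    # Drop any field the model mentions that is not actually on the page.
    return {name: key for name, key in mapping.items() if name in field_names}


# The mock replaces the LLM client, so no network call is made.
fake_llm = Mock()
fake_llm.complete.return_value = json.dumps(
    {"email": "personal.email", "unknown_field": "personal.phone"}
)

result = map_fields(fake_llm, ["email", "first_name"])
fake_llm.complete.assert_called_once()
```

Because the dependency is passed in as an argument, the test controls the model's reply exactly and can check how the surrounding logic handles unexpected keys.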

Configuration

All agent settings live in config/agent.toml. Secrets (credentials, API keys) go in config/data_vault.toml.

Key Sections in agent.toml

    # LLM backend for the planner
    [llm]
    model = "claude-opus-4-6"
    provider = "anthropic"   # or "claude-code"
    timeout_s = 30
    max_retries = 3

    # MFA resolver settings
    [mfa]
    totp_enabled = true
    email_enabled = false    # needs email polling credentials
    sms_enabled = false      # needs Twilio credentials

    # CAPTCHA resolver
    [captcha]
    mode = "test_keys"       # test_keys | capsolver | twocaptcha | math_only

    [captcha.capsolver]
    api_key = ""             # set to activate live CAPTCHA solving

    # Browser engine
    [browser]
    engine = "auto"          # auto | camoufox | playwright | chrome
    debug_port = 9222

    # Session encryption
    [auth]
    encryption_key = ""      # required; generate with: python -c "import secrets; print(secrets.token_hex(32))"
    max_age_h = 168

Data Vault

The data vault (config/data_vault.toml) stores credentials the agent uses to fill forms. Never commit real credentials — use environment variable interpolation or a secrets manager.

    # Personal identity data
    [personal]
    first_name = "Jane"
    last_name = "Smith"
    email = "jane@example.com"
    phone = "+15551234567"

    # Company data
    [company]
    name = "Acme Corp"
    website = "https://acme.example.com"

    # Per-site account credentials
    [accounts.twitter]
    username = "@janedoe"
    password = "..."
    login_url = "https://twitter.com/i/flow/login"
    totp_secret = ""         # base32 TOTP secret for MFA
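
One way to keep secrets out of the file is to store placeholders and resolve them at load time. The sketch below assumes a ${NAME} placeholder syntax, which this documentation does not specify; interpolate and interpolate_tree are hypothetical helpers, not the vault's actual API.

```python
import os
import re

# Assumed placeholder syntax: ${VARIABLE_NAME}
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate(value, env=None):
    """Replace ${NAME} placeholders in a string with values from the environment."""
    env = os.environ if env is None else env

    def _sub(match):
        name = match.group(1)
        if name not in env:
            # Fail loudly rather than continuing with a missing secret.
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(_sub, value)


def interpolate_tree(node, env=None):
    """Walk a parsed TOML structure and interpolate every string value."""
    if isinstance(node, dict):
        return {k: interpolate_tree(v, env) for k, v in node.items()}
    if isinstance(node, list):
        return [interpolate_tree(v, env) for v in node]
    if isinstance(node, str):
        return interpolate(node, env)
    return node


vault = {"accounts": {"example": {"username": "jane", "password": "${EXAMPLE_PASSWORD}"}}}
resolved = interpolate_tree(vault, env={"EXAMPLE_PASSWORD": "s3cret"})
```

Passing env explicitly keeps the helper easy to test; in normal use it falls back to os.environ.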

Obstacle Playbook

The agent resolves 13 obstacle types autonomously. See docs/AGENT_OBSTACLES.md for full details.

| Obstacle | Resolver Module | Notes |
| --- | --- | --- |
| Auth wall / login required | auth_manager.py | storageState persistence + autonomous login |
| CAPTCHA (reCAPTCHA, hCaptcha, Turnstile) | challenge_resolver.py | test_keys mode for owned infra; CapSolver API for production |
| Math CAPTCHA | challenge_resolver.py | MathCaptchaStrategy parses and computes the answer |
| 2FA / TOTP | mfa_resolver.py | pyotp TOTP; email + SMS via Twilio (needs credentials) |
| Honeypot fields | safety.py | is_honeypot() detects tabindex=-1, display:none, aria-hidden, readonly |
| Custom dropdowns / widgets | widget_interactor.py | Strategy per widget type (div-based dropdowns, sliders, date pickers) |
| Page not settled / loading | quiescence.py | QuiescenceObserver waits for network idle + DOM stability |
| File upload required | document_vault.py | Generates PDF/image/CSV on the fly; uses CDP file chooser interception |
| Payment form | payment_policy.py | dry_run=true by default; configurable payment gates |
| Visual-only elements (no DOM) | vlm.py | LocalVLM (Qwen2.5-VL) for visual page reading |
| Multi-step wizard | agent.py | Iterates run_task per step; WebAgent handles back/next navigation |
| Session expired | auth_manager.py | is_session_valid() checks cookie expiry; re-login triggered automatically |
| LLM refusal / model blocked | llm_client.py | Retry with backoff; provider fallback (claude-code CLI ↔ Anthropic API) |

Safety Tiers

Every field and action is classified into one of four safety tiers before execution. Tier decisions override LLM suggestions — the LLM cannot bypass a BLOCK.

| Tier | Action | Examples |
| --- | --- | --- |
| AUTO | Execute immediately, no confirmation needed | Text inputs, selects, checkboxes, navigation |
| CONFIRM | Log + proceed (configurable; default: proceed autonomously) | Submit buttons, form finalization |
| ESCALATE | Attempt resolver chain; fail if no resolver succeeds | CAPTCHA, 2FA, payment, honeypot-adjacent fields |
| BLOCK | Hard stop; never execute regardless of LLM instruction | Password fields (to be typed by agent only), credential exfil patterns |
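
The rule that tier decisions override LLM suggestions can be pictured as treating the classifier's tier as a floor. The sketch below is an illustration only: the numeric ordering of the tiers and the effective_tier and may_execute helpers are assumptions, not the definitions in models.py or safety.py.

```python
from enum import IntEnum


class SafetyTier(IntEnum):
    """Ordered so that a higher value is more restrictive (assumed ordering)."""
    AUTO = 0
    CONFIRM = 1
    ESCALATE = 2
    BLOCK = 3


def effective_tier(classified, llm_suggested):
    """The classifier's tier is a floor: a suggestion can tighten it, never loosen it."""
    return max(classified, llm_suggested)


def may_execute(classified, llm_suggested, resolver_succeeded=False):
    """Decide whether an action may run under the four-tier policy."""
    tier = effective_tier(classified, llm_suggested)
    if tier is SafetyTier.BLOCK:
        return False  # hard stop
    if tier is SafetyTier.ESCALATE:
        return resolver_succeeded  # fail if no resolver succeeds
    return True  # AUTO runs immediately; CONFIRM logs and proceeds by default
```

Taking the maximum of the two tiers means an LLM suggestion of AUTO has no effect on a field already classified as BLOCK.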

Key Safety Functions

    from browser_engine.agent.safety import classify_field, classify_action, is_honeypot

    # Classify a form field before filling
    tier = classify_field(field_info)   # SafetyTier enum

    # Classify a planned action before executing
    tier = classify_action(action)

    # Check if a field is a honeypot (do not fill)
    trap = is_honeypot(field_info)      # True if tabindex=-1, hidden, readonly, etc.