Agent Documentation

Complete reference for the browser_engine.agent autonomous task execution engine.

Quick Start

The agent runs tasks autonomously from the command line. No human interaction is required after launch.

# Run a task against a challenge page (dry-run: no actions taken)
python -m browser_engine.agent \
  --url http://localhost:5555/challenge/forms \
  --desc "Fill the registration form and submit" \
  --dry-run

# Live run (launches browser, executes actions)
python -m browser_engine.agent \
  --url http://localhost:5555/challenge/wizard \
  --desc "Complete the 3-step wizard" \
  --launch

# Agent-mode iteration loop (red team vs blue team)
python detection_lab/iterate.py --target 95 --max-iterations 5 --agent-mode
python detection_lab/iterate.py --target 95 --agent-mode --headless

Bridge API

from browser_engine.bridge import get_engine

engine = get_engine()
# Plan session timing for a batch of actions
plan = engine.plan_session(platform="twitter", action_count=5)
# Check health before proceeding
ok = engine.should_proceed(platform="twitter", username="user")
# The engine also records results; that call is not shown here.

Agent Modules

All 18 modules live in src/browser_engine/agent/. Each is independently testable with mocked dependencies.

Module	Class	Purpose	Status
agent.py	`WebAgent`	Core runner — two-phase planner, zero-human autonomy, pre-auth detection	Active
planner.py	`Planner`	Phase A: LLM field→vault mapping. Phase B: deterministic action generation	Active
safety.py	—	classify_field / classify_action / detect_page_hazards / is_honeypot	Active
page_reader.py	—	JS-based DOM extraction into `PageSnapshot`	Active
data_vault.py	`DataVault`	TOML-based credential vault — personal, company, account sections	Active
mfa_resolver.py	`MFAResolver`	TOTP via pyotp; email polling + SMS via Twilio (needs credentials)	Built / needs creds
auth_manager.py	`AuthManager`	storageState persistence + autonomous login flow	Active
challenge_resolver.py	`ChallengeResolver`	CAPTCHA solving — test keys, math CAPTCHAs, CapSolver API	Active (test_keys mode)
quiescence.py	`QuiescenceObserver`	Idle/settled detection — waits for page to stabilize before acting	Active
vlm.py	`LocalVLM`	Qwen2.5-VL visual page reading (needs local model endpoint)	Built / needs endpoint
widget_interactor.py	`WidgetInteractor`	Custom widget strategies — dropdowns, sliders, date pickers	Active
trace_cache.py	`TraceCache`	Action trace caching for replay and debugging	Active
document_vault.py	`DocumentVault`	PDF/image/CSV file generation for upload tasks	Active
payment_policy.py	`PaymentPolicyEngine`	Payment handling policy (dry_run=true by default)	Built / dry_run only
encoder_mapper.py	`EncoderMapper`	Field encoding/decoding for vault key mapping	Active
llm_client.py	`LLMClient`	LLM backend (Anthropic API / Claude Code CLI) with retry + backoff	Active
models.py	—	All dataclasses: WebTask, PageSnapshot, FieldInfo, StepAction, SafetyTier…	Active
__main__.py	—	CLI entry point: `python -m browser_engine.agent`	Active
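
The two-phase split in planner.py can be pictured roughly as follows. This is a minimal sketch, not the actual Planner code: the FillAction dataclass, the generate_actions and lookup helpers, the dotted vault-key format, and the CSS selectors are all hypothetical names chosen for illustration. It assumes Phase A has already produced a field-to-vault-key mapping and shows how Phase B could turn that mapping into actions without any further LLM call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FillAction:
    """Hypothetical stand-in for the real StepAction in models.py."""
    selector: str
    value: str


def lookup(vault: dict, dotted_key: str) -> str:
    """Resolve a dotted key such as 'personal.email' in a nested dict."""
    node = vault
    for part in dotted_key.split("."):
        node = node[part]
    return node


def generate_actions(mapping: dict, vault: dict) -> list:
    """Phase B (assumed): the same mapping and vault always yield the same actions."""
    # Sorting by selector keeps the output deterministic for this sketch;
    # a real planner would more likely preserve DOM order.
    return [
        FillAction(selector=selector, value=lookup(vault, key))
        for selector, key in sorted(mapping.items())
    ]


# Phase A output (assumed shape): form field selector -> vault key.
mapping = {"#email": "personal.email", "#first": "personal.first_name"}
vault = {"personal": {"first_name": "Jane", "email": "jane@example.com"}}

actions = generate_actions(mapping, vault)
```

Keeping the LLM out of Phase B means a given mapping can be replayed or unit-tested with a mocked vault and no network access.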

Configuration

All agent settings live in config/agent.toml. Secrets (credentials, API keys) go in config/data_vault.toml.

Key Sections in agent.toml

# LLM backend for the planner
[llm]
model = "claude-opus-4-6"
provider = "anthropic"     # or "claude-code"
timeout_s = 30
max_retries = 3

# MFA resolver settings
[mfa]
totp_enabled = true
email_enabled = false    # needs email polling credentials
sms_enabled = false      # needs Twilio credentials

# CAPTCHA resolver
[captcha]
mode = "test_keys"        # test_keys | capsolver | twocaptcha | math_only
[captcha.capsolver]
api_key = ""             # set to activate live CAPTCHA solving

# Browser engine
[browser]
engine = "auto"           # auto | camoufox | playwright | chrome
debug_port = 9222

# Session encryption
[auth]
encryption_key = ""       # required — generate with: python -c "import secrets; print(secrets.token_hex(32))"
max_age_h = 168

Data Vault

The data vault (config/data_vault.toml) stores the personal data, company data, and per-site account credentials the agent uses to fill forms. Never commit real credentials. Use environment variable interpolation or a secrets manager instead.

# Personal identity data
[personal]
first_name = "Jane"
last_name  = "Smith"
email      = "jane@example.com"
phone      = "+15551234567"

# Company data
[company]
name    = "Acme Corp"
website = "https://acme.example.com"

# Per-site account credentials
[accounts.twitter]
username  = "@janedoe"
password  = "..."
login_url = "https://twitter.com/i/flow/login"
totp_secret = ""    # base32 TOTP secret for MFA
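
One way to keep real secrets out of the vault file is to store placeholders and resolve them from the environment at load time. The sketch below assumes a ${VAR} placeholder syntax and a hypothetical interpolate helper; neither is confirmed as what DataVault actually supports, so treat it as an illustration of the idea only.

```python
import os
import re

# Assumed placeholder syntax: ${NAME}
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate(value, env=os.environ):
    """Recursively replace ${VAR} in strings; raises KeyError if VAR is unset."""
    if isinstance(value, dict):
        return {k: interpolate(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [interpolate(v, env) for v in value]
    if isinstance(value, str):
        return _PLACEHOLDER.sub(lambda m: env[m.group(1)], value)
    return value  # numbers, booleans, etc. pass through unchanged


# Shape of the parsed vault, with the password left as a placeholder.
raw_vault = {
    "accounts": {
        "twitter": {
            "username": "@janedoe",
            "password": "${TWITTER_PASSWORD}",
        }
    }
}

# An explicit env mapping is passed here so the example does not depend
# on the real process environment.
resolved = interpolate(raw_vault, env={"TWITTER_PASSWORD": "s3cret"})
```

Failing loudly on an unset variable is a deliberate choice in this sketch: a silently empty password would otherwise surface later as a confusing login failure.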

Obstacle Playbook

The agent resolves 13 obstacle types autonomously. See docs/AGENT_OBSTACLES.md for full details.

Obstacle	Resolver Module	Notes
Auth wall / login required	`auth_manager.py`	storageState persistence + autonomous login
CAPTCHA (reCAPTCHA, hCaptcha, Turnstile)	`challenge_resolver.py`	test_keys mode for owned infra; CapSolver API for production
Math CAPTCHA	`challenge_resolver.py`	MathCaptchaStrategy — parses and computes answer
2FA / TOTP	`mfa_resolver.py`	pyotp TOTP; email + SMS via Twilio (needs credentials)
Honeypot fields	`safety.py`	is_honeypot() detects tabindex=-1, display:none, aria-hidden, readonly
Custom dropdowns / widgets	`widget_interactor.py`	Strategy per widget type (div-based dropdowns, sliders, date pickers)
Page not settled / loading	`quiescence.py`	QuiescenceObserver waits for network idle + DOM stability
File upload required	`document_vault.py`	Generates PDF/image/CSV on the fly; uses CDP file chooser interception
Payment form	`payment_policy.py`	dry_run=true by default; configurable payment gates
Visual-only elements (no DOM)	`vlm.py`	LocalVLM (Qwen2.5-VL) for visual page reading
Multi-step wizard	`agent.py`	Iterates run_task per step; WebAgent handles back/next navigation
Session expired	`auth_manager.py`	is_session_valid() checks cookie expiry; re-login triggered automatically
LLM refusal / model blocked	`llm_client.py`	Retry with backoff; provider fallback (claude-code CLI ↔ Anthropic API)
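
For the 2FA / TOTP row, it can help to see what a TOTP code actually is. The resolver itself uses pyotp; the sketch below is a standard-library-only rendering of the RFC 6238 algorithm for illustration, and the totp function here is hypothetical rather than part of mfa_resolver.py.

```python
import base64
import hashlib
import hmac
import struct
import time


def totp(secret_b32, at=None, digits=6, step=30):
    """Compute an RFC 6238 TOTP code using HMAC-SHA1."""
    # Base32 secrets are often stored without '=' padding; restore it.
    padded = secret_b32.upper() + "=" * (-len(secret_b32) % 8)
    key = base64.b32decode(padded)
    now = time.time() if at is None else at
    counter = int(now // step)  # number of elapsed time steps
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation (RFC 4226)
    number = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(number % 10 ** digits).zfill(digits)


# Secret from the RFC 6238 test vectors: ASCII "12345678901234567890".
RFC_SECRET = base64.b32encode(b"12345678901234567890").decode()
code = totp(RFC_SECRET, at=59, digits=8)  # RFC 6238 Appendix B lists 94287082
```

With pyotp installed, pyotp.TOTP(secret).now() should produce the same six-digit code for the current time step, given the base32 totp_secret from the data vault.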

Safety Tiers

Every field and action is classified into one of four safety tiers before execution. Tier decisions override LLM suggestions — the LLM cannot bypass a BLOCK.

Tier	Action	Examples
AUTO	Execute immediately, no confirmation needed	Text inputs, selects, checkboxes, navigation
CONFIRM	Log + proceed (configurable — default: proceed autonomously)	Submit buttons, form finalization
ESCALATE	Attempt resolver chain; fail if no resolver succeeds	CAPTCHA, 2FA, payment, honeypot-adjacent fields
BLOCK	Hard stop — never execute regardless of LLM instruction	Password fields (to be typed by agent only), credential exfil patterns
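
The precedence rule (the tier decides, the LLM only suggests) can be sketched as a small gate function. The enum below mirrors the four tier names from the table, but its values, the may_execute helper, and its parameters are assumptions made for illustration; the real SafetyTier lives in models.py and may look different.

```python
import logging
from enum import Enum

log = logging.getLogger(__name__)


class SafetyTier(Enum):
    """Illustrative mirror of the four tiers; values are assumed."""
    AUTO = "auto"
    CONFIRM = "confirm"
    ESCALATE = "escalate"
    BLOCK = "block"


def may_execute(tier, llm_approved, resolver_ok=False):
    """Return True if the action may run. The tier has the final say."""
    if tier is SafetyTier.BLOCK:
        return False  # hard stop, whatever the LLM suggested
    if tier is SafetyTier.ESCALATE:
        return llm_approved and resolver_ok  # needs a successful resolver
    if tier is SafetyTier.CONFIRM:
        log.info("CONFIRM tier: logging action before proceeding")
        return llm_approved  # default behavior: log, then proceed
    return llm_approved  # AUTO


blocked = may_execute(SafetyTier.BLOCK, llm_approved=True)
```

Here blocked is False even though the LLM approved the action, which is the property the BLOCK tier is meant to guarantee.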

Key Safety Functions

from browser_engine.agent.safety import classify_field, classify_action, is_honeypot

# Classify a form field before filling
tier = classify_field(field_info)  # SafetyTier enum

# Classify a planned action before executing
tier = classify_action(action)

# Check if a field is a honeypot (do not fill)
trap = is_honeypot(field_info)  # True if tabindex=-1, hidden, readonly, etc.
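
To make the honeypot signals concrete, here is a rough sketch of such a check over a plain dict. The looks_like_honeypot name and the dict keys (tabindex, style, aria_hidden, readonly) are hypothetical; the real is_honeypot takes a FieldInfo whose attributes are not documented here, and it may weigh more signals than these four.

```python
def looks_like_honeypot(field: dict) -> bool:
    """Heuristic sketch: flag fields a human could not see or type into."""
    # Normalize inline CSS so "display: none" and "display:none" both match.
    style = field.get("style", "").replace(" ", "").lower()
    return (
        field.get("tabindex") == -1
        or "display:none" in style
        or field.get("aria_hidden") is True
        or field.get("readonly") is True
    )


visible_field = {"name": "email", "tabindex": 0, "style": ""}
trap_field = {"name": "website", "tabindex": -1, "style": "display: none"}

results = [looks_like_honeypot(visible_field), looks_like_honeypot(trap_field)]
```

A field flagged this way would be left empty, since filling an input that no human could reach is the behavior such traps are designed to catch.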