Agent Documentation
Complete reference for the browser_engine.agent autonomous task execution engine.
Quick Start
The agent runs tasks autonomously from the command line. No human interaction required after launch.
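A minimal sketch of launching the agent non-interactively from another Python process. It assumes only the `python -m browser_engine.agent` entry point listed in the module table; `build_agent_command`, `launch_agent`, and `extra_args` are hypothetical names, and no real CLI flags are implied.

```python
import subprocess
import sys


def build_agent_command(extra_args=()):
    """Return the argv list that starts the agent CLI with the current interpreter."""
    # extra_args is a placeholder for whatever flags the CLI actually accepts.
    return [sys.executable, "-m", "browser_engine.agent", *extra_args]


def launch_agent(extra_args=()):
    """Run the agent as a child process with stdin closed, so it cannot block on a prompt."""
    return subprocess.run(
        build_agent_command(extra_args),
        stdin=subprocess.DEVNULL,  # no human input is available after launch
        check=False,               # the caller inspects returncode itself
    )
```

Closing stdin is one way to make "no human interaction" explicit: any unexpected prompt fails fast instead of hanging.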
Bridge API
Agent Modules
All 18 modules live in src/browser_engine/agent/. Each is independently testable with mocked dependencies.
| Module | Class | Purpose | Status |
|---|---|---|---|
| agent.py | WebAgent | Core runner — two-phase planner, zero-human autonomy, pre-auth detection | Active |
| planner.py | Planner | Phase A: LLM field→vault mapping. Phase B: deterministic action generation | Active |
| safety.py | — | classify_field / classify_action / detect_page_hazards / is_honeypot | Active |
| page_reader.py | — | JS-based DOM extraction into PageSnapshot | Active |
| data_vault.py | DataVault | TOML-based credential vault — personal, company, account sections | Active |
| mfa_resolver.py | MFAResolver | TOTP via pyotp; email polling + SMS via Twilio (needs credentials) | Built / needs creds |
| auth_manager.py | AuthManager | storageState persistence + autonomous login flow | Active |
| challenge_resolver.py | ChallengeResolver | CAPTCHA solving — test keys, math CAPTCHAs, CapSolver API | Active (test_keys mode) |
| quiescence.py | QuiescenceObserver | Idle/settled detection — waits for page to stabilize before acting | Active |
| vlm.py | LocalVLM | Qwen2.5-VL visual page reading (needs local model endpoint) | Built / needs endpoint |
| widget_interactor.py | WidgetInteractor | Custom widget strategies — dropdowns, sliders, date pickers | Active |
| trace_cache.py | TraceCache | Action trace caching for replay and debugging | Active |
| document_vault.py | DocumentVault | PDF/image/CSV file generation for upload tasks | Active |
| payment_policy.py | PaymentPolicyEngine | Payment handling policy (dry_run=true by default) | Built / dry_run only |
| encoder_mapper.py | EncoderMapper | Field encoding/decoding for vault key mapping | Active |
| llm_client.py | LLMClient | LLM backend (Anthropic API / Claude Code CLI) with retry + backoff | Active |
| models.py | — | All dataclasses: WebTask, PageSnapshot, FieldInfo, StepAction, SafetyTier… | Active |
| __main__.py | — | CLI entry point: python -m browser_engine.agent | Active |
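
The two-phase planner and the "testable with mocked dependencies" claim can be illustrated together. The sketch below is not the real `Planner`: `SketchPlanner`, the `map_fields` method, and the shape of `StepAction` are assumptions made for illustration. Phase A delegates the field-to-vault mapping to an injected LLM client (mocked here), and Phase B turns that mapping into actions without any further LLM call.

```python
from dataclasses import dataclass
from unittest.mock import Mock


@dataclass(frozen=True)
class StepAction:
    # Hypothetical shape; the real StepAction in models.py may differ.
    kind: str
    selector: str
    value: str


class SketchPlanner:
    """Illustrative stand-in for Planner, not the real class."""

    def __init__(self, llm_client):
        self.llm_client = llm_client

    def plan(self, field_selectors, vault):
        # Phase A: ask the LLM which vault key belongs to each field.
        mapping = self.llm_client.map_fields(field_selectors, sorted(vault))
        # Phase B: deterministic. The same mapping always yields the same actions.
        return [
            StepAction("fill", selector, vault[key])
            for selector, key in sorted(mapping.items())
            if key in vault  # drop keys the LLM made up
        ]


# The LLM dependency is replaced by a mock, so no network call is needed.
llm = Mock()
llm.map_fields.return_value = {
    "#email": "personal.email",
    "#org": "company.name",
    "#x": "not.in.vault",
}
vault = {"personal.email": "a@example.com", "company.name": "Acme"}
actions = SketchPlanner(llm).plan(["#email", "#org", "#x"], vault)
```

Keeping the LLM confined to Phase A means a bad mapping can be filtered before any action is generated, and Phase B can be unit-tested with plain dictionaries.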
Configuration
All agent settings live in config/agent.toml. Secrets (credentials, API keys) go in config/data_vault.toml.
Key Sections in agent.toml
Data Vault
The data vault (config/data_vault.toml) stores credentials the agent uses to fill forms. Never commit real credentials — use environment variable interpolation or a secrets manager.
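One possible shape for environment variable interpolation is sketched below. The `${VAR}` placeholder syntax, the `interpolate` helper, and the variable names are assumptions; this document does not specify how `DataVault` resolves placeholders. The function works on an already-parsed vault dictionary and fails loudly when a variable is missing.

```python
import os
import re

# Matches ${NAME}; this placeholder syntax is an assumption, not the vault's documented format.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate(value, env=None):
    """Recursively replace ${VAR} placeholders in strings with values from env."""
    env = os.environ if env is None else env
    if isinstance(value, dict):
        return {k: interpolate(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [interpolate(v, env) for v in value]
    if isinstance(value, str):
        def _sub(match):
            name = match.group(1)
            if name not in env:
                # Fail instead of typing an unresolved placeholder into a form.
                raise KeyError(f"environment variable {name} is not set")
            return env[name]
        return _PLACEHOLDER.sub(_sub, value)
    return value  # numbers, booleans, etc. pass through unchanged


# The committed file holds only placeholders; real values arrive at runtime.
raw_vault = {
    "personal": {"email": "${AGENT_EMAIL}"},
    "account": {"password": "${AGENT_PASSWORD}"},
}
resolved = interpolate(
    raw_vault,
    env={"AGENT_EMAIL": "a@example.com", "AGENT_PASSWORD": "s3cret"},
)
```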
Obstacle Playbook
The agent resolves 13 obstacle types autonomously. See docs/AGENT_OBSTACLES.md for full details.
| Obstacle | Resolver Module | Notes |
|---|---|---|
| Auth wall / login required | auth_manager.py | storageState persistence + autonomous login |
| CAPTCHA (reCAPTCHA, hCaptcha, Turnstile) | challenge_resolver.py | test_keys mode for owned infra; CapSolver API for production |
| Math CAPTCHA | challenge_resolver.py | MathCaptchaStrategy — parses and computes answer |
| 2FA / TOTP | mfa_resolver.py | pyotp TOTP; email + SMS via Twilio (needs credentials) |
| Honeypot fields | safety.py | is_honeypot() detects tabindex=-1, display:none, aria-hidden, readonly |
| Custom dropdowns / widgets | widget_interactor.py | Strategy per widget type (div-based dropdowns, sliders, date pickers) |
| Page not settled / loading | quiescence.py | QuiescenceObserver waits for network idle + DOM stability |
| File upload required | document_vault.py | Generates PDF/image/CSV on the fly; uses CDP file chooser interception |
| Payment form | payment_policy.py | dry_run=true by default; configurable payment gates |
| Visual-only elements (no DOM) | vlm.py | LocalVLM (Qwen2.5-VL) for visual page reading |
| Multi-step wizard | agent.py | Iterates run_task per step; WebAgent handles back/next navigation |
| Session expired | auth_manager.py | is_session_valid() checks cookie expiry; re-login triggered automatically |
| LLM refusal / model blocked | llm_client.py | Retry with backoff; provider fallback (claude-code CLI ↔ Anthropic API) |
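
To make the honeypot row concrete, here is a minimal sketch of a check over the four signals listed there (tabindex=-1, display:none, aria-hidden, readonly). It is not the real `is_honeypot()`: `FieldSketch` is a hypothetical stand-in for `FieldInfo`, and the actual function may weigh or combine these signals differently.

```python
from dataclasses import dataclass, field


@dataclass
class FieldSketch:
    # Hypothetical stand-in for FieldInfo; the real attribute names may differ.
    attributes: dict = field(default_factory=dict)
    style: str = ""


def is_honeypot_sketch(f):
    """Return True if the field shows any of the four honeypot signals."""
    attrs = {k.lower(): str(v).strip().lower() for k, v in f.attributes.items()}
    style = f.style.replace(" ", "").lower()  # so "display: none" also matches
    return (
        attrs.get("tabindex") == "-1"
        or "display:none" in style
        or attrs.get("aria-hidden") == "true"
        or "readonly" in attrs  # boolean attribute: presence is enough
    )
```

A field flagged this way would be skipped rather than filled, since a human user could never have reached or edited it.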
Safety Tiers
Every field and action is classified into one of four safety tiers before execution. Tier decisions override LLM suggestions — the LLM cannot bypass a BLOCK.
| Tier | Action | Examples |
|---|---|---|
| AUTO | Execute immediately, no confirmation needed | Text inputs, selects, checkboxes, navigation |
| CONFIRM | Log + proceed (configurable — default: proceed autonomously) | Submit buttons, form finalization |
| ESCALATE | Attempt resolver chain; fail if no resolver succeeds | CAPTCHA, 2FA, payment, honeypot-adjacent fields |
| BLOCK | Hard stop: never execute, regardless of LLM instruction | Password fields (typed only by the agent itself, never at the LLM's direction), credential exfiltration patterns |
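
The tier table can be read as a small gate that runs before every action. The sketch below is an assumption about how such a gate might look, not the code in safety.py: the `gate` function and the `resolver_succeeded` parameter are hypothetical, and the enum here only mirrors the four tier names. The LLM's proposal is deliberately not an input to the decision, which is what keeps BLOCK from being bypassed.

```python
from enum import Enum


class SafetyTier(Enum):
    # Mirrors the four tier names above; the real SafetyTier in models.py may differ.
    AUTO = "auto"
    CONFIRM = "confirm"
    ESCALATE = "escalate"
    BLOCK = "block"


def gate(tier, resolver_succeeded=False):
    """Decide whether an LLM-proposed action may run, given only its classified tier."""
    if tier is SafetyTier.BLOCK:
        return False  # hard stop, whatever the LLM asked for
    if tier is SafetyTier.ESCALATE:
        return resolver_succeeded  # fail if no resolver in the chain succeeded
    # AUTO runs immediately; CONFIRM is logged elsewhere and proceeds by default.
    return True
```

For example, `gate(SafetyTier.BLOCK, resolver_succeeded=True)` still returns `False`, while `gate(SafetyTier.ESCALATE)` returns `False` until a resolver reports success.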