Team-shared webqa-agent project guide.
Important: this file is shared by the team, and all developers should follow these conventions. Personal preferences and local configuration belong in CLAUDE.local.md (not tracked by git).
This project follows a modular rules organization:
- Python code quality: `.claude/rules/python-quality.md`
- Testing patterns: `.claude/rules/testing.md`
- Git workflow: `.claude/rules/git-workflow.md`
- Browser testing: `.claude/rules/domain-specific/browser-testing.md`

These rules inherit from the global configuration (~/.claude/CLAUDE.md and ~/.claude/rules/) and add webqa-agent-specific requirements.
For complex tasks, or any task involving third-party dependencies, research first:
- Tool: `mcp__plugin_context7_context7__resolve-library-id` and `query-docs`
  - Fetch up-to-date library documentation and best practices
  - Learn the correct API usage
  - Avoid hallucinations based on outdated knowledge
- Tool: `WebSearch`
  - Find recent technical articles and practices
  - Learn about common problems and their solutions
  - Validate technical decisions
Situations that require up-front research:
- ✅ Integrating new third-party libraries (Playwright, LangChain, FastAPI, etc.)
- ✅ Implementing complex async patterns or concurrency control
- ✅ Using unfamiliar Python features or design patterns
- ✅ Configuring complex toolchains (Docker, CI/CD, etc.)
- ✅ Implementing security-related features (authentication, encryption, etc.)
Research workflow:
a. Identify the third-party dependencies or complex technical points in the task
b. Use Context7 to fetch the official documentation
c. Use WebSearch to find best practices and common pitfalls
d. Draft an implementation plan based on the research
e. Start coding
❌ Do not:
- Assume you already know how a library works
- Skip research and jump straight into planning and coding
- Implement against knowledge of an outdated version
- Type annotations: strict; every function must be annotated
- Comment density: moderate; comment complex logic, skip trivial logic
- Error handling: explicit; deliberate try-except blocks and error paths
- Log levels: info in production, debug while debugging
- ❌ Do not create classes, methods, or variables that duplicate existing functionality
- ❌ Do not copy-paste code; reuse it via functions/methods
- ❌ Do not keep dead code (delete it outright; Git keeps the history)
- ✅ Consolidate similar functionality when refactoring
- ✅ Use inheritance and composition to reduce duplication
- 类名:PascalCase
- 函数/方法:snake_case
- 常量:UPPER_SNAKE_CASE
- 私有成员:_leading_underscore
- Avoid vague names: `temp`, `tmp`, `data`, etc.

See .claude/rules/python-quality.md for the full conventions.
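A minimal sketch of the style and naming rules above, in one place. Every name here (PageChecker, check_title, DEFAULT_TIMEOUT_MS) is hypothetical and not part of the codebase:

```python
import logging

logger = logging.getLogger(__name__)

# Constants: UPPER_SNAKE_CASE
DEFAULT_TIMEOUT_MS = 30_000


class PageChecker:  # class names: PascalCase
    """Checks a page title against an expected value."""

    def __init__(self, expected_title: str) -> None:
        self._expected_title = expected_title  # private member: _leading_underscore

    def check_title(self, actual_title: str) -> bool:  # functions/methods: snake_case
        """Return True when the actual title matches the expected one."""
        try:
            return actual_title.strip() == self._expected_title
        except AttributeError:  # explicit error handling, never a bare except
            logger.debug("check_title received a non-string title: %r", actual_title)
            return False
```

Note the strict type annotations on every function and the explicit, narrow exception handling with debug-level logging.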
Avoid unnecessary breaking changes:
- ❌ Do not change public API signatures casually
- ❌ Do not change the behavior of existing features
- ❌ Do not delete code that is still in use without assessing the impact
- ✅ New features should be backward compatible
- ✅ Deprecate features with a deprecation warning instead of deleting them outright
- ✅ API changes require a migration guide
- Zero learning cost first: new features should follow existing patterns
- Optional, not mandatory: new features default to off or are opt-in
- Progressive enhancement: never force users to upgrade or change immediately
- Complete documentation: new features need clear usage docs
- Does it affect the existing public API?
- Does it change user-visible behavior?
- Does it require users to modify their configuration files?
- Does it require updating documentation and examples?
- Has backward compatibility been tested?
- Do upstream or downstream dependencies need to adapt?
- Config compatibility: old configs keep working; new options are optional
- API compatibility: keep the old API; the new API is an enhanced variant
- Data compatibility: support old data formats and migrate them to the new format automatically
- Behavior compatibility: default behavior stays the same; new behavior is enabled via options
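The API- and behavior-compatibility rules above can be sketched as follows. Both function names are hypothetical; the point is the pattern of an opt-in flag plus a deprecated alias:

```python
import warnings
from typing import Any


def run_tests(config: dict[str, Any], *, parallel: bool = False) -> str:
    """New behavior (parallel execution) is opt-in; the default is unchanged."""
    return "parallel" if parallel else "sequential"


def run_all_tests(config: dict[str, Any]) -> str:
    """Deprecated alias kept for backward compatibility instead of being deleted."""
    warnings.warn(
        "run_all_tests() is deprecated; use run_tests() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return run_tests(config)
```

Existing callers keep working unchanged and get a warning pointing at the migration path, while new callers can enable the new behavior explicitly.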
Breaking changes are acceptable only in these cases:
- ✅ Major version upgrades (e.g. v0.2.x → v0.3.0)
- ✅ Fixing serious security vulnerabilities
- ✅ Fixing bugs that corrupt data
- ✅ Removing features long marked deprecated, after sufficient communication
All code must pass these checks before committing:
- Type annotations: every function has type hints
- Error handling: appropriate try-except blocks and logging
- Tests: tests written/updated and passing
- Docs: code comments and markdown docs updated
- Cleanup: no debug print statements or commented-out code
- Deduplication: no duplicate classes, methods, or variables
- Compatibility: backward compatibility preserved
- Config: configuration changes are optional and documented
- Pre-commit: all pre-commit hooks pass
WebQA Agent is an autonomous web-browser agent for comprehensive website testing (functionality, performance, UX, security). It provides AI-driven testing using OpenAI/Anthropic/Gemini models and browser automation.
Philosophy: autonomous exploration and testing, with no manual scripts required. Well suited to rapid iteration and vibe-coding workflows.
Core capabilities:
- AI-driven autonomous testing (no manual scripts)
- Multi-provider LLM support (OpenAI, Anthropic, Gemini)
- Extensible tool system: add custom tools via WebQABaseTool
- Comprehensive test modes (functionality, UX, performance, security)
Version: v0.2.x series (current branch: dev_0.2.4; released: v0.2.3)
StateRestorer (removed in v0.2.4):
- Original function: automatically restored the URL state of replanned cases
- Reason for removal: to be unified with the run mode's snapshot feature
- Current behavior: replanned cases start from the homepage; restore state manually via `preamble_actions`
- Retained fields: `_is_replanned`, `_replan_source`, `preamble_actions` (for future use)
```shell
# Testing
uv run pytest tests/                             # Run all tests
uv run pytest tests/test_action_executor.py -v   # Run specific test

# Running WebQA Agent
webqa-agent init   # Generate config.yaml template
webqa-agent gen    # Generate test cases (AI mode)
webqa-agent run    # Run tests (auto-discovers config)

# Browser Setup
uv run playwright install chromium   # Install browser

# Code Quality (use pre-commit, not individual tools)
pre-commit run --files <files>   # Check/fix specific files
pre-commit run --all-files       # Check/fix all files
```

- CLI Entry: `webqa_agent/cli.py:main()` - Command-line interface
- Configuration Models: `webqa_agent/config_models/` - Pydantic V2 config classes (GenConfig, RunConfig)
- Browser Session Pool: `webqa_agent/browser/session.py:BrowserSessionPool` - Browser lifecycle
- LLM API: `webqa_agent/llm/llm_api.py:LLMAPI` - Multi-provider LLM client
- Action Handler: `webqa_agent/actions/action_handler.py:ActionHandler` - Browser actions
- UI Driver: `webqa_agent/tools/core/ui_driver.py:UITester` - AI-powered UI testing
- LangGraph Workflow: `webqa_agent/executor/gen/graph.py` - AI workflow orchestration (Gen mode)
- Executors: `webqa_agent/executor/` - GenExecutor and RunExecutor for dual-mode execution
- Tools Registry: `webqa_agent/tools/registry.py` - Custom tools and default tools
- Prompts: `webqa_agent/prompts/` - Prompt templates for test planning and execution
- Configuration File: `config/config.yaml` - Main configuration file
Pydantic V2 Configuration Models (webqa_agent/config_models/):
- Base Configs (`base_config.py`):
  - `BrowserConfig` - Browser settings (unified cookies management)
  - `ReportConfig` - Report generation settings
  - `LLMConfig` - LLM provider settings with Extended Thinking support
- Mode-Specific Configs:
  - `GenConfig` (`gen_config.py`) - AI-driven test generation configuration
  - `RunConfig` (`run_config.py`) - YAML case execution configuration
- Key Features:
  - Field validators with `@field_validator` + `@classmethod`
  - `.model_dump()` for serialization (Pydantic V2)
  - Provider auto-detection (Claude/OpenAI/Gemini)
  - Extended Thinking validation (temperature=1.0, max_tokens > budget_tokens)
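The Extended Thinking validation listed above can be sketched as a plain function. The real checks live in Pydantic V2 `@field_validator`s on LLMConfig; this dependency-free sketch, with a hypothetical function name, shows only the rule itself:

```python
def validate_extended_thinking(max_tokens: int, budget_tokens: int) -> float:
    """Enforce the two Extended Thinking constraints and return the temperature
    to use. Hypothetical helper, not the project's actual API."""
    if max_tokens <= budget_tokens:
        # max_tokens must leave room for the visible answer after thinking
        raise ValueError("max_tokens must be larger than budget_tokens")
    return 1.0  # Extended Thinking requires temperature=1.0, regardless of input
```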
Configuration Flow:
config.yaml → CLI → GenConfig/RunConfig → Executor → LangGraph/CaseExecutor → Tools
- Browser Session Pool (`webqa_agent/browser/session.py`)
  - Pool-based concurrency with `acquire()`/`release()` semantics
  - Automatic session recovery on failure
  - Token-gated session creation (only the pool can create sessions)
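A toy illustration of the acquire()/release() contract. ToySessionPool and its string "sessions" are stand-ins, not the real BrowserSessionPool API; the point is that sessions are borrowed from a bounded pool and always returned:

```python
import asyncio


class ToySessionPool:
    """Bounded pool: acquire() blocks when empty, release() returns a session."""

    def __init__(self, size: int) -> None:
        self._sessions = asyncio.Queue()
        for i in range(size):
            self._sessions.put_nowait(f"session-{i}")

    async def acquire(self) -> str:
        return await self._sessions.get()  # blocks when the pool is exhausted

    async def release(self, session: str) -> None:
        await self._sessions.put(session)


async def demo() -> str:
    pool = ToySessionPool(size=1)
    session = await pool.acquire()
    try:
        return session  # the real code would use session.page here
    finally:
        await pool.release(session)  # always release, even on failure
```

The try/finally release is the important part: a session that is never released starves every other concurrent case.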
- LLM Integration (`webqa_agent/llm/llm_api.py`)
  - Auto-detection: `claude-*` → Anthropic, `gemini-*` → Gemini, `gpt-*` → OpenAI
  - Environment variables: `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GEMINI_API_KEY`
  - Provider-specific defaults (OpenAI temp=0.1, Anthropic/Gemini temp=1.0)
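The auto-detection rule amounts to prefix matching on the model name. A sketch (the function name is illustrative; the real logic lives in LLMAPI):

```python
def detect_provider(model: str) -> str:
    """Map a model-name prefix to its provider, per the rules above."""
    if model.startswith("claude-"):
        return "anthropic"
    if model.startswith("gemini-"):
        return "gemini"
    if model.startswith("gpt-"):
        return "openai"
    raise ValueError(f"unknown model family: {model}")
```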
- Test Execution (`webqa_agent/executor/`)
  - `GenExecutor` - Gen mode orchestration (AI-driven test generation)
  - `RunExecutor` - Run mode orchestration (YAML case execution)
  - `CaseExecutor` - Individual case execution with parallel support
  - Session pool integration for resource management
- Executor/Gen (`webqa_agent/executor/gen/`)
  - LangGraph-based AI agent workflows for Gen mode
  - Modular architecture:
    - `agents/` - Execution agents (execute_agent.py)
    - `state/` - State schemas and management
    - `utils/` - Case recorder and message converter
- Tools System (`webqa_agent/tools/`)
  - Default tools (always enabled): action_tool.py, ux_tool.py, verify_tool.py
  - Custom tools (optional): lighthouse_tool.py, nuclei_tool.py, button_check_tool.py, link_check_tool.py
  - Core implementations (tools/core/): ui_driver.py, web_checks.py, lighthouse.py
  - Registry (registry.py): Dependency checking and tool filtering
  - Base class (base.py): WebQABaseTool for extensibility
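A sketch of the subclassing pattern for custom tools. WebQABaseToolSketch is a stand-in, since the real WebQABaseTool interface is defined in `webqa_agent/tools/base.py` and may differ; TitleLengthTool is a hypothetical example tool:

```python
class WebQABaseToolSketch:
    """Stand-in for WebQABaseTool; shows only the subclassing pattern."""

    name: str = ""

    def run(self, **kwargs: object) -> str:
        raise NotImplementedError


class TitleLengthTool(WebQABaseToolSketch):
    """Hypothetical custom tool that flags overly long page titles."""

    name = "title_length_check"

    def run(self, *, title: str = "", max_len: int = 60) -> str:
        if len(title) <= max_len:
            return "[SUCCESS] title length ok"
        return f"[WARNING] title exceeds {max_len} characters"
```

Returning tagged strings like `[SUCCESS]`/`[WARNING]` matches the tool response conventions described later in this guide.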
- Prompts (`webqa_agent/prompts/`)
  - `test_planning_prompts.py` - Test case planning and reflection
  - `agent_execution_prompts.py` - Agent execution guidance
  - `ui_automation_prompts.py` - UI automation and verification
Single-Tab Architecture (AI Mode):
- All testing in single browser tab - multi-tab not supported
- Test modes:
- AI Mode (UI Agent, UX Test): Strict single-tab with layered coordination architecture
- Default Mode (Basic Test): Multi-tab allowed
- Layered Coordination Architecture (prevents 95%+ of new tabs, zero conflicts):
  - Layer 0 (Base): session.py - Context-level DOM preprocessing and event listening via `add_init_script()`
  - Layer 1 (Enhancement): action_handler.py - Click-level enhancements (history recording, periodic checks, form handling)
  - Layer 2 (Monitoring): click_handler.py - Test execution monitoring and result tracking
- Coordination Mechanism: Global flags prevent redundancy; session.py takes priority, action_handler.py enhances
- Features: No memory leaks, no conflicts, preserves all validated functionality
- Navigation: Use `GoBack` (browser history) and `GoToPage` (direct URL)
- Test pattern: Click → Verify → GoBack
Browser Session Management:
- Migration: `Driver.getInstance()` → `pool.acquire()`, `driver.page` → `session.page`
- No singleton pattern - sessions are pool-managed
- Per-session locking prevents race conditions
GoBack - Navigate to previous page
- Returns `True` if succeeded, `False` if no history exists

GoToPage - Navigate to specific URL
- Returns `True` if navigation succeeded

Standard Links - Click links normally
- All clicks navigate the current tab (even if `target="_blank"`)
- `tests/conftest.py` - Shared fixtures (supports `--url` override)
- `tests/mocks/` - JSON mock data for unit/integration tests
- `tests/test_pages/` - Local HTML pages for isolated testing
```shell
uv run pytest tests/ -v -l                # Verbose with local vars
uv run pytest tests/ --cov=webqa_agent    # With coverage
uv run pytest tests/ -s                   # Show print statements
uv run pytest tests/test_crawler.py --url https://example.com
```

OpenAI:
```yaml
llm_config:
  model: gpt-4.1-2025-04-14
  filter_model: gpt-4o-mini
  api_key: ${OPENAI_API_KEY}
  temperature: 0.1
```

Anthropic Claude:
```yaml
llm_config:
  model: claude-sonnet-4-5-20250929
  filter_model: claude-haiku-4-5-20251001
  api_key: ${ANTHROPIC_API_KEY}
  temperature: 1.0   # Default for Claude; REQUIRED when using Extended Thinking
  max_tokens: 20000  # Must be larger than budget_tokens
  reasoning:
    effort: medium   # Enables Extended Thinking (budget_tokens=10000)
```

Extended Thinking Requirements:
- temperature = 1.0 (Required, auto-enforced)
- max_tokens > budget_tokens (Required, auto-validated)
Recommended Configuration Table:
| effort | budget_tokens | recommended max_tokens | use case |
|---|---|---|---|
| minimal | 1,024 | 2,000 - 3,000 | Quick tasks |
| low | 4,096 | 8,000 - 10,000 | Basic reasoning |
| medium | 10,000 | 20,000 - 25,000 | Recommended for testing |
| high | 20,000 | 40,000 - 50,000 | Complex analysis |
Note: The system automatically adjusts budget_tokens if it exceeds max_tokens, but proper configuration yields better results. Temperature is automatically enforced to 1.0 when Extended Thinking is enabled.
Google Gemini:
```yaml
llm_config:
  model: gemini-3-flash-preview
  filter_model: gemini-2.5-flash-lite
  api_key: ${GEMINI_API_KEY}
  temperature: 1.0
```

Gen Mode (AI-driven testing):
```yaml
test_config:
  business_objectives: "test search functionality"
  custom_tools:
    enabled: ['lighthouse', 'nuclei']  # Optional custom tools: lighthouse, nuclei, traverse_clickable_elements, detect_dynamic_links
  dynamic_step_generation:
    enabled: true
    max_dynamic_steps: 5
    min_elements_threshold: 2
```

Browser Config:
```yaml
browser_config:
  viewport: {width: 1280, height: 720}
  headless: false  # Auto true in Docker
  language: en-US
  save_screenshots: false
```

Tool Response Tags:
- `[SUCCESS]` - Action completed successfully
- `[FAILURE:root_cause]` - Recoverable failure
- `[CRITICAL_ERROR:root_cause]` - Unrecoverable, must abort
- `[WARNING]` - Non-blocking issue
- `[CANNOT_VERIFY]` - Assertion prerequisite failed
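A sketch of how a consumer might split these tagged responses into status and root cause. The helper name and regex are illustrative, not the agent's actual parser:

```python
import re
from typing import Optional, Tuple

# Matches a leading status tag, optionally carrying a :ROOT_CAUSE suffix
_TAG_RE = re.compile(
    r"^\[(SUCCESS|FAILURE|CRITICAL_ERROR|WARNING|CANNOT_VERIFY)(?::([A-Z_]+))?\]"
)


def parse_tool_response(text: str) -> Tuple[str, Optional[str]]:
    """Split a tool response into (status, root_cause); root_cause may be None."""
    match = _TAG_RE.match(text)
    if match is None:
        raise ValueError(f"response missing a status tag: {text!r}")
    return match.group(1), match.group(2)
```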
Failure Categories:
- ELEMENT_NOT_FOUND - Element missing/inaccessible
- NAVIGATION_FAILED - Page navigation failures
- PERMISSION_DENIED - Access denied
- PAGE_CRASHED - Browser crash
- NETWORK_ERROR - Network issues
- SESSION_EXPIRED - Authentication expired
- UNSUPPORTED_PAGE - PDF/plugin pages
- VALIDATION_ERROR - Form validation failures
When dynamic_step_generation.enabled = true:
- Two-layer recovery for ELEMENT_NOT_FOUND (retry + LLM replanning)
- LLM-driven recovery for all failure types (GoBack, timeout, permission, etc.)
- Loop detection: Aborts if same error pattern repeats (2+ times)
- Strategies: retry_modified, skip, abort
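The loop-detection rule above (abort when the same error pattern repeats 2+ times) can be sketched like this; the class name and windowing are illustrative, not the executor's actual implementation:

```python
from collections import deque


class LoopDetector:
    """Abort when the same error pattern repeats `threshold` times in a row."""

    def __init__(self, threshold: int = 2) -> None:
        self._threshold = threshold
        self._recent = deque(maxlen=threshold)  # sliding window of recent errors

    def record(self, error_pattern: str) -> bool:
        """Record an error; return True when the run should abort."""
        self._recent.append(error_pattern)
        return (
            len(self._recent) == self._threshold
            and len(set(self._recent)) == 1  # window is full of one pattern
        )
```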
- JavaScript dialogs: Auto-accepted (`alert()`, `confirm()`, `prompt()`)
- Critical errors: Auto-abort to save resources
- Browser state: Detection flags for navigation actions (GoBack, GoToPage)
```shell
uv sync                              # Install dependencies
uv run playwright install chromium   # Install browser
webqa-agent run                      # Run tests
```

```shell
./start.sh --build    # Build and start
./start.sh --local    # Start existing image
docker-compose down   # Stop services
```

```shell
pre-commit install           # Install pre-commit hooks
pre-commit run --all-files   # Run all hooks
```

📚 User-facing documentation in /docs:
- CUSTOM_TOOL_DEVELOPMENT.md - Building custom tools for agent extensibility
- CUSTOM_TOOL_DEVELOPMENT_AI.md - AI-enhanced custom tool development
- MODES&CLI.md - Complete CLI reference and test modes
Chinese versions also available: CUSTOM_TOOL_DEVELOPMENT_zh-CN.md, MODES&CLI_zh-CN.md
📝 Claude's working documents in /claude_docs:
All Claude-generated documentation is organized in /claude_docs:
- Top level: General reference docs (ARCHITECTURE.md, CONFIGURATION.md, DEVELOPMENT.md, TROUBLESHOOTING.md)
- sessions/: Session-specific work documents using format
YYYY-MM-DD_task-description
See claude_docs/README.md for structure details and naming conventions.
For future Claude sessions: All new documentation should follow the session-based organization in claude_docs/sessions/YYYY-MM-DD_task-description/.
- `reports/` - Generated HTML test reports (root level)
- `logs/` - Application logs and traces (root level)
- `webqa_agent/logs/` - Application logs (package level)
- `webqa_agent/reports/` - Test reports (package level)
- `tests/actions_test_results/` - Test execution outputs
- `tests/actions_test_results/screenshots/` - Action test screenshots
- `tests/crawler_test_results/screenshots/` - Crawler test screenshots
Playwright not installed:
```shell
uv run playwright install chromium
```

API key issues:
```shell
export OPENAI_API_KEY="your-key"
# or
export ANTHROPIC_API_KEY="your-key"
# or
export GEMINI_API_KEY="your-key"
```

Config not found:
```shell
webqa-agent init                          # Generate template
webqa-agent run -c /path/to/config.yaml   # Specify path
```

Enable debug logging:
```yaml
log:
  level: debug
```

See TROUBLESHOOTING.md for the complete guide.