[Enhancement] Add Web Agents Evaluation Workflow by NiraliPopat · Pull Request #130 · ServiceNow/SyGra

NiraliPopat · 2026-03-05T08:51:15Z

Summary

Added a comprehensive web agents evaluation framework to SyGra for evaluating browser automation agents with retry logic, inline evaluation, and advanced metrics tracking.

Explain the features implemented:

🎯 Core Evaluation Framework

Task Executor (task_executor.py): Complete evaluation workflow with pre/post processors
- RequestResponseLogger: Comprehensive request/response logging system with timestamped JSON logs
- FetchNextActionPreProcessor: Pre-processes LLM requests with chat history management and retry hint injection
- FetchNextActionPostProcessor: Post-processes LLM responses and extracts tool calls
- InlineEvaluationLambda: Real-time evaluation of model predictions against golden responses
- RetryFlow: Intelligent retry logic with failure detection
- ShouldContinueCondition: Workflow continuation control
- Flatten: Post-processor to flatten nested retry structures for analysis
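The Flatten step above can be pictured with a minimal sketch. The PR text does not show the record schema, so the `attempts` key, the `attempt_index` field, and the function name `flatten_retries` are all hypothetical, chosen only to illustrate turning one nested record into one row per retry attempt.

```python
from typing import Any


def flatten_retries(
    record: dict[str, Any], attempts_key: str = "attempts"
) -> list[dict[str, Any]]:
    """Expand one record holding a nested list of attempts into flat rows.

    The nested layout (a list of dicts under `attempts_key`) is an assumption
    made for illustration; it is not taken from task_executor.py.
    """
    # Fields shared by every attempt (everything except the nested list).
    shared = {k: v for k, v in record.items() if k != attempts_key}
    rows = []
    for index, attempt in enumerate(record.get(attempts_key, [])):
        row = dict(shared)             # copy the shared fields
        row["attempt_index"] = index   # remember which retry this row came from
        row.update(attempt)            # lift the attempt's own fields to top level
        rows.append(row)
    return rows


record = {
    "task_id": "t1",
    "attempts": [
        {"tool": "scroll_tool", "step_match": False},
        {"tool": "click_tool", "step_match": True},
    ],
}
rows = flatten_retries(record)
```

Here `rows` holds two flat dictionaries, each carrying `task_id` alongside the fields of a single attempt, which is easier to load into a table for analysis than the nested form.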

🛠️ Browser Interaction Tools

10 browser automation tools (tools.py) using LangChain's @tool decorator:
- screenshot_tool, click_tool, type_tool, typing_tool, scroll_tool
- wait_tool, resume_tool, hil_tool, text_clear_tool, slider_tool

🔄 Intelligent Retry System

Adaptive failure hints that guide the agent:
- Tool Incorrect: Instructs the agent to use a different tool
- Parameters Incorrect: Instructs the agent to fix the parameters while keeping the same tool
Configurable retry limits (default: 3 retries)
Server error detection to prevent unnecessary retries
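The retry rules above could be sketched roughly as follows. Only the default of 3 retries and the two hint categories come from the description; the hint wording, the server error markers, the `tool`/`args` dictionary layout, and the function names are assumptions, and the real step match compares more than exact argument equality (bounding box, direction, text).

```python
from typing import Any, Optional

MAX_RETRIES = 3  # default stated in the PR description

# Hypothetical hint texts; the actual strings live in constants.py.
HINT_TOOL_INCORRECT = "The selected tool was incorrect. Use a different tool."
HINT_PARAMS_INCORRECT = "The tool was correct but its parameters were not. Fix the parameters."

# Hypothetical markers used to recognise a server-side failure.
SERVER_ERROR_MARKERS = ("internal server error", "bad gateway", "service unavailable")


def is_server_error(response_text: str) -> bool:
    """Return True when the response looks like a server failure, not a model mistake."""
    lowered = response_text.lower()
    return any(marker in lowered for marker in SERVER_ERROR_MARKERS)


def next_retry_hint(
    predicted: dict[str, Any],
    golden: dict[str, Any],
    retries_used: int,
    response_text: str = "",
) -> Optional[str]:
    """Pick the hint for the next attempt, or None when no retry should happen."""
    if is_server_error(response_text):
        return None  # a hint cannot fix a server error, so do not retry
    if retries_used >= MAX_RETRIES:
        return None  # retry budget exhausted
    if predicted.get("tool") != golden.get("tool"):
        return HINT_TOOL_INCORRECT
    if predicted.get("args") != golden.get("args"):
        return HINT_PARAMS_INCORRECT
    return None  # prediction matches, nothing to retry


golden_step = {"tool": "click_tool", "args": {"x": 10, "y": 20}}
hint = next_retry_hint({"tool": "scroll_tool", "args": {}}, golden_step, retries_used=0)
```

In this example `hint` is the tool-incorrect message, because the predicted tool name differs from the golden one before parameters are even compared.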

📊 Evaluation Metrics

Unit Metrics (Inline):
- Tool Match: Exact tool name matching
- Step Match: Comprehensive validation (tool + parameters + bounding box + direction + text)
Aggregator Metrics (Post-processing):
- Accuracy: Overall success rate
- Pass@k: Probability that at least one of k attempts succeeds
- Pass^k: Probability that all k attempts succeed
- Step Efficiency: Efficiency scoring based on retry attempts with configurable penalty
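As an illustration of the aggregator metrics above, the sketch below uses the commonly cited combinatorial estimators for Pass@k and Pass^k, given n recorded attempts of which c succeeded. Whether this PR computes them this way is not stated, so treat the formulas as an assumption. The `step_efficiency` formula (a linear penalty per retry) and its default penalty of 0.25 are purely hypothetical stand-ins for the "configurable penalty" mentioned in the description.

```python
from math import comb


def accuracy(outcomes: list[bool]) -> float:
    """Fraction of records whose final outcome was a success."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k attempts, drawn from n with c successes, succeeds."""
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= n")
    # comb(n - c, k) is 0 when fewer than k failures exist, giving 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Probability that all k attempts, drawn from n with c successes, succeed."""
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= n")
    return comb(c, k) / comb(n, k)


def step_efficiency(retries_used: int, penalty: float = 0.25) -> float:
    """Hypothetical score: 1.0 with no retries, reduced by `penalty` per retry, floored at 0."""
    return max(0.0, 1.0 - penalty * retries_used)


overall = accuracy([True, False, True, True])   # 3 of 4 records succeeded
at_least_once = pass_at_k(n=4, c=2, k=2)        # 1 - C(2,2)/C(4,2) = 5/6
every_time = pass_hat_k(n=4, c=2, k=2)          # C(2,2)/C(4,2) = 1/6
efficiency = step_efficiency(retries_used=2)    # 1 - 0.25 * 2 = 0.5
```

The contrast between `at_least_once` and `every_time` shows why both metrics are reported: an agent that succeeds on half of its attempts looks strong under Pass@k but weak under Pass^k, which rewards consistency.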

⚙️ Configuration & Constants

Graph Configuration (graph_config.yaml): Complete workflow definition with system prompts, node configuration, and metrics setup
Constants (constants.py): Centralized configuration for server errors, tool mappings, retry settings, state keys, and failure hints

📚 Documentation

Comprehensive README: Added documentation covering architecture, components, workflow, usage, configuration, logging, error handling, best practices, and troubleshooting
Docs integration: Added to docs/eval/agents/web_agents_eval.md

Performance impact (if any):

N/A - New feature added

How to Test the feature

Steps for reviewers to verify functionality:

Run the task tasks/eval/agents/web_agents
Observe the Flatten_.json file for record-level output
Observe the MetricCollatorPostProcessor_.json for overall metrics

Checklist

Lint fixes and unit testing done
End to end task testing
Documentation updated


Addition of Web Agents Evaluation

69583d4

NiraliPopat requested a review from a team as a code owner March 5, 2026 08:51

Merge branch 'main' into scratch/web_agents_eval

841f08e

github-code-quality bot found potential problems Mar 5, 2026

tasks/eval/agents/web_agents/task_executor.py: 2 findings, both marked Fixed

NiraliPopat and others added 5 commits March 5, 2026 14:25

Potential fix for pull request finding 'Except block handles 'BaseException''

5df748d

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

unit tests update

9291181

Merge branch 'scratch/web_agents_eval' of github.com:ServiceNow/GraSP into scratch/web_agents_eval

911524f

unit tests update

f6cbf0d

code refactor

ae86859

github-code-quality bot found potential problems Mar 5, 2026

tasks/eval/agents/web_agents/task_executor.py: 1 finding, marked Fixed

Potential fix for pull request finding 'Unused local variable'

7c1d5ae

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

NiraliPopat changed the title from "Addition of Web Agents Evaluation" to "[Enhancement] Add Web Agents Evaluation Workflow" on Mar 6, 2026

NiraliPopat wants to merge 8 commits into main from scratch/web_agents_eval
