NOWAI-Bench

NOW as Enterprise AI Industry Gold Standard

NOWAI-Bench is ServiceNow's enterprise AI benchmarking suite, designed to measure whether AI agents perform reliably across the real workflows, domains, and governance demands of the world's largest organizations. Grounded in production-grade enterprise tasks across ITSM, HR, CSM, and cross-domain scenarios, it provides a rigorous, open standard that enables enterprises to make informed model selection decisions, validate AI deployments with confidence, and meet emerging regulatory requirements for AI transparency and accountability.

Benchmark Suite

| Benchmark | Description | Repo |
| --- | --- | --- |
| EnterpriseOps-Gym | 1,150 tasks across core enterprise domains (IT, HR, Finance, Customer Service, Procurement). Submitted to ICML. | → EnterpriseOps-Gym |
| EVA-Bench | Voice agent evaluation benchmark targeting enterprise contact center and service desk scenarios. | → EVA-Bench |

Getting Started

Each benchmark lives in its own repository with self-contained setup instructions. See the individual repos linked above.
