diff --git a/README.md b/README.md
index 495a2ff..ecef64b 100644
--- a/README.md
+++ b/README.md
@@ -3,7 +3,7 @@

Open-source framework and CLI for continuous evaluation, safety testing, and release readiness of Microsoft Foundry agents.
-Can we ship it, and where is the proof?
+Can we ship it, and how do we know?

@@ -21,34 +21,36 @@ Can we ship it, and where is the proof?
 
 ## Overview
 
-**AgentOps Accelerator is an open-source framework and CLI that standardizes
-continuous evaluation, safety testing, and release readiness for enterprise AI
-agents — with Microsoft Foundry as the agent runtime.**
+AgentOps Accelerator is an open-source framework and CLI that standardizes
+continuous evaluation, safety testing, and release readiness for enterprise
+AI agents running on Microsoft Foundry.
 
-It is an *orchestrator*, not a reimplementation. AgentOps wires together the
-tools you already use — Foundry Evaluations, `azd ai agent eval`, the
+It is an orchestrator, not a reimplementation. Foundry already builds and
+runs the agent. Tools like Foundry Evaluations, `azd ai agent eval`, the
 open-source ASSERT framework, the PyRIT-backed AI Red Teaming agent, Azure
-Monitor / Application Insights, and your CI/CD platform — into a single
-repeatable release loop:
-
-1. **Evaluate** the agent against datasets, rubrics, and policies — locally or
-   in the cloud — using auto-selected evaluators for RAG, tool use, model
-   quality, and safety.
-2. **Probe** the agent with adversarial inputs by orchestrating ASSERT
-   (`agentops assert run`) and the Foundry/PyRIT Red Teaming agent
-   (`agentops redteam run`) as active CI steps.
-3. **Diagnose** repo, telemetry, landing zone, and Foundry readiness with
-   `agentops doctor`.
-4. **Gate** the release with a deterministic exit-code contract that PRs and
-   pipelines can rely on.
-5. **Prove** the release with a stable evidence pack (`evidence.json` +
-   `evidence.md`) that bundles eval results, ASSERT verdicts, red-team
-   findings, telemetry readiness, and Doctor findings for promotion review.
-6. **Learn from production** by promoting reviewed traces into regression
-   datasets that feed the next eval cycle.
-
-The output is a clear answer to two questions reviewers actually ask:
-**can we ship it, and where is the proof?**
+Monitor and Application Insights, and whatever CI/CD platform your team
+prefers all exist and do their job well. What was missing was the glue that
+pulls them into one repeatable release loop. That is what AgentOps provides.
+
+The loop looks the same for every team and every agent. You evaluate the
+agent against your datasets, rubrics, and policies, either locally or in the
+cloud, with evaluators that AgentOps auto-selects based on whether the
+scenario is RAG, tool use, model quality, or safety. You probe the agent
+with adversarial inputs by running ASSERT through `agentops assert run` and
+the Foundry/PyRIT red teaming agent through `agentops redteam run`, both as
+active CI steps that gate the pipeline. You diagnose the rest of the
+picture (repo layout, telemetry wiring, landing zone, and Foundry
+configuration) with `agentops doctor`. The pipeline gates the release using
+a deterministic exit-code contract that pull requests and CI/CD workflows
+can rely on, and packages everything into a stable evidence pack
+(`evidence.json` and `evidence.md`) that bundles eval results, ASSERT
+verdicts, red-team findings, telemetry readiness, and Doctor findings for
+whoever signs off on production. Once the release ships, AgentOps closes
+the loop by promoting reviewed production traces back into regression
+datasets that feed the next eval cycle.
+
+The output is a clear answer to the two questions reviewers actually ask:
+can we ship it, and how do we know?
 ### Core outputs
 
@@ -63,10 +65,11 @@ The output is a clear answer to two questions reviewers actually ask:
 
 ### Exit-code contract
 
-- `0` — execution succeeded and all gates passed
-- `2` — execution succeeded but a threshold, ASSERT violation, red-team rate,
-  or Doctor severity gate failed
-- `1` — runtime or configuration error
+AgentOps commands exit with `0` when execution succeeded and every gate
+passed, with `2` when execution itself succeeded but a threshold, an ASSERT
+violation, a red-team attack-success rate, or a Doctor severity gate
+failed, and with `1` for runtime or configuration errors. Pipelines can
+rely on this contract without parsing output.
 
 ## AgentOps and Microsoft Foundry
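
The exit-code contract in the second hunk can be consumed from a CI wrapper without parsing output. The following is a minimal sketch, not part of AgentOps itself: the names `Outcome`, `classify`, `combine`, and `run_steps` are hypothetical, and two behaviors are assumptions rather than anything the README states: (1) any exit code other than `0` or `2` is treated like `1`, and (2) when several steps run, an error outranks a gate failure.

```python
"""Illustrative wrapper around the AgentOps exit-code contract (0 / 2 / 1)."""
import subprocess
from enum import Enum


class Outcome(Enum):
    PASSED = "passed"            # exit 0: execution succeeded, all gates passed
    GATE_FAILED = "gate_failed"  # exit 2: execution succeeded, a gate failed
    ERROR = "error"              # exit 1: runtime or configuration error


def classify(exit_code: int) -> Outcome:
    """Map a process exit code onto the three outcomes of the contract."""
    if exit_code == 0:
        return Outcome.PASSED
    if exit_code == 2:
        return Outcome.GATE_FAILED
    # Assumption: unknown codes (signals, shell errors) are handled like 1.
    return Outcome.ERROR


def combine(exit_codes) -> int:
    """Fold several step exit codes into one code that follows the same contract.

    Assumption: an error (1) takes precedence over a gate failure (2).
    """
    outcomes = [classify(code) for code in exit_codes]
    if Outcome.ERROR in outcomes:
        return 1
    if Outcome.GATE_FAILED in outcomes:
        return 2
    return 0


def run_steps(commands) -> int:
    """Run each command, keep going after failures, and return the combined code."""
    codes = []
    for cmd in commands:
        try:
            codes.append(subprocess.run(cmd).returncode)
        except OSError:
            # The executable could not be started; report it as an error.
            codes.append(1)
    return combine(codes)
```

A pipeline step could then call `run_steps([["agentops", "assert", "run"], ["agentops", "redteam", "run"], ["agentops", "doctor"]])` and pass the result to `sys.exit`. Only the three subcommands named in the diff are used here; any flags they may need are omitted because the diff does not show them.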