<p align="center">
<b>Open-source framework and CLI for continuous evaluation, safety testing, and release readiness of Microsoft Foundry agents.</b>
<br/>
Can we ship it, and how do we know?
</p>

<p align="center">

## Overview

AgentOps Accelerator is an open-source framework and CLI that standardizes
continuous evaluation, safety testing, and release readiness for enterprise
AI agents running on Microsoft Foundry.

It is an orchestrator, not a reimplementation. Foundry already builds and
runs the agent. Tools like Foundry Evaluations, `azd ai agent eval`, the
open-source ASSERT framework, the PyRIT-backed AI Red Teaming agent, Azure
Monitor and Application Insights, and whatever CI/CD platform your team
prefers all exist and do their jobs well. What was missing was the glue that
pulls them into one repeatable release loop. That is what AgentOps provides.

The loop looks the same for every team and every agent. You evaluate the
agent against your datasets, rubrics, and policies, either locally or in the
cloud, with evaluators that AgentOps auto-selects based on whether the
scenario is RAG, tool use, model quality, or safety. You probe the agent
with adversarial inputs by running ASSERT through `agentops assert run` and
the Foundry/PyRIT red teaming agent through `agentops redteam run`, both as
active CI steps that gate the pipeline. You diagnose the rest of the
picture (repo layout, telemetry wiring, landing zone, and Foundry
configuration) with `agentops doctor`.

The pipeline then gates the release using a deterministic exit-code contract
that pull requests and CI/CD workflows can rely on, and packages everything
into a stable evidence pack (`evidence.json` and `evidence.md`) that bundles
eval results, ASSERT verdicts, red-team findings, telemetry readiness, and
Doctor findings for whoever signs off on production. Once the release ships,
AgentOps closes the loop by promoting reviewed production traces back into
regression datasets that feed the next eval cycle.
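A CI job that drives the loop's active steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not part of AgentOps. `run_steps` and `DEFAULT_STEPS` are hypothetical names. The command names come from this README, but real invocations will likely need configuration or flags that are not shown here. The evaluation step is left out because this README does not name its AgentOps command. The sketch runs every step even when an earlier one fails, so a single pipeline run surfaces all failing steps rather than only the first.

```python
import subprocess

# Hypothetical step list: command names are taken from this README, but real
# invocations will likely need configuration or flags not shown here.
# The evaluation step is omitted because its AgentOps command is not named.
DEFAULT_STEPS = [
    ["agentops", "assert", "run"],
    ["agentops", "redteam", "run"],
    ["agentops", "doctor"],
]


def run_steps(steps):
    """Run each command in order and return a list of (command, exit_code).

    Every step runs even if an earlier one fails, so one pipeline run
    reports all failing steps instead of stopping at the first.
    """
    results = []
    for cmd in steps:
        try:
            # No capture: stdout/stderr are inherited, so step output lands in CI logs.
            code = subprocess.run(cmd).returncode
        except FileNotFoundError:
            # Executable not on PATH: record it as a runtime error (code 1).
            code = 1
        results.append((" ".join(cmd), code))
    return results
```

In a CI job, `run_steps(DEFAULT_STEPS)` returns one exit code per step, and each code can then be read against the exit-code contract described in this README.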

The output is a clear answer to the two questions reviewers actually ask:
can we ship it, and how do we know?

### Core outputs


### Exit-code contract

AgentOps commands exit with `0` when execution succeeded and every gate
passed; with `2` when execution itself succeeded but a gate failed (a
threshold was not met, an ASSERT violation was found, a red-team
attack-success rate exceeded its limit, or a Doctor severity gate failed);
and with `1` for runtime or configuration errors. Pipelines can rely on
this contract without parsing output.
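Because the contract uses only three integers, a pipeline that runs several AgentOps steps can fold their codes into a single job result. The Python sketch below is hypothetical. `Outcome` and `overall_exit_code` are illustrative names, not AgentOps APIs. Its precedence rule is also an assumption: any runtime or configuration error takes priority, then any gate failure, and otherwise the result is success. Codes outside the contract are treated as errors.

```python
import enum


class Outcome(enum.IntEnum):
    """Exit codes from the AgentOps exit-code contract."""
    PASSED = 0        # execution succeeded and every gate passed
    ERROR = 1         # runtime or configuration error
    GATE_FAILED = 2   # execution succeeded but at least one gate failed


def overall_exit_code(codes):
    """Combine per-step exit codes into one pipeline exit code.

    Assumed precedence: ERROR beats GATE_FAILED beats PASSED.
    An empty list counts as PASSED.
    """
    outcomes = set()
    for code in codes:
        try:
            outcomes.add(Outcome(code))
        except ValueError:
            # Unknown codes (including negative codes from processes killed
            # by a signal) are treated as runtime errors.
            outcomes.add(Outcome.ERROR)
    if Outcome.ERROR in outcomes:
        return Outcome.ERROR
    if Outcome.GATE_FAILED in outcomes:
        return Outcome.GATE_FAILED
    return Outcome.PASSED
```

A wrapper script could finish with `sys.exit(int(overall_exit_code(codes)))`, so the job itself follows the same contract.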

## AgentOps and Microsoft Foundry
