Add live golden-path (Tier 2) pipeline for azd ai agent extension by v1212 · Pull Request #8758 · Azure/azure-dev

v1212 · 2026-06-22T11:39:16Z

Summary

Adds eng/pipelines/ext-azure-ai-agents-live.yml: an on-demand / weekly Azure DevOps pipeline that runs the Tier 2 live golden path (init → provision → deploy → invoke → down) for the azure.ai.agents extension against a real Azure (TME) subscription, for both code and container deploy modes.
Adds the Tier 2 live driver as a Go test (tier2_live_test.go, console_test.go, assert.go) under cli/azd/extensions/azure.ai.agents/tests/e2e-live/, driving azd ai agent through a pseudo-terminal with go-expect + vt10x + creack/pty (replacing the original tmux/Python prototype from test: add static E2E tests for azure.ai.agents extension #8692), with CI auth detection for Azure DevOps (TF_BUILD / E2E_USE_AZ_CLI_AUTH).
Adds a README.md documenting how to run (CI + local) and the one-time ADO setup.

Together with the PR-gate tests in #8754 (Tier 0 offline + Tier 1 recording/playback), this covers Tier 0/1/2 from the original prototype (#8692).

At a glance

What it tests — the full live golden path init → provision → deploy → invoke → down of azd ai agent, driving a real CLI over a pseudo-terminal (Go go-expect + vt10x + creack/pty), for both code (zip) and container (ACR build) deploy modes, run sequentially. The interactive init phase is an event-driven prompt loop (wait for the survey UI to settle, read the rendered screen, dispatch on the verbatim prompt strings) rather than a fixed script, so it tolerates the live model/region/project branches. The invoke step asserts a live model answers "what is 2+2?" with a standalone 4 / four.
How to trigger — trigger: none / pr: none, so it never runs on PR push. Run it via: a PR comment /azp run ext-azure-ai-agents-live (requires repo write), the weekly Mon 07:00 UTC schedule, or a manual ADO queue (pick deployModes = both / code / container).
Identity / subscription / what it does — authenticates as the azure-sdk-tests ADO service connection (Workload Identity Federation / OIDC) into the shared TME test subscription, region eastus2. It creates a fresh Foundry project + gpt-4o-mini deployment (+ an ACR in container mode) and the agent service, exercises them, then tears them down. az logs in via OIDC and azd reuses that session (auth.useAzCliAuth + keepAzSessionActive: true, so the WIF token survives the multi-minute run).
Leftover resources? — No, by design. Each mode registers a t.Cleanup that runs azd down --force --purge, and a teardown failure fails the run (never green while leaking). A condition: always() AzureCLI@2 step is a second safety net: it force-purges any project dirs left under /tmp/e2e-tests/tier2-*/ after a crash or timeout. (Crash-path cleanup is best-effort, so the TME subscription is worth an occasional spot-check.)
Runs on Azure DevOps, not GitHub Actions — the YAML lives in this GitHub repo (eng/pipelines/…) but is an ADO pipeline (extends: 1es-redirect, task: AzureCLI@2, $(Build.*), ##vso[...]), executed by the azure-sdk ADO project. GitHub CI for this PR stays green/inert; the live run only happens in ADO.
Post-merge admin action — needs an Azure DevOps admin on the azure-sdk project (Azure SDK EngSys) — not a GitHub admin — to: (a) register the pipeline as ext-azure-ai-agents-live (the exact name /azp run uses) against this repo + YAML path; (b) confirm the azure-sdk-tests service connection maps to the TME subscription with RBAC to create Foundry projects and deploy models (Contributor + Azure AI Developer + Cognitive Services Contributor, or equivalent); (c) kick the first /azp run to validate the keepAzSessionActive auth path. Merging the PR itself only needs a GitHub maintainer approval. The init GitHub token is already wired via the ambient azuresdk-github-pat org secret, so no extra secret setup is needed.

Testing

The golden-path flow was validated end-to-end in the test: add static E2E tests for azure.ai.agents extension #8692 prototype fork run (code ~669s, container ~711s). The driver has since been rewritten from tmux/Python to the Go PTY harness in this PR; the live run of the Go version is exercised post-merge via /azp run (the test is gated behind AZURE_AI_AGENTS_E2E_LIVE=1 and never runs in PR CI).
Local validation of the Go harness: gofmt, go vet, golangci-lint (ubuntu + windows), and a go test -c compile all clean on the linux build tag; the no-tag assert.go answer-matching logic is covered by assert_test.go (runs on the host); README.md passes cspell.
Driver design choices hardened across review: an event-driven prompt loop with output-settle detection (replacing a fixed poll) so live conditional prompts don't desync; loop detection that advances past or fails out of a stuck prompt; t.Context() as the run-context parent so go test -timeout cancellation propagates; LIFO t.Cleanup ordering so teardown runs before the copied AZD_CONFIG_DIR is removed; a destructive-RemoveAll guardrail on the test dir; and a robust standalone-token check for the invoke result (accepts 4 or four, ignores incidental 4s such as gpt-4o-mini / 4.1 / 404).
Timeout hierarchy: per-mode runTimeout (60 min, a context deadline) > the sum of the per-phase budgets (init/provision/deploy/invoke ≈ 32 min) so a slow-but-healthy run is never preempted mid-phase; both modes fit under go test -timeout 125m, and the ADO AzureCLI@2 step (130 min) and job (150 min) caps sit strictly above that, so a run can never be hard-killed mid-azd down (which would leak live resources).
The ADO pipeline itself only runs once registered in ADO (inert in GitHub CI by design).

Adds eng/pipelines/ext-azure-ai-agents-live.yml, an on-demand/weekly Azure DevOps pipeline that drives the real 'azd ai agent' CLI through tmux against live Azure (TME), exercising init -> provision -> deploy -> invoke -> down for both code and container deploy modes. This is the live counterpart to the PR-gate checks (Tier 0 offline + Tier 1 recording/playback in #8754). Per Azure SDK EngSys / SFI guidance, live access stays out of the automatic PR pipeline (trigger: none) and runs only via '/azp run ext-azure-ai-agents-live' or the weekly schedule. The Tier 2 tmux driver (test_full_e2e.py, test_tier2.py) is migrated from the #8692 prototype; CI auth detection is extended to recognize Azure DevOps (TF_BUILD) and an explicit E2E_USE_AZ_CLI_AUTH override.

…zSessionActive The azure-sdk-tests service connection uses Workload Identity Federation, whose az session is isolated to the task's private AZURE_CONFIG_DIR and expires after ~10 min. Running the ~50 min golden-path test (and the cleanup) as plain bash steps after a separate login step would fail auth on both counts. Run them inside AzureCLI@2 with keepAzSessionActive:true (matching build-cli.yml) so the session stays refreshed and reaches azd (auth.useAzCliAuth) through tmux, which inherits AZURE_CONFIG_DIR. Subscription/tenant are now read in-script via az account show instead of cross-step pipeline variables.

test_tier2.py always ran sequentially, but kept a tautological if-condition (len==1 or len>1), an unused concurrent.futures import, a no-op --serial flag, and a docstring/print claiming parallel execution. Simplify to an explicit sequential loop and update the docstring to match. Also fix test_full_e2e.py's module docstring to point at README.md (LOCAL-TEST-GUIDE.md does not exist).

Copilot

Pull request overview

This PR adds a dedicated Azure DevOps pipeline and supporting Python drivers/docs to run the Tier 2 live golden-path E2E for the azure.ai.agents azd extension (init → provision → deploy → invoke → down) against real Azure resources, outside of the GitHub PR gate.

Changes:

Added an on-demand + weekly ADO pipeline (ext-azure-ai-agents-live) to run Tier 2 live E2E for both code and container deploy modes.
Added a tmux-driven Python Tier 2 runner (test_tier2.py) and a full golden-path driver (test_full_e2e.py) adapted for ADO CI auth detection.
Added documentation for running the live Tier 2 tests locally and in CI.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File	Description
eng/pipelines/ext-azure-ai-agents-live.yml	New ADO pipeline wiring to build azd + extension, run Tier 2 live E2E under AzureCLI@2, and publish logs/cleanup.
cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py	Tier 2 orchestrator: runs code + container golden paths sequentially with isolation and timeout handling.
cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py	tmux-driven end-to-end golden path driver (init/provision/deploy/invoke/down) with CI/local auth switching.
cli/azd/extensions/azure.ai.agents/tests/e2e-live/README.md	Docs for CI setup (ADO registration/service connection/secrets) and local WSL execution.

- Use the ambient azure-sdk org secret `azuresdk-github-pat` for GH_TOKEN instead of an empty `GitHubPat` placeholder variable (mirrors eval-waza.yml); removes a misleading masked variable and the need for admin PAT setup. - Harden the AzureCLI@2 inline script: `set -euo pipefail` and assign-then-verify subscription/tenant so an `az account show` failure fails fast (a plain `export X=$(...)` would have masked the error from set -e). - Reword the extension-install comment to be self-contained (it no longer inaccurately claims to mirror lint-ext-azure-ai-agents.yml). - Clarify the test_full_e2e.py auth prerequisite: only local WSL runs leave auth.useAzCliAuth unset; CI auto-enables az CLI auth. - Clear tmux scrollback after env setup so the exported GH token cannot leak into capture() output on failures/timeouts. - _cleanup_leaked_resources now checks azd down's return code and reports failures instead of always printing "Cleanup complete".

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

- Stream child E2E output live with a watchdog-enforced hard timeout instead of buffering everything via capture_output - Shell-escape the GitHub token (shlex.quote) before exporting in tmux - Clean up the per-mode AZD_CONFIG_DIR temp copy unless E2E_KEEP_ARTIFACTS - Use sha256 instead of md5 for the agent-name uniqueness suffix - Derive the agent binary arch from uname -m instead of hard-coding amd64

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

- Shell-escape HOME/PATH/TENANT, the cd target, and the agent name with shlex.quote() (consistent with the earlier token fix) - On Tier 2 timeout, kill the child's detached tmux server so reused CI agents do not accumulate orphaned tmux sockets

pyflakes flagged two f-strings with no interpolation; they are static text, so the f-prefix is dead. Plain string literals keep the linter (and Copilot) clean. No behavior change.

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

The docstring and README claimed tmux >=3.4, but the live pipeline installs tmux via apt-get on ubuntu-22.04 (LINUXVMIMAGE), which ships 3.2a. The driver only uses basic verbs (send-keys -l, capture-pane -p, new-session -x/-y, clear-history, kill-session/server) available well before 3.2, so >=3.4 was inaccurate and misleading. Align both docs to >=3.2.

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

jongio

Solid Tier 2 pipeline addition. The timeout hierarchy (phase budgets < child watchdog 60m < step 130m < job 150m) with explicit margin documentation is particularly well-thought-out, and the three-layer resource cleanup (in-test teardown, parent watchdog cleanup, pipeline condition: always() safety net) gives good confidence that leaked resources won't accumulate.

A few minor observations (none blocking):

File size (test_full_e2e.py, 925 lines): If more live test scenarios are planned (e.g., Tier 2 for other extensions), the tmux interaction helpers (lines ~79-170: tmux(), send(), capture(), wait_for(), run_cmd(), etc.) could be extracted into a shared utility module. For a single test driver this is fine.

Signal handling: The __main__ block doesn't wrap the phase chain in try/finally for tmux cleanup. If an unexpected exception occurs after setup() but before the tmux("kill-session") at the bottom, the tmux session/socket persists on the agent. The parent's watchdog and pipeline cleanup step cover this in practice, so it's not urgent.

Invoke regex: Nice work on (?<![\w.])4(?!\.\d)(?!\w) for the "2+2" assertion. Correctly rejects "gpt-4o-mini", "4.1", "404" while matching standalone "4". Clean.

vhvb1989 · 2026-06-24T06:13:10Z

Request: switch the Tier 2 interactive driver from tmux/Python to Go (go-expect + vt10x)

Thanks for this — the live golden-path coverage and the safety nets (watchdog, condition: always() purge, teardown-fails-the-run) are solid. One direction change I'd like before we standardize on this as the pattern for interactive extension tests: please drive the interactive flow with Go instead of tmux + Python.

Why

Language/toolchain consistency — the extension and azd are Go; a Go driver keeps tests in one toolchain with type-safe assertions, and avoids the external tmux >= 3.2 runtime dependency and the detached-tmux-server cleanup dance (kill-server, orphaned sockets on reused agents).
The libraries are already vendored — azd's dependency graph already contains the full Go-native expect stack (in cli/azd/go.sum):
- github.com/creack/pty — real PTY for the spawned azd binary
- github.com/Netflix/go-expect — expect-style send/expect API (the Go equivalent of Python pexpect)
- github.com/hinshun/vt10x — terminal emulator that renders ANSI/cursor sequences into a 2D screen (the equivalent of tmux capture-pane -p)

Parity — yes, it can do everything the tmux/Python driver does

I went through the current driver capability-by-capability:

tmux/Python	Go `go-expect` + `vt10x`
Real PTY running `azd`	`creack/pty` (attach `cmd.Stdin/out/err = console.Tty()`)
`send-keys -l` literal text	`Send` / `SendLine`
Enter / arrow keys	raw escape bytes (`"\r"`, `"\x1b[B"`, `"\x1b[A"`) wrapped as named-key helpers
`capture-pane -p` rendered screen	`vt10x.State.String()`
wait-for-substring / prompt loop	`ExpectString` / `ExpectRegexp` (blocks until match/timeout)
assert highlighted menu item	send key → read `vt10x.State` → assert rendered line
sequential commands in a session	attach each `azd` command to the console, or spawn a bash in the PTY + sentinels (same model)
live-stream + tail	`expect.WithStdout(os.Stdout, tailBuffer)`
watchdog / hard timeout / kill	`context` + per-`Expect` timeout + `cmd.Process.Kill()`
detached tmux server must be killed	N/A — PTY + child owned by the test process (a net simplification)

The one thing to validate is that vt10x renders the real azd ai agent init wizard identically to a real terminal (vt10x is an emulator; tmux uses a real terminal). It's a fairly complete xterm/VT100 emulator and this stack is purpose-built for testing exactly these CLIs, so risk is low — but worth a quick spike on the actual init flow before fully committing. Happy to drive/help with that spike.

Note: today only PHASE 1 (init) is interactive — provision / deploy / invoke / down already run --no-prompt. So the interactive surface to port is just init; the rest stay as plain subprocess calls.

The bigger plan (future iterations) — a shared driver for all extensions and azd

This isn't a one-off ask. We're planning a shared, public test-support module (tracked in #8795) that exposes this interactive driver so every Go extension — and azd core — defines interactive tests without hand-rolling a per-extension harness. Sketch of the extension-facing API:

sess := extframework.StartInteractive(t, "ai", "agent", "init") // PTY + vt10x + go-expect, auto-cleanup via t.Cleanup
sess.ExpectString("Select a subscription")
sess.Send(extframework.KeyDown, extframework.KeyEnter)
sess.ExpectString("Enter a name")
sess.SendLine("my-agent")
sess.ExpectString("added to your azd project successfully")
require.Contains(t, sess.Screen(), "Done")

out, code := extframework.Run(t, "provision", "--no-prompt")     // non-interactive phases reuse the plain runner

The same driver is intended to serve Tier 1 (recorded/playback) and Tier 2 (live), sharing CLI_TEST_AZD_PATH and the recording session with the non-interactive runner. See #8795 for the full plan, acceptance criteria, and the source-scoped replace mechanism that lets extensions pin it from source for test builds only.

Suggested path forward

For this PR: port the interactive init step to a Go go-expect+vt10x driver (kept local to the extension's tests/ for now is fine), and drop the tmux/Python files. The non-interactive phases can stay as Go os/exec calls.
As Support extensions hosting their own functional tests (public test-support module + source-scoped replace) #8795 lands, lift that local driver into the shared module so it's reusable across extensions and azd.

If the timing of #8795 is a concern, we can land the Go driver locally in the extension first and migrate it into the shared module afterward — but I'd prefer to avoid merging a tmux/Python driver that we know we'll replace. Let me know if you'd like me to prototype the go-expect+vt10x init driver to get you started.

Per review feedback, replace the tmux-driven Python golden-path driver with a Go test so the live suite uses the same toolchain as the rest of the extension and needs no Python/tmux on the agent. - TestTier2Live drives the interactive `azd ai agent init` through a pseudo-terminal (Netflix/go-expect sends keys, hinshun/vt10x renders the screen, creack/pty provides the PTY); provision/deploy/invoke/down shell out to azd with --no-prompt. Per-mode 60m context drives the graceful azd-down teardown via t.Cleanup. - The PTY driver is tagged //go:build linux; the answer matcher and its table unit tests build and run on any platform. - Pipeline builds and runs `go test -run TestTier2Live` and drops the Python/tmux setup steps; promote go-expect/vt10x/creack-pty to direct requires; refresh README and cspell. - Remove test_tier2.py and test_full_e2e.py.

…d source Replace the init driver's fixed 3s polling with go-expect event-driven reads that block until the survey UI settles, then dispatch on the verbatim prompt strings the extension prints (each case annotated with its source file:line). Select deploy mode via the --deploy-mode flag, which init auto-resolves rather than prompts when a manifest is supplied. Document the dispatch-loop design and the live-only verification path in the README. Addresses review feedback on #8758.

jongio

The Go rewrite (commits c62a696 and 4d27988) directly addresses @vhvb1989's direction request. The interactive driver now uses go-expect + vt10x + creack/pty with event-driven synchronization (waitForQuiet) instead of fixed-interval polling. This is solid work.

What looks good:

Event-driven sync: promptQuiet/listSettle waits for the UI to stop emitting before dispatching, rather than sleeping a fixed interval. Correct approach for flake-free interactive tests.
Timeout hierarchy: per-mode runTimeout (60m) < go test -timeout (125m) < ADO step (130m) < job (150m). Each layer has room for graceful teardown before the next one kills it.
Cleanup safety: assertSafeTestDir blocks dangerous os.RemoveAll on protected paths; t.Cleanup(r.teardown) guarantees azd down --force --purge runs even on failure; the condition: always() pipeline step is a crash-path backstop.
Platform-portable assert logic: assert.go builds on all platforms (no build tag), with the regex-free standalone-"4" detection explained clearly.
Pipeline arch detection: uname -m mapping to GOARCH means the step won't break if the pool moves off amd64.

One observation for the conversation with @vhvb1989:

The latest commits already deliver what was requested. The old Python/tmux files are gone, the Go driver is in place, and it uses exactly the libraries and approach described in the direction-change comment. The driver is local to the extension's tests/e2e-live/ for now (as suggested), ready to lift into the shared module when #8795 lands.

Minor notes (non-blocking):

tier2_live_test.go at 859 lines is large for a single file, but splitting a sequential test flow (init/provision/deploy/invoke/down + all helpers) across multiple files would hurt readability. Acceptable.
The dispatchPrompt source-line annotations (e.g., init.go:722) are useful for maintenance but will become stale as the extension evolves. The README already documents this trade-off ("changes there can require updating dispatchPrompt"). Consider a lint or generator if drift becomes a maintenance burden.
findServiceName uses line-by-line string matching to parse azure.yaml. This is pragmatic for the known scaffolded output format and avoids pulling in a YAML library for one test helper. Fine as-is.

v1212 · 2026-06-24T15:47:49Z

Thanks @vhvb1989 — done. Ported the interactive init driver to Go (go-expect + vt10x + creack/pty) and dropped the tmux/Python files; the --no-prompt phases stay as os/exec calls.

Per "one thing to validate" note: I did a source-grounded spike and calibrated the prompt dispatch to the real azd ai agent init strings (event-driven sync; deploy mode via --deploy-mode). The one thing I can't check locally — vt10x rendering the live wizard — gets exercised on the first ADO /azp run; I'll validate/debug from there.

v1212 · 2026-06-24T15:57:02Z

Thanks @jongio. The driver changed — the Go rewrite removed the original Python/tmux code, so those first-review points are gone. The three notes on the new code are acknowledged trade-offs (the dispatchPrompt drift one is documented in the README); I'll revisit a generator if that drift becomes painful. No code changes needed — ready for another look.

jongio

My prior analysis (two rounds against this same HEAD) still holds. The Go rewrite using go-expect + vt10x + creack/pty is solid: event-driven synchronization via waitForQuiet, clean timeout hierarchy, three-layer resource cleanup, and well-tested assert logic.

One code modernization note below. Also: the PR currently has merge conflicts that need resolving before merge.

No blocking issues from my side.

jongio

The Go rewrite from tmux+Python to a PTY-based Go driver is a significant improvement: same toolchain as the rest of the extension, no Python/tmux dependency on the agent, and the event-driven dispatch loop (wait for survey UI to settle, then match the rendered prompt) is more robust than fixed polling.

A few observations on the final state:

findServiceName naive YAML walk (tier2_live_test.go): The line-by-line parser assumes a specific indentation structure. Since azd ai agent init always scaffolds the same format this is fine today, but if the template output changes it will silently break. Consider a targeted grep -m1 approach or importing a minimal YAML decoder if the function gets reused.
Timeout cascading is well-designed: per-mode 60m > sum-of-phases 50m (10m teardown margin), step 130m > go-test 125m, job 150m > step. This ensures graceful teardown always runs before hard kills, preventing resource leaks. The commit that adjusted these (db53b06) explains the invariant clearly.
responseHasExpectedAnswer rejects "4.0" by design (the 4 followed by . then digit rule). Worth a brief code comment noting this is intentional, since a future maintainer might wonder why a valid math answer is rejected. The existing unit tests cover this implicitly via the "version" case ("4.1") but an explicit "4.0" case in the table would make the design intent obvious.
Dependency versions (go-expect, creack/pty, vt10x) are the same 2022 pseudo-versions that AlecAivazis/survey uses in its own test suite. They are proven to work well together for this exact use case. No action needed, but worth noting in case a future dep audit flags them as stale.

Per AGENTS.md Go 1.26 guidance, switch exitCode() from errors.As to errors.AsType[*exec.ExitError]. Add an explicit "4.0" case to the assert table and document why a decimal answer is deliberately rejected.

# Conflicts: # cli/azd/extensions/azure.ai.agents/cspell.yaml

jongio

One minor nit on the safety-net cleanup step: the 2>/dev/null on line 213 swallows all diagnostic output from azd down. If a crash-path cleanup fails, the pipeline log won't show why. Since the task already sets continueOnError: true (so a non-zero exit can't fail the build), consider 2>&1 || true instead to keep diagnostic output visible for post-mortem investigation while still preventing the set -e from aborting the loop.

Not blocking, the current pattern is defensible for a quiet safety net.

Use 2>&1 instead of 2>/dev/null in the always() cleanup loop so a failed force-purge is visible for debugging a resource leak; keep || true so set -e doesn't abort the loop.

jongio

The Go rewrite of the Tier 2 driver is well-structured: event-driven prompt dispatch, proper timeout hierarchy (per-mode 60m < go test 125m < step 130m < job 150m), and LIFO cleanup ordering so teardown always runs before the config dir it depends on is removed.

One minor observation below on findServiceName; everything else looks solid after 12+ rounds of iteration.

jongio

The crash-path cleanup now surfaces stderr via 2>&1 instead of swallowing it, making failed force-purge attempts visible in the pipeline log. The || true guard still prevents the loop from stopping on individual azd down failures.

findServiceName (tier2_live_test.go:649) parses azure.yaml by string matching rather than a YAML library. This works for the predictable output of azd ai agent init, but could silently break on edge-case YAML (a comment containing services:, or a multi-line string value). Low risk since the generated files are predictable, but worth noting if the test starts failing on valid projects.

Replace the indentation/colon line scan with a minimal struct unmarshal (gopkg.in/yaml.v3, already a direct dep) so the service-name lookup tolerates comments and indentation changes in scaffolded azure.yaml.

jongio

One suggestion on context propagation in the test runner.

jongio

One suggestion on context propagation in the test runner.

jongio

Reviewed the full diff (8 files, +1550/-1). My prior feedback from yesterday was addressed in the latest commits (errors.AsType, stderr visibility in cleanup, struct-based YAML parsing, t.Context() propagation).

Fresh pass on the current state: the test architecture is well-designed. The event-driven prompt loop (go-expect waitForQuiet instead of fixed-interval polling), the bounded ring buffer for diagnostics, the LIFO cleanup ordering, and the tiered timeout hierarchy (per-phase < per-mode < go-test < ADO step < ADO job) are all solid engineering.

One non-blocking note for future maintenance: the dispatchPrompt cases are tightly coupled to the extension's prompt strings (each case annotated with source file:line). The README documents this, but consider adding a CI step or a grep-based check that flags when prompt-emitting lines in the extension source change without a corresponding update here. That would catch desync earlier than a full Tier 2 run.

jongio

Checked the Go PTY test driver, answer matcher logic, console helpers, pipeline YAML, and README.

Prior feedback (errors.AsType, t.Context(), yaml struct parsing, stderr redirect) is all addressed. CI is green across all checks.

The timeout hierarchy, LIFO cleanup ordering, assertSafeTestDir guard, and event-driven prompt loop with loop detection are correctly implemented. The standalone-4 checker handles edge cases well (model names, versions, status codes). Dependencies were already in go.sum as indirect deps; the go.mod change just promotes them to direct.

One non-blocking suggestion on the dispatchPrompt default case.

jongio · 2026-06-25T09:17:14Z

+		r.enter()
+
+	default:
+		r.enter()


Non-blocking: when the default case fires, it sends Enter with no indication in the test log that the dispatch didn't match a specific case. The prompt text IS logged before dispatch (line 301), but adding something like r.t.Logf("unhandled prompt (default Enter): %s", truncate(prompt, 100)) here would make CI investigations easier when a new or changed prompt hits the catch-all, since it would distinguish "prompt matched and answered correctly" from "prompt fell through to default."

v1212 added ext-agents azure.ai.agents extension ext-foundry azure.ai.{agents,connections,inspector,projects,routines,skills,toolboxes}, microsoft.foundry labels Jun 22, 2026

microsoft-github-policy-service Bot assigned v1212 Jun 22, 2026

Jian Wu added 2 commits June 22, 2026 19:54

v1212 marked this pull request as ready for review June 22, 2026 12:08

Copilot AI review requested due to automatic review settings June 22, 2026 12:08

v1212 requested review from JeffreyCA, RickWinter, danieljurek, glharper, tg-msft, therealjohn, trangevi, trrwilson and vhvb1989 as code owners June 22, 2026 12:08

Copilot started reviewing on behalf of v1212 June 22, 2026 12:09 View session

Copilot AI reviewed Jun 22, 2026

View reviewed changes

v1212 requested a review from Copilot June 22, 2026 12:28

Copilot started reviewing on behalf of v1212 June 22, 2026 12:34 View session

Copilot AI reviewed Jun 22, 2026

View reviewed changes

v1212 requested a review from Copilot June 22, 2026 12:46

Copilot started reviewing on behalf of v1212 June 22, 2026 12:49 View session

Copilot AI reviewed Jun 22, 2026

View reviewed changes

v1212 mentioned this pull request Jun 22, 2026

Add automated test infrastructure (PR-gate + live E2E) for the azure.ai.agents extension #8759

Open

4 tasks

v1212 requested a review from Copilot June 22, 2026 13:18

Drop redundant f-prefix on two placeholder-less log strings

019d402

pyflakes flagged two f-strings with no interpolation; they are static text, so the f-prefix is dead. Plain string literals keep the linter (and Copilot) clean. No behavior change.

Copilot AI reviewed Jun 23, 2026

View reviewed changes

Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py Outdated

Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/README.md Outdated

v1212 requested a review from Copilot June 23, 2026 05:52

Copilot started reviewing on behalf of v1212 June 23, 2026 05:53 View session

Copilot AI reviewed Jun 23, 2026

View reviewed changes

trangevi reviewed Jun 23, 2026

View reviewed changes

Comment thread eng/pipelines/ext-azure-ai-agents-live.yml

Comment thread eng/pipelines/ext-azure-ai-agents-live.yml

jongio reviewed Jun 24, 2026

View reviewed changes

vhvb1989 mentioned this pull request Jun 24, 2026

Support extensions hosting their own functional tests (public test-support module + source-scoped replace) #8795

Open

9 tasks

Jian Wu added 2 commits June 24, 2026 18:02

jongio reviewed Jun 24, 2026

View reviewed changes

Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go Outdated