Skip to content

Add live golden-path (Tier 2) pipeline for azd ai agent extension#8758

Open
v1212 wants to merge 24 commits into
mainfrom
wujia/ai-agent-live-tests
Open

Add live golden-path (Tier 2) pipeline for azd ai agent extension#8758
v1212 wants to merge 24 commits into
mainfrom
wujia/ai-agent-live-tests

Conversation

@v1212

@v1212 v1212 commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator

Fixes #8759

Summary

  • Adds eng/pipelines/ext-azure-ai-agents-live.yml: an on-demand / weekly Azure DevOps pipeline that runs the Tier 2 live golden path (init → provision → deploy → invoke → down) for the azure.ai.agents extension against a real Azure (TME) subscription, for both code and container deploy modes.
  • Adds the Tier 2 live driver as a Go test (tier2_live_test.go, console_test.go, assert.go) under cli/azd/extensions/azure.ai.agents/tests/e2e-live/, driving azd ai agent through a pseudo-terminal with go-expect + vt10x + creack/pty (replacing the original tmux/Python prototype from test: add static E2E tests for azure.ai.agents extension #8692), with CI auth detection for Azure DevOps (TF_BUILD / E2E_USE_AZ_CLI_AUTH).
  • Adds a README.md documenting how to run (CI + local) and the one-time ADO setup.

Together with the PR-gate tests in #8754 (Tier 0 offline + Tier 1 recording/playback), this covers Tier 0/1/2 from the original prototype (#8692).

At a glance

  1. What it tests — the full live golden path init → provision → deploy → invoke → down of azd ai agent, driving a real CLI over a pseudo-terminal (Go go-expect + vt10x + creack/pty), for both code (zip) and container (ACR build) deploy modes, run sequentially. The interactive init phase is an event-driven prompt loop (wait for the survey UI to settle, read the rendered screen, dispatch on the verbatim prompt strings) rather than a fixed script, so it tolerates the live model/region/project branches. The invoke step asserts a live model answers "what is 2+2?" with a standalone 4 / four.

  2. How to triggertrigger: none / pr: none, so it never runs on PR push. Run it via: a PR comment /azp run ext-azure-ai-agents-live (requires repo write), the weekly Mon 07:00 UTC schedule, or a manual ADO queue (pick deployModes = both / code / container).

  3. Identity / subscription / what it does — authenticates as the azure-sdk-tests ADO service connection (Workload Identity Federation / OIDC) into the shared TME test subscription, region eastus2. It creates a fresh Foundry project + gpt-4o-mini deployment (+ an ACR in container mode) and the agent service, exercises them, then tears them down. az logs in via OIDC and azd reuses that session (auth.useAzCliAuth + keepAzSessionActive: true, so the WIF token survives the multi-minute run).

  4. Leftover resources?No, by design. Each mode registers a t.Cleanup that runs azd down --force --purge, and a teardown failure fails the run (never green while leaking). A condition: always() AzureCLI@2 step is a second safety net: it force-purges any project dirs left under /tmp/e2e-tests/tier2-*/ after a crash or timeout. (Crash-path cleanup is best-effort, so the TME subscription is worth an occasional spot-check.)

  5. Runs on Azure DevOps, not GitHub Actions — the YAML lives in this GitHub repo (eng/pipelines/…) but is an ADO pipeline (extends: 1es-redirect, task: AzureCLI@2, $(Build.*), ##vso[...]), executed by the azure-sdk ADO project. GitHub CI for this PR stays green/inert; the live run only happens in ADO.

  6. Post-merge admin action — needs an Azure DevOps admin on the azure-sdk project (Azure SDK EngSys)not a GitHub admin — to: (a) register the pipeline as ext-azure-ai-agents-live (the exact name /azp run uses) against this repo + YAML path; (b) confirm the azure-sdk-tests service connection maps to the TME subscription with RBAC to create Foundry projects and deploy models (Contributor + Azure AI Developer + Cognitive Services Contributor, or equivalent); (c) kick the first /azp run to validate the keepAzSessionActive auth path. Merging the PR itself only needs a GitHub maintainer approval. The init GitHub token is already wired via the ambient azuresdk-github-pat org secret, so no extra secret setup is needed.

Testing

  • The golden-path flow was validated end-to-end in the test: add static E2E tests for azure.ai.agents extension #8692 prototype fork run (code ~669s, container ~711s). The driver has since been rewritten from tmux/Python to the Go PTY harness in this PR; the live run of the Go version is exercised post-merge via /azp run (the test is gated behind AZURE_AI_AGENTS_E2E_LIVE=1 and never runs in PR CI).
  • Local validation of the Go harness: gofmt, go vet, golangci-lint (ubuntu + windows), and a go test -c compile all clean on the linux build tag; the no-tag assert.go answer-matching logic is covered by assert_test.go (runs on the host); README.md passes cspell.
  • Driver design choices hardened across review: an event-driven prompt loop with output-settle detection (replacing a fixed poll) so live conditional prompts don't desync; loop detection that advances past or fails out of a stuck prompt; t.Context() as the run-context parent so go test -timeout cancellation propagates; LIFO t.Cleanup ordering so teardown runs before the copied AZD_CONFIG_DIR is removed; a destructive-RemoveAll guardrail on the test dir; and a robust standalone-token check for the invoke result (accepts 4 or four, ignores incidental 4s such as gpt-4o-mini / 4.1 / 404).
  • Timeout hierarchy: per-mode runTimeout (60 min, a context deadline) > the sum of the per-phase budgets (init/provision/deploy/invoke ≈ 32 min) so a slow-but-healthy run is never preempted mid-phase; both modes fit under go test -timeout 125m, and the ADO AzureCLI@2 step (130 min) and job (150 min) caps sit strictly above that, so a run can never be hard-killed mid-azd down (which would leak live resources).
  • The ADO pipeline itself only runs once registered in ADO (inert in GitHub CI by design).

Adds eng/pipelines/ext-azure-ai-agents-live.yml, an on-demand/weekly Azure DevOps pipeline that drives the real 'azd ai agent' CLI through tmux against live Azure (TME), exercising init -> provision -> deploy -> invoke -> down for both code and container deploy modes.

This is the live counterpart to the PR-gate checks (Tier 0 offline + Tier 1 recording/playback in #8754). Per Azure SDK EngSys / SFI guidance, live access stays out of the automatic PR pipeline (trigger: none) and runs only via '/azp run ext-azure-ai-agents-live' or the weekly schedule.

The Tier 2 tmux driver (test_full_e2e.py, test_tier2.py) is migrated from the #8692 prototype; CI auth detection is extended to recognize Azure DevOps (TF_BUILD) and an explicit E2E_USE_AZ_CLI_AUTH override.
@v1212 v1212 added ext-agents azure.ai.agents extension ext-foundry azure.ai.{agents,connections,inspector,projects,routines,skills,toolboxes}, microsoft.foundry labels Jun 22, 2026
Jian Wu added 2 commits June 22, 2026 19:54
…zSessionActive

The azure-sdk-tests service connection uses Workload Identity Federation, whose
az session is isolated to the task's private AZURE_CONFIG_DIR and expires after
~10 min. Running the ~50 min golden-path test (and the cleanup) as plain bash
steps after a separate login step would fail auth on both counts. Run them
inside AzureCLI@2 with keepAzSessionActive:true (matching build-cli.yml) so the
session stays refreshed and reaches azd (auth.useAzCliAuth) through tmux, which
inherits AZURE_CONFIG_DIR. Subscription/tenant are now read in-script via
az account show instead of cross-step pipeline variables.
test_tier2.py always ran sequentially, but kept a tautological if-condition
(len==1 or len>1), an unused concurrent.futures import, a no-op --serial flag,
and a docstring/print claiming parallel execution. Simplify to an explicit
sequential loop and update the docstring to match. Also fix test_full_e2e.py's
module docstring to point at README.md (LOCAL-TEST-GUIDE.md does not exist).
@v1212 v1212 marked this pull request as ready for review June 22, 2026 12:08
Copilot AI review requested due to automatic review settings June 22, 2026 12:08

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a dedicated Azure DevOps pipeline and supporting Python drivers/docs to run the Tier 2 live golden-path E2E for the azure.ai.agents azd extension (init → provision → deploy → invoke → down) against real Azure resources, outside of the GitHub PR gate.

Changes:

  • Added an on-demand + weekly ADO pipeline (ext-azure-ai-agents-live) to run Tier 2 live E2E for both code and container deploy modes.
  • Added a tmux-driven Python Tier 2 runner (test_tier2.py) and a full golden-path driver (test_full_e2e.py) adapted for ADO CI auth detection.
  • Added documentation for running the live Tier 2 tests locally and in CI.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File Description
eng/pipelines/ext-azure-ai-agents-live.yml New ADO pipeline wiring to build azd + extension, run Tier 2 live E2E under AzureCLI@2, and publish logs/cleanup.
cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py Tier 2 orchestrator: runs code + container golden paths sequentially with isolation and timeout handling.
cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py tmux-driven end-to-end golden path driver (init/provision/deploy/invoke/down) with CI/local auth switching.
cli/azd/extensions/azure.ai.agents/tests/e2e-live/README.md Docs for CI setup (ADO registration/service connection/secrets) and local WSL execution.

Comment thread eng/pipelines/ext-azure-ai-agents-live.yml Outdated
Comment thread eng/pipelines/ext-azure-ai-agents-live.yml Outdated
Comment thread eng/pipelines/ext-azure-ai-agents-live.yml Outdated
Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py Outdated
Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py Outdated
Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py Outdated
- Use the ambient azure-sdk org secret `azuresdk-github-pat` for GH_TOKEN
  instead of an empty `GitHubPat` placeholder variable (mirrors eval-waza.yml);
  removes a misleading masked variable and the need for admin PAT setup.
- Harden the AzureCLI@2 inline script: `set -euo pipefail` and assign-then-verify
  subscription/tenant so an `az account show` failure fails fast (a plain
  `export X=$(...)` would have masked the error from set -e).
- Reword the extension-install comment to be self-contained (it no longer
  inaccurately claims to mirror lint-ext-azure-ai-agents.yml).
- Clarify the test_full_e2e.py auth prerequisite: only local WSL runs leave
  auth.useAzCliAuth unset; CI auto-enables az CLI auth.
- Clear tmux scrollback after env setup so the exported GH token cannot leak
  into capture() output on failures/timeouts.
- _cleanup_leaked_resources now checks azd down's return code and reports
  failures instead of always printing "Cleanup complete".

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py Outdated
Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py Outdated
Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py Outdated
Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py Outdated
Comment thread eng/pipelines/ext-azure-ai-agents-live.yml Outdated
- Stream child E2E output live with a watchdog-enforced hard timeout
  instead of buffering everything via capture_output
- Shell-escape the GitHub token (shlex.quote) before exporting in tmux
- Clean up the per-mode AZD_CONFIG_DIR temp copy unless E2E_KEEP_ARTIFACTS
- Use sha256 instead of md5 for the agent-name uniqueness suffix
- Derive the agent binary arch from uname -m instead of hard-coding amd64

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py Outdated
Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py Outdated
Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py Outdated
Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_tier2.py Outdated
- Shell-escape HOME/PATH/TENANT, the cd target, and the agent name with
  shlex.quote() (consistent with the earlier token fix)
- On Tier 2 timeout, kill the child's detached tmux server so reused CI
  agents do not accumulate orphaned tmux sockets
@v1212 v1212 requested a review from Copilot June 22, 2026 13:18
pyflakes flagged two f-strings with no interpolation; they are static
text, so the f-prefix is dead. Plain string literals keep the linter (and
Copilot) clean. No behavior change.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/test_full_e2e.py Outdated
Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/README.md Outdated
The docstring and README claimed tmux >=3.4, but the live pipeline
installs tmux via apt-get on ubuntu-22.04 (LINUXVMIMAGE), which ships
3.2a. The driver only uses basic verbs (send-keys -l, capture-pane -p,
new-session -x/-y, clear-history, kill-session/server) available well
before 3.2, so >=3.4 was inaccurate and misleading. Align both docs to
>=3.2.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

Comment thread eng/pipelines/ext-azure-ai-agents-live.yml
Comment thread eng/pipelines/ext-azure-ai-agents-live.yml

@jongio jongio left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid Tier 2 pipeline addition. The timeout hierarchy (phase budgets < child watchdog 60m < step 130m < job 150m) with explicit margin documentation is particularly well-thought-out, and the three-layer resource cleanup (in-test teardown, parent watchdog cleanup, pipeline condition: always() safety net) gives good confidence that leaked resources won't accumulate.

A few minor observations (none blocking):

File size (test_full_e2e.py, 925 lines): If more live test scenarios are planned (e.g., Tier 2 for other extensions), the tmux interaction helpers (lines ~79-170: tmux(), send(), capture(), wait_for(), run_cmd(), etc.) could be extracted into a shared utility module. For a single test driver this is fine.

Signal handling: The __main__ block doesn't wrap the phase chain in try/finally for tmux cleanup. If an unexpected exception occurs after setup() but before the tmux("kill-session") at the bottom, the tmux session/socket persists on the agent. The parent's watchdog and pipeline cleanup step cover this in practice, so it's not urgent.

Invoke regex: Nice work on (?<![\w.])4(?!\.\d)(?!\w) for the "2+2" assertion. Correctly rejects "gpt-4o-mini", "4.1", "404" while matching standalone "4". Clean.

@vhvb1989

Copy link
Copy Markdown
Member

Request: switch the Tier 2 interactive driver from tmux/Python to Go (go-expect + vt10x)

Thanks for this — the live golden-path coverage and the safety nets (watchdog, condition: always() purge, teardown-fails-the-run) are solid. One direction change I'd like before we standardize on this as the pattern for interactive extension tests: please drive the interactive flow with Go instead of tmux + Python.

Why

  • Language/toolchain consistency — the extension and azd are Go; a Go driver keeps tests in one toolchain with type-safe assertions, and avoids the external tmux >= 3.2 runtime dependency and the detached-tmux-server cleanup dance (kill-server, orphaned sockets on reused agents).
  • The libraries are already vendored — azd's dependency graph already contains the full Go-native expect stack (in cli/azd/go.sum):
    • github.com/creack/pty — real PTY for the spawned azd binary
    • github.com/Netflix/go-expectexpect-style send/expect API (the Go equivalent of Python pexpect)
    • github.com/hinshun/vt10x — terminal emulator that renders ANSI/cursor sequences into a 2D screen (the equivalent of tmux capture-pane -p)

Parity — yes, it can do everything the tmux/Python driver does

I went through the current driver capability-by-capability:

tmux/Python Go go-expect + vt10x
Real PTY running azd creack/pty (attach cmd.Stdin/out/err = console.Tty())
send-keys -l literal text Send / SendLine
Enter / arrow keys raw escape bytes ("\r", "\x1b[B", "\x1b[A") wrapped as named-key helpers
capture-pane -p rendered screen vt10x.State.String()
wait-for-substring / prompt loop ExpectString / ExpectRegexp (blocks until match/timeout)
assert highlighted menu item send key → read vt10x.State → assert rendered line
sequential commands in a session attach each azd command to the console, or spawn a bash in the PTY + sentinels (same model)
live-stream + tail expect.WithStdout(os.Stdout, tailBuffer)
watchdog / hard timeout / kill context + per-Expect timeout + cmd.Process.Kill()
detached tmux server must be killed N/A — PTY + child owned by the test process (a net simplification)

The one thing to validate is that vt10x renders the real azd ai agent init wizard identically to a real terminal (vt10x is an emulator; tmux uses a real terminal). It's a fairly complete xterm/VT100 emulator and this stack is purpose-built for testing exactly these CLIs, so risk is low — but worth a quick spike on the actual init flow before fully committing. Happy to drive/help with that spike.

Note: today only PHASE 1 (init) is interactiveprovision / deploy / invoke / down already run --no-prompt. So the interactive surface to port is just init; the rest stay as plain subprocess calls.

The bigger plan (future iterations) — a shared driver for all extensions and azd

This isn't a one-off ask. We're planning a shared, public test-support module (tracked in #8795) that exposes this interactive driver so every Go extension — and azd core — defines interactive tests without hand-rolling a per-extension harness. Sketch of the extension-facing API:

sess := extframework.StartInteractive(t, "ai", "agent", "init") // PTY + vt10x + go-expect, auto-cleanup via t.Cleanup
sess.ExpectString("Select a subscription")
sess.Send(extframework.KeyDown, extframework.KeyEnter)
sess.ExpectString("Enter a name")
sess.SendLine("my-agent")
sess.ExpectString("added to your azd project successfully")
require.Contains(t, sess.Screen(), "Done")

out, code := extframework.Run(t, "provision", "--no-prompt")     // non-interactive phases reuse the plain runner

The same driver is intended to serve Tier 1 (recorded/playback) and Tier 2 (live), sharing CLI_TEST_AZD_PATH and the recording session with the non-interactive runner. See #8795 for the full plan, acceptance criteria, and the source-scoped replace mechanism that lets extensions pin it from source for test builds only.

Suggested path forward

If the timing of #8795 is a concern, we can land the Go driver locally in the extension first and migrate it into the shared module afterward — but I'd prefer to avoid merging a tmux/Python driver that we know we'll replace. Let me know if you'd like me to prototype the go-expect+vt10x init driver to get you started.

Jian Wu added 2 commits June 24, 2026 18:02
Per review feedback, replace the tmux-driven Python golden-path driver
with a Go test so the live suite uses the same toolchain as the rest of
the extension and needs no Python/tmux on the agent.

- TestTier2Live drives the interactive `azd ai agent init` through a
  pseudo-terminal (Netflix/go-expect sends keys, hinshun/vt10x renders
  the screen, creack/pty provides the PTY); provision/deploy/invoke/down
  shell out to azd with --no-prompt. Per-mode 60m context drives the
  graceful azd-down teardown via t.Cleanup.
- The PTY driver is tagged //go:build linux; the answer matcher and its
  table unit tests build and run on any platform.
- Pipeline builds and runs `go test -run TestTier2Live` and drops the
  Python/tmux setup steps; promote go-expect/vt10x/creack-pty to direct
  requires; refresh README and cspell.
- Remove test_tier2.py and test_full_e2e.py.
…d source

Replace the init driver's fixed 3s polling with go-expect event-driven reads
that block until the survey UI settles, then dispatch on the verbatim prompt
strings the extension prints (each case annotated with its source file:line).
Select deploy mode via the --deploy-mode flag, which init auto-resolves rather
than prompts when a manifest is supplied. Document the dispatch-loop design and
the live-only verification path in the README.

Addresses review feedback on #8758.

@jongio jongio left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Go rewrite (commits c62a696 and 4d27988) directly addresses @vhvb1989's direction request. The interactive driver now uses go-expect + vt10x + creack/pty with event-driven synchronization (waitForQuiet) instead of fixed-interval polling. This is solid work.

What looks good:

  • Event-driven sync: promptQuiet/listSettle waits for the UI to stop emitting before dispatching, rather than sleeping a fixed interval. Correct approach for flake-free interactive tests.
  • Timeout hierarchy: per-mode runTimeout (60m) < go test -timeout (125m) < ADO step (130m) < job (150m). Each layer has room for graceful teardown before the next one kills it.
  • Cleanup safety: assertSafeTestDir blocks dangerous os.RemoveAll on protected paths; t.Cleanup(r.teardown) guarantees azd down --force --purge runs even on failure; the condition: always() pipeline step is a crash-path backstop.
  • Platform-portable assert logic: assert.go builds on all platforms (no build tag), with the regex-free standalone-"4" detection explained clearly.
  • Pipeline arch detection: uname -m mapping to GOARCH means the step won't break if the pool moves off amd64.

One observation for the conversation with @vhvb1989:

The latest commits already deliver what was requested. The old Python/tmux files are gone, the Go driver is in place, and it uses exactly the libraries and approach described in the direction-change comment. The driver is local to the extension's tests/e2e-live/ for now (as suggested), ready to lift into the shared module when #8795 lands.

Minor notes (non-blocking):

  1. tier2_live_test.go at 859 lines is large for a single file, but splitting a sequential test flow (init/provision/deploy/invoke/down + all helpers) across multiple files would hurt readability. Acceptable.
  2. The dispatchPrompt source-line annotations (e.g., init.go:722) are useful for maintenance but will become stale as the extension evolves. The README already documents this trade-off ("changes there can require updating dispatchPrompt"). Consider a lint or generator if drift becomes a maintenance burden.
  3. findServiceName uses line-by-line string matching to parse azure.yaml. This is pragmatic for the known scaffolded output format and avoids pulling in a YAML library for one test helper. Fine as-is.

@v1212

v1212 commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator Author

Thanks @vhvb1989 — done. Ported the interactive init driver to Go (go-expect + vt10x + creack/pty) and dropped the tmux/Python files; the --no-prompt phases stay as os/exec calls.

Per "one thing to validate" note: I did a source-grounded spike and calibrated the prompt dispatch to the real azd ai agent init strings (event-driven sync; deploy mode via --deploy-mode). The one thing I can't check locally — vt10x rendering the live wizard — gets exercised on the first ADO /azp run; I'll validate/debug from there.

@v1212

v1212 commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator Author

Thanks @jongio. The driver changed — the Go rewrite removed the original Python/tmux code, so those first-review points are gone. The three notes on the new code are acknowledged trade-offs (the dispatchPrompt drift one is documented in the README); I'll revisit a generator if that drift becomes painful. No code changes needed — ready for another look.

@jongio jongio left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My prior analysis (two rounds against this same HEAD) still holds. The Go rewrite using go-expect + vt10x + creack/pty is solid: event-driven synchronization via waitForQuiet, clean timeout hierarchy, three-layer resource cleanup, and well-tested assert logic.

One code modernization note below. Also: the PR currently has merge conflicts that need resolving before merge.

No blocking issues from my side.

Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go Outdated

@jongio jongio left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Go rewrite from tmux+Python to a PTY-based Go driver is a significant improvement: same toolchain as the rest of the extension, no Python/tmux dependency on the agent, and the event-driven dispatch loop (wait for survey UI to settle, then match the rendered prompt) is more robust than fixed polling.

A few observations on the final state:

  1. findServiceName naive YAML walk (tier2_live_test.go): The line-by-line parser assumes a specific indentation structure. Since azd ai agent init always scaffolds the same format this is fine today, but if the template output changes it will silently break. Consider a targeted grep -m1 approach or importing a minimal YAML decoder if the function gets reused.

  2. Timeout cascading is well-designed: per-mode 60m > sum-of-phases 50m (10m teardown margin), step 130m > go-test 125m, job 150m > step. This ensures graceful teardown always runs before hard kills, preventing resource leaks. The commit that adjusted these (db53b06) explains the invariant clearly.

  3. responseHasExpectedAnswer rejects "4.0" by design (the 4 followed by . then digit rule). Worth a brief code comment noting this is intentional, since a future maintainer might wonder why a valid math answer is rejected. The existing unit tests cover this implicitly via the "version" case ("4.1") but an explicit "4.0" case in the table would make the design intent obvious.

  4. Dependency versions (go-expect, creack/pty, vt10x) are the same 2022 pseudo-versions that AlecAivazis/survey uses in its own test suite. They are proven to work well together for this exact use case. No action needed, but worth noting in case a future dep audit flags them as stale.

Jian Wu added 2 commits June 25, 2026 14:20
Per AGENTS.md Go 1.26 guidance, switch exitCode() from errors.As to errors.AsType[*exec.ExitError]. Add an explicit "4.0" case to the assert table and document why a decimal answer is deliberately rejected.
# Conflicts:
#	cli/azd/extensions/azure.ai.agents/cspell.yaml

@jongio jongio left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor nit on the safety-net cleanup step: the 2>/dev/null on line 213 swallows all diagnostic output from azd down. If a crash-path cleanup fails, the pipeline log won't show why. Since the task already sets continueOnError: true (so a non-zero exit can't fail the build), consider 2>&1 || true instead to keep diagnostic output visible for post-mortem investigation while still preventing the set -e from aborting the loop.

Not blocking, the current pattern is defensible for a quiet safety net.

Comment thread eng/pipelines/ext-azure-ai-agents-live.yml Outdated
Use 2>&1 instead of 2>/dev/null in the always() cleanup loop so a failed force-purge is visible for debugging a resource leak; keep || true so set -e doesn't abort the loop.

@jongio jongio left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Go rewrite of the Tier 2 driver is well-structured: event-driven prompt dispatch, proper timeout hierarchy (per-mode 60m < go test 125m < step 130m < job 150m), and LIFO cleanup ordering so teardown always runs before the config dir it depends on is removed.

One minor observation below on findServiceName; everything else looks solid after 12+ rounds of iteration.

Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go Outdated

@jongio jongio left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The crash-path cleanup now surfaces stderr via 2>&1 instead of swallowing it, making failed force-purge attempts visible in the pipeline log. The || true guard still prevents the loop from stopping on individual azd down failures.

findServiceName (tier2_live_test.go:649) parses azure.yaml by string matching rather than a YAML library. This works for the predictable output of azd ai agent init, but could silently break on edge-case YAML (a comment containing services:, or a multi-line string value). Low risk since the generated files are predictable, but worth noting if the test starts failing on valid projects.

Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go Outdated
Replace the indentation/colon line scan with a minimal struct unmarshal (gopkg.in/yaml.v3, already a direct dep) so the service-name lookup tolerates comments and indentation changes in scaffolded azure.yaml.

@jongio jongio left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One suggestion on context propagation in the test runner.

Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go Outdated

@jongio jongio left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One suggestion on context propagation in the test runner.

Comment thread cli/azd/extensions/azure.ai.agents/tests/e2e-live/tier2_live_test.go Outdated

@jongio jongio left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the full diff (8 files, +1550/-1). My prior feedback from yesterday was addressed in the latest commits (errors.AsType, stderr visibility in cleanup, struct-based YAML parsing, t.Context() propagation).

Fresh pass on the current state: the test architecture is well-designed. The event-driven prompt loop (go-expect waitForQuiet instead of fixed-interval polling), the bounded ring buffer for diagnostics, the LIFO cleanup ordering, and the tiered timeout hierarchy (per-phase < per-mode < go-test < ADO step < ADO job) are all solid engineering.

One non-blocking note for future maintenance: the dispatchPrompt cases are tightly coupled to the extension's prompt strings (each case annotated with source file:line). The README documents this, but consider adding a CI step or a grep-based check that flags when prompt-emitting lines in the extension source change without a corresponding update here. That would catch desync earlier than a full Tier 2 run.

@jongio jongio left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked the Go PTY test driver, answer matcher logic, console helpers, pipeline YAML, and README.

Prior feedback (errors.AsType, t.Context(), yaml struct parsing, stderr redirect) is all addressed. CI is green across all checks.

The timeout hierarchy, LIFO cleanup ordering, assertSafeTestDir guard, and event-driven prompt loop with loop detection are correctly implemented. The standalone-4 checker handles edge cases well (model names, versions, status codes). Dependencies were already in go.sum as indirect deps; the go.mod change just promotes them to direct.

One non-blocking suggestion on the dispatchPrompt default case.

r.enter()

default:
r.enter()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking: when the default case fires, it sends Enter with no indication in the test log that the dispatch didn't match a specific case. The prompt text IS logged before dispatch (line 301), but adding something like r.t.Logf("unhandled prompt (default Enter): %s", truncate(prompt, 100)) here would make CI investigations easier when a new or changed prompt hits the catch-all, since it would distinguish "prompt matched and answered correctly" from "prompt fell through to default."

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ext-agents azure.ai.agents extension ext-foundry azure.ai.{agents,connections,inspector,projects,routines,skills,toolboxes}, microsoft.foundry

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add automated test infrastructure (PR-gate + live E2E) for the azure.ai.agents extension

5 participants