.NET: Issue #5350: maximal ChatClientAgent repro + Track A gap tests (A2 / A2b / A3 / A4)#5952
.NET: Issue #5350: maximal ChatClientAgent repro + Track A gap tests (A2 / A2b / A3 / A4)#5952Copilot wants to merge 5 commits into
Conversation
Agent-Logs-Url: https://github.com/microsoft/agent-framework/sessions/79cbae85-b46f-470f-978d-7970d6f9e091 Co-authored-by: lokitoth <6936551+lokitoth@users.noreply.github.com>
…#5350 Agent-Logs-Url: https://github.com/microsoft/agent-framework/sessions/baf2578f-bdc5-479b-8ebc-1b198f26c151 Co-authored-by: lokitoth <6936551+lokitoth@users.noreply.github.com>
Agent-Logs-Url: https://github.com/microsoft/agent-framework/sessions/153b10f3-2555-45fe-b264-42a2e9100cf2 Co-authored-by: lokitoth <6936551+lokitoth@users.noreply.github.com>
Agent-Logs-Url: https://github.com/microsoft/agent-framework/sessions/e7100d8a-ca6d-48e6-8210-78ba33dbd53c Co-authored-by: lokitoth <6936551+lokitoth@users.noreply.github.com>
|
@copilot Add a variant of A2 which uses Handoffs |
Added in 1011a41 as |
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds investigation scaffolding for issue #5350 by introducing layered repro tests that validate ToolApprovalRequestContent.ToolCall (polymorphic FunctionCallContent) survives JSON checkpoint round-trips, and documents a validation plan. Also pins a separate group-chat-specific duplicate-key failure observed after resume when approving.
Changes:
- Add
ToolApprovalRequestCheckpointReproTestswith serializer-only and end-to-end checkpoint/resume coverage, including GroupChat/Handoff variants and a store that mimics string-based persistence. - Add a working validation-plan doc capturing results, next investigative tracks, and the unrelated A2 duplicate-key crash.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| dotnet/tests/Microsoft.Agents.AI.Workflows.UnitTests/ToolApprovalRequestCheckpointReproTests.cs | New layered repro + E2E tests for #5350 plus Track A variants and store shim |
| docs/working/issue-5350-root-cause-validation-plan.md | New investigation write-up documenting findings + next steps |
| [Fact] | ||
| public void Repro_5350_CaptureWireFormat_ForInspection() | ||
| { | ||
| JsonMarshaller marshaller = new(); | ||
|
|
||
| JsonElement element = marshaller.Marshal(MakeApprovalRequest()); | ||
| string serialized = element.GetRawText(); | ||
|
|
||
| // Always-true assertion — purpose of this test is to expose the wire format. | ||
| serialized.Should().NotBeNullOrEmpty(); | ||
| serialized.Should().Contain(CallId, "the call id should be present in the serialized form"); | ||
| } |
Motivation and Context
Issue #5350 reports that
ToolApprovalRequestContent.ToolCallloses its concreteFunctionCallContenttype after aJsonCheckpointStoreround-trip, breakingFunctionInvokingChatClient's pattern match on resume. The OP attributes this to missing[JsonPolymorphic]/[JsonDerivedType]metadata or a serializer-options chain that does not reachAIJsonUtilities.DefaultOptions; a maintainer comment notes both are already in place. We need a reproducible baseline before attempting a fix.Description
Adds
ToolApprovalRequestCheckpointReproTests(11 tests, all passing) layered from the pure serializer up to full agent workflows, plus a written validation plan. The OP's hypothesis does not reproduce at any layer exercised; one test surfaces a separate, real bug worth fixing independently.JsonMarshallerround-trip ofToolApprovalRequestContent, then wrapped inPortableValue, then asExternalRequest.Data, then 25× deterministic repetition, plus a wire-format capture confirming"$type": "functionCall"is emitted.RequestPortcheckpoint resume (6): fullCheckpointManager.CreateJson(InMemoryJsonStore) → RunStreamingAsync → SuperStep checkpoint → ResumeStreamingAsyncwith aRequestPort<TARC, TARR>.ChatClientAgentover aMockChatClientwithApprovalRequiredAIFunctionwrapping a countingAIFunction, bound directly into a single-agentWorkflowBuilder. Asserts resumedRequestInfoEvent.Request.Data.As<TARC>().ToolCall is FunctionCallContent, then sends the approval response and asserts the wrapped function is invoked exactly once and the resultingFunctionResultContentflows into the follow-up chat-client call.RunReproAsync+ReproHarness:GroupChatBuilder: FCC type still preserved after resume. Uncovers a separate bug: approving the resumed request throwsArgumentException: An item with the same key has already been added. Key: ficc_call-1fromFunctionInvokingChatClient.ExtractAndRemoveApprovalRequestsAndResponses. Test pins this observed behavior with aTODO(#5350-followup)-tagged assertion.HandoffWorkflowBuilder: same as A2 but usesAgentWorkflowBuilder.CreateHandoffBuilderWith(...)with a no-op peer agent so the handoff graph is valid (the mock chat client never emits ahandoff_to_*call, so the workflow stays on the initial approval-tool-equipped agent). FCC type preserved after resume and the workflow completes cleanly post-approval: tool invoked exactly once, zero errors, zero executor failures. This narrows A2's duplicate-key crash to the group-chat orchestration specifically (it does not occur on the handoff path).StringRoundTripJsonStore: wrapsInMemoryJsonStoreand round-trips everyJsonElementviaGetRawText()+JsonDocument.Parseon both commit and retrieve, emulating the OP's Dapper / SQLnvarcharhop. Identity-preserving; passes.JsonSerializerOptions:JsonSerializerDefaults.Webwith no AI resolver passed ascustomOptionstoCheckpointManager.CreateJson. ConfirmsJsonMarshaller's internalWorkflowsJsonUtilities.DefaultOptionschain always wins for knownAIContenttypes; external options cannot displace it.docs/working/issue-5350-root-cause-validation-plan.md: records the disproof and scopes remaining investigation tracks — request the OP's actual pre-/post-SQL JSON bytes (Track B), doc improvements for the option-chain / byte-preservation contract (Track C), and follow up on the A2 duplicate-key crash (now known to be group-chat-specific).No production code is modified; this is investigation scaffolding plus a pinned regression test for the bug surfaced by A2.
Contribution Checklist