Skip to content

.NET: Issue #5350: maximal ChatClientAgent repro + Track A gap tests (A2 / A2b / A3 / A4)#5952

Draft
Copilot wants to merge 5 commits into
mainfrom
copilot/5350-implement-repro-unit-test
Draft

.NET: Issue #5350: maximal ChatClientAgent repro + Track A gap tests (A2 / A2b / A3 / A4)#5952
Copilot wants to merge 5 commits into
mainfrom
copilot/5350-implement-repro-unit-test

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented May 19, 2026

Motivation and Context

Issue #5350 reports that ToolApprovalRequestContent.ToolCall loses its concrete FunctionCallContent type after a JsonCheckpointStore round-trip, breaking FunctionInvokingChatClient's pattern match on resume. The OP attributes this to missing [JsonPolymorphic]/[JsonDerivedType] metadata or a serializer-options chain that does not reach AIJsonUtilities.DefaultOptions; a maintainer comment notes both are already in place. We need a reproducible baseline before attempting a fix.

Description

Adds ToolApprovalRequestCheckpointReproTests (11 tests, all passing) layered from the pure serializer up to full agent workflows, plus a written validation plan. The OP's hypothesis does not reproduce at any layer exercised; one test surfaces a separate, real bug worth fixing independently.

  • Layered marshaller tests (1 – 5): direct JsonMarshaller round-trip of ToolApprovalRequestContent, then wrapped in PortableValue, then as ExternalRequest.Data, then 25× deterministic repetition, plus a wire-format capture confirming "$type": "functionCall" is emitted.
  • RequestPort checkpoint resume (6): full CheckpointManager.CreateJson(InMemoryJsonStore) → RunStreamingAsync → SuperStep checkpoint → ResumeStreamingAsync with a RequestPort<TARC, TARR>.
  • Maximal E2E (7): real ChatClientAgent over a MockChatClient with ApprovalRequiredAIFunction wrapping a counting AIFunction, bound directly into a single-agent WorkflowBuilder. Asserts resumed RequestInfoEvent.Request.Data.As<TARC>().ToolCall is FunctionCallContent, then sends the approval response and asserts the wrapped function is invoked exactly once and the resulting FunctionResultContent flows into the follow-up chat-client call.
  • Track A gap tests (8 – 11), sharing RunReproAsync + ReproHarness:
    • A2 — GroupChatBuilder: FCC type still preserved after resume. Uncovers a separate bug: approving the resumed request throws ArgumentException: An item with the same key has already been added. Key: ficc_call-1 from FunctionInvokingChatClient.ExtractAndRemoveApprovalRequestsAndResponses. Test pins this observed behavior with a TODO(#5350-followup)-tagged assertion.
    • A2b — HandoffWorkflowBuilder: same as A2 but uses AgentWorkflowBuilder.CreateHandoffBuilderWith(...) with a no-op peer agent so the handoff graph is valid (the mock chat client never emits a handoff_to_* call, so the workflow stays on the initial approval-tool-equipped agent). FCC type preserved after resume and the workflow completes cleanly post-approval: tool invoked exactly once, zero errors, zero executor failures. This narrows A2's duplicate-key crash to the group-chat orchestration specifically (it does not occur on the handoff path).
    • A3 — StringRoundTripJsonStore: wraps InMemoryJsonStore and round-trips every JsonElement via GetRawText() + JsonDocument.Parse on both commit and retrieve, emulating the OP's Dapper / SQL nvarchar hop. Identity-preserving; passes.
    • A4 — custom JsonSerializerOptions: JsonSerializerDefaults.Web with no AI resolver passed as customOptions to CheckpointManager.CreateJson. Confirms JsonMarshaller's internal WorkflowsJsonUtilities.DefaultOptions chain always wins for known AIContent types; external options cannot displace it.
  • docs/working/issue-5350-root-cause-validation-plan.md: records the disproof and scopes remaining investigation tracks — request the OP's actual pre-/post-SQL JSON bytes (Track B), doc improvements for the option-chain / byte-preservation contract (Track C), and follow up on the A2 duplicate-key crash (now known to be group-chat-specific).

No production code is modified; this is investigation scaffolding plus a pinned regression test for the bug surfaced by A2.

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

Copilot AI review requested due to automatic review settings May 19, 2026 14:26
Copilot AI review requested due to automatic review settings May 19, 2026 14:26
@lokitoth
Copy link
Copy Markdown
Member

@copilot Add a variant of A2 which uses Handoffs

Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 19, 2026

@copilot Add a variant of A2 which uses Handoffs

Added in 1011a41 as Repro_5350_A2b_HandoffWorkflowBuilder_.... Uses AgentWorkflowBuilder.CreateHandoffBuilderWith(harness.Agent) with a no-op peer agent so the handoff graph is valid; the mock chat client never emits a handoff_to_* call so the workflow stays on the initial approval-tool-equipped agent. Finding: the handoff path completes cleanlyTARC.ToolCall is FunctionCallContent post-resume, tool invoked exactly once, zero errors/executor failures. The duplicate-key ArgumentException from A2 does not occur here, narrowing that separate bug to the group-chat orchestration specifically.

Copilot AI changed the title Issue #5350: maximal ChatClientAgent repro + Track A gap tests Issue #5350: maximal ChatClientAgent repro + Track A gap tests (A2 / A2b / A3 / A4) May 19, 2026
Copilot AI requested a review from lokitoth May 19, 2026 14:30
Copilot AI review requested due to automatic review settings May 19, 2026 14:34
@moonbox3 moonbox3 added documentation Improvements or additions to documentation .NET labels May 19, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds investigation scaffolding for issue #5350 by introducing layered repro tests that validate ToolApprovalRequestContent.ToolCall (polymorphic FunctionCallContent) survives JSON checkpoint round-trips, and documents a validation plan. Also pins a separate group-chat-specific duplicate-key failure observed after resume when approving.

Changes:

  • Add ToolApprovalRequestCheckpointReproTests with serializer-only and end-to-end checkpoint/resume coverage, including GroupChat/Handoff variants and a store that mimics string-based persistence.
  • Add a working validation-plan doc capturing results, next investigative tracks, and the unrelated A2 duplicate-key crash.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
dotnet/tests/Microsoft.Agents.AI.Workflows.UnitTests/ToolApprovalRequestCheckpointReproTests.cs New layered repro + E2E tests for #5350 plus Track A variants and store shim
docs/working/issue-5350-root-cause-validation-plan.md New investigation write-up documenting findings + next steps

Comment on lines +247 to +258
[Fact]
public void Repro_5350_CaptureWireFormat_ForInspection()
{
JsonMarshaller marshaller = new();

JsonElement element = marshaller.Marshal(MakeApprovalRequest());
string serialized = element.GetRawText();

// Always-true assertion — purpose of this test is to expose the wire format.
serialized.Should().NotBeNullOrEmpty();
serialized.Should().Contain(CallId, "the call id should be present in the serialized form");
}
@github-actions github-actions Bot changed the title Issue #5350: maximal ChatClientAgent repro + Track A gap tests (A2 / A2b / A3 / A4) .NET: Issue #5350: maximal ChatClientAgent repro + Track A gap tests (A2 / A2b / A3 / A4) May 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation .NET

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants