-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Description
Hi team,
I’m working with a multi-agent orchestration setup using Google ADK:
- Root agent:
CloudOpsCoordinator - Agent as tool:
BillingAgent BillingAgentitself uses an internal tool (SQL executor) to query billing tables.
🧩 Issue 1 — Evaluation Missing Tool Execution Inside Child Agent
In the Events → Request/Response panel, I can see:
- The root agent invoking the
BillingAgentas a tool - The BillingAgent’s own reasoning + LLM call
However, I cannot see the tool execution that is triggered inside the BillingAgent—specifically:
- The request payload sent from BillingAgent → SQL tool
- The SQL tool invocation details
- The intermediate outputs
When I run an Evaluation Set, only the Root Agent to BillingAgent tool call is captured.
But the tool call inside BillingAgent is completely missing, which means the evaluation cannot test the full agent execution cycle for a user query.
👉 Question:
Is this expected behavior?
How can we capture and evaluate nested tool calls (tool triggered by another agent) during evaluation?
Without this, we cannot test the complete agent workflow end-to-end.
🧩 Issue 2 — Tool Parameter Exact Matching During Evaluation
My BillingAgent uses a tool that executes SQL queries.
But in evaluation docs I can see:
- The evaluator compares exact SQL text or exact parameter match.
- Agent-generated SQL may vary in ordering, whitespace, or alias naming.
- Even if the SQL is semantically correct, the evaluation marks it as Fail because the tool parameters are not an exact match.
👉 Question:
How can I configure evaluation to allow flexible SQL generation instead of requiring an exact string match?
Is there a way to:
- Provide a semantic SQL matcher
- Allow partial match / fuzzy match on tool parameters?
Right now, evaluation always fails for these scenarios because the SQL is not identical.
🙏 Looking for Guidance
Would really appreciate direction on:
- How to enable visibility + evaluation of nested tool calls inside sub-agents.
- How to relax SQL parameter matching in evaluations for more realistic testing.
Thanks!
