Add tool search support for OpenAI Responses API#4751
Conversation
Implements dual-mode tool search (hosted server-side + client-side embeddings) for the Responses API, mirroring the existing Anthropic implementation.

- Extract shared `nonDeferredToolNames` and helper functions into `toolSearch.ts`
- Add `defer_loading` support to `createResponsesRequestBody()` with tool splitting
- Handle `tool_search_call`/`tool_search_output` stream events in `OpenAIResponsesProcessor`
- Add `OpenAiToolSearchTool` type and config keys (`ResponsesApiToolSearchEnabled`/`Mode`)
- Extend `ToolSearchTool` registration to include gpt-5.4 models
- Add 12 unit tests for tool search request body and stream processing
Pull request overview
Adds OpenAI Responses API “tool_search” support to enable deferred tool loading (via client-executed tool search) in agent conversations, aligning with the existing tool-search approach used for Anthropic.
Changes:
- Introduces a new experiment-based config flag to enable Responses API tool search and gates it by model support.
- Updates Responses API request/stream handling to emit/consume `tool_search_call` + `tool_search_output` and to mark deferred tools with `defer_loading`.
- Adds unit tests validating request-body tool deferral and stream-event handling for tool search.
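For reviewers unfamiliar with the flow, the two stream events could be modeled roughly like this (the field shapes are illustrative assumptions, not the exact wire format from the diff):

```typescript
// Hypothetical minimal shapes for the two Responses API tool-search
// stream events handled by OpenAIResponsesProcessor. Field names are
// illustrative, not the confirmed wire format.
interface ToolSearchCallEvent {
	type: 'tool_search_call';
	id: string;
	query: string;          // natural-language search issued by the model
}

interface ToolSearchOutputEvent {
	type: 'tool_search_output';
	id: string;             // matches the originating tool_search_call
	toolNames: string[];    // tools resolved by the search
}

type ToolSearchEvent = ToolSearchCallEvent | ToolSearchOutputEvent;

// Discriminate on `type` to route each event to the right handler.
function describeEvent(event: ToolSearchEvent): string {
	switch (event.type) {
		case 'tool_search_call':
			return `search: ${event.query}`;
		case 'tool_search_output':
			return `loaded: ${event.toolNames.join(', ')}`;
	}
}
```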
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/platform/networking/common/openai.ts | Adds model-gated enablement helper for Responses API tool search. |
| src/platform/networking/common/networking.ts | Extends endpoint body tools typing to include tool_search. |
| src/platform/networking/common/fetch.ts | Defines OpenAiToolSearchTool and extends tool type guard input union. |
| src/platform/networking/common/anthropic.ts | Refactors/moves Anthropic tool-search constants + enablement helpers. |
| src/platform/endpoint/node/responsesApi.ts | Implements deferred tool loading + tool_search round-tripping and stream processing. |
| src/platform/endpoint/node/test/responsesApiToolSearch.spec.ts | New tests for tool deferral + Responses stream tool-search events. |
| src/platform/configuration/common/configurationService.ts | Adds ResponsesApiToolSearchEnabled experiment-based setting. |
| src/extension/tools/node/toolSearchTool.ts | Enables ToolSearchTool for GPT-5.4 family in addition to Anthropic models. |
| src/extension/prompt/vscode-node/requestLoggerImpl.ts | Improves tool name logging to handle tool_search. |
| package.nls.json | Adds localized description for the new setting. |
| package.json | Registers new preview/onExp setting in extension contributions. |
```typescript
import { ConfigKey, IConfigurationService } from '../../configuration/common/configurationService';
import { IExperimentationService } from '../../telemetry/common/nullExperimentationService';
import { IChatEndpoint } from './networking';

/** Model ID prefixes that support Responses API tool search. Per OpenAI docs: "Only gpt-5.4 and later models support tool_search." */
export const OPENAI_TOOL_SEARCH_SUPPORTED_MODELS = [
	'gpt-5.4',
] as const;

export function isResponsesApiToolSearchEnabled(
	endpoint: IChatEndpoint | string,
	configurationService: IConfigurationService,
	experimentationService: IExperimentationService,
): boolean {
	const effectiveModelId = typeof endpoint === 'string' ? endpoint : endpoint.model;
	if (!OPENAI_TOOL_SEARCH_SUPPORTED_MODELS.some(prefix => effectiveModelId.toLowerCase().startsWith(prefix))) {
```
isResponsesApiToolSearchEnabled uses endpoint.model for gating, but IChatEndpoint.model can be copilot-base (see docstring in networking.ts), which would incorrectly disable tool search for supported models selected via endpoint.family. Consider switching the check to endpoint.family (or a { family: string } shape) and making the IChatEndpoint import import type (or removing it) to avoid introducing a runtime circular dependency between openai.ts and networking.ts.
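A minimal sketch of the suggested change, assuming a `{ family: string }` shape is available on the endpoint (the `supportsResponsesApiToolSearch` name and `FamilyLike` type below are hypothetical):

```typescript
const OPENAI_TOOL_SEARCH_SUPPORTED_MODELS = ['gpt-5.4'] as const;

// Accept either a bare family string or anything carrying a `family`
// property, so callers need not construct a full IChatEndpoint. A type-only
// import of IChatEndpoint (or no import at all) avoids a runtime cycle.
type FamilyLike = string | { family: string };

function supportsResponsesApiToolSearch(endpoint: FamilyLike): boolean {
	// Gate on `family`, not `model`: `model` can be 'copilot-base' even
	// when the selected family is a supported gpt-5.4 variant.
	const family = typeof endpoint === 'string' ? endpoint : endpoint.family;
	return OPENAI_TOOL_SEARCH_SUPPORTED_MODELS.some(prefix =>
		family.toLowerCase().startsWith(prefix));
}
```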
```typescript
	type: 'function',
	name: tool.function.name,
	description: tool.function.description || '',
	defer_loading: true,
```

`buildToolSearchOutputTools` marks returned tool definitions with `defer_loading: true`. For `tool_search_output`, these tools are being provided as loaded definitions; keeping `defer_loading` set may cause them to still be treated as deferred (potentially triggering repeated tool_search cycles or preventing invocation). Consider omitting `defer_loading` (or explicitly setting it to `false`) for tools included in the output payload.

Suggested change:
```diff
-	defer_loading: true,
 }
```
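A sketch of the fix described above, with hypothetical minimal types standing in for the real tool shapes:

```typescript
// Minimal stand-in types; the real shapes come from the PR's fetch.ts.
interface FunctionDecl { name: string; description?: string }
interface ResponsesFunctionTool {
	type: 'function';
	name: string;
	description: string;
	defer_loading?: boolean;
}

// Tools returned in a tool_search_output are fully loaded definitions,
// so defer_loading must not be set; otherwise the model may keep treating
// them as deferred and re-search instead of invoking them.
function buildToolSearchOutputTools(found: { function: FunctionDecl }[]): ResponsesFunctionTool[] {
	return found.map(tool => ({
		type: 'function',
		name: tool.function.name,
		description: tool.function.description || '',
		// defer_loading intentionally omitted here
	}));
}
```

Tools that should remain deferred would still carry `defer_loading: true` on the request path; only the search-output payload drops it.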
```typescript
function rawMessagesToResponseAPI(modelId: string, messages: readonly Raw.ChatMessage[], ignoreStatefulMarker: boolean): { input: OpenAI.Responses.ResponseInputItem[]; previous_response_id?: string } {
	// ── Responses API tool search types ──────────────────────────────────\n// These match the shapes from https://developers.openai.com/api/docs/guides/tools-tool-search
```
The section header comment includes a literal `\n` sequence (`// ... ──────────────────────────────────\n// ...`), which looks like an accidental escape that will be rendered verbatim in source. Split this into two comment lines without the escaped newline for readability.

Suggested change:
```diff
-// ── Responses API tool search types ──────────────────────────────────\n// These match the shapes from https://developers.openai.com/api/docs/guides/tools-tool-search
+// ── Responses API tool search types ──────────────────────────────────
+// These match the shapes from https://developers.openai.com/api/docs/guides/tools-tool-search
```
```diff
 }
 if (entry.chatParams.body?.tools) {
-	const toolNames = entry.chatParams.body.tools.map(t => isOpenAiFunctionTool(t) ? t.function.name : t.name);
+	const toolNames = entry.chatParams.body.tools.map((t: any) => isOpenAiFunctionTool(t) ? t.function.name : t.name ?? t.type);
```
The new toolNames mapping uses `(t: any)`, which drops type safety in this logger and can hide future schema issues (e.g., distinguishing Responses function tools vs tool_search). Consider avoiding `any` by widening isOpenAiFunctionTool's input type (if needed) and using property checks like `'name' in t` / discriminating on `t.type` to derive the display name.

Suggested change:
```diff
-const toolNames = entry.chatParams.body.tools.map((t: any) => isOpenAiFunctionTool(t) ? t.function.name : t.name ?? t.type);
+const toolNames = entry.chatParams.body.tools.map(t => {
+	if (isOpenAiFunctionTool(t)) {
+		return t.function.name;
+	}
+	if (t && typeof t === 'object') {
+		if ('name' in t && typeof (t as { name: unknown }).name === 'string') {
+			return (t as { name: string }).name;
+		}
+		if ('type' in t && typeof (t as { type: unknown }).type === 'string') {
+			return (t as { type: string }).type;
+		}
+	}
+	return 'unknown';
+});
```
```typescript
	tokenizer: 'cl100k_base' as any,
	acquireTokenizer: () => { throw new Error('Not implemented'); },
	processResponseFromChatEndpoint: () => { throw new Error('Not implemented'); },
	makeRequest: () => { throw new Error('Not implemented'); },
} as unknown as IChatEndpoint;
```
This test helper relies on `as any` / `as unknown as IChatEndpoint` to satisfy types. To keep tests more robust and aligned with existing patterns, consider introducing a small MockChatEndpoint that implements the minimal IChatEndpoint surface (with correctly typed tokenizer, etc.) instead of casting.

Suggested change:
```diff
-	tokenizer: 'cl100k_base' as any,
-	acquireTokenizer: () => { throw new Error('Not implemented'); },
-	processResponseFromChatEndpoint: () => { throw new Error('Not implemented'); },
-	makeRequest: () => { throw new Error('Not implemented'); },
-} as unknown as IChatEndpoint;
+	tokenizer: undefined,
+	acquireTokenizer: () => { throw new Error('Not implemented'); },
+	processResponseFromChatEndpoint: () => { throw new Error('Not implemented'); },
+	makeRequest: () => { throw new Error('Not implemented'); },
+};
```
```json
"github.copilot.config.anthropic.promptOptimization": "Prompt optimization mode for Claude 4.6 models.\n\n- `control`: Uses the current default prompt (no changes).\n- `combined`: Uses a single optimized prompt for both Opus and Sonnet.\n- `split`: Uses separate optimized prompts for Opus (bounded exploration) and Sonnet (full persistence).\n\n**Note**: This is an experimental feature for A/B testing prompt configurations.",
"github.copilot.config.anthropic.toolSearchTool.enabled": "Enable tool search tool for Anthropic models. When enabled, tools are dynamically discovered and loaded on-demand using natural language search, reducing context window usage when many tools are available.",
"github.copilot.config.anthropic.toolSearchTool.mode": "Controls how tool search works for Anthropic models. 'server' uses Anthropic's built-in regex-based tool search. 'client' uses local embeddings-based semantic search for more accurate tool discovery.",
"github.copilot.config.responsesApi.toolSearchTool.enabled": "Enable tool search for OpenAI Responses API models. When enabled, tools are dynamically discovered and loaded on-demand using embeddings-based search, reducing context window usage when many tools are available.",
```
Should we also add `github.copilot.config.responsesApi.toolSearchTool.mode` to control whether we use the server-side or the client-side tool search tool?
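If such a key were added, its resolution could mirror the Anthropic mode handling (the default value and the `resolveToolSearchMode` helper below are assumptions, not part of the PR):

```typescript
// Hypothetical mirror of the Anthropic mode key: 'server' would use the
// hosted tool search, 'client' the local embeddings-based search.
type ToolSearchMode = 'server' | 'client';

const ResponsesApiToolSearchModeKey = {
	id: 'github.copilot.config.responsesApi.toolSearchTool.mode',
	defaultValue: 'client' as ToolSearchMode, // assumed default
};

// Normalize the raw configured string, falling back to the default for
// undefined or unrecognized values.
function resolveToolSearchMode(configured: string | undefined): ToolSearchMode {
	return configured === 'server' ? 'server' : ResponsesApiToolSearchModeKey.defaultValue;
}
```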
```typescript
	}
}

// Build final tools array
```
Is this going to happen for every request? Can we make this and `toolsMap` static on the class and reuse them for all tool-search-enabled requests?
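One possible shape for the suggested caching, assuming the non-deferred tool set is fixed per session (all names here are hypothetical):

```typescript
// Cache the deferred/eager split keyed by the tool-name list, so repeated
// requests with the same tools reuse one computed result instead of
// rebuilding it per request. Assumes `nonDeferred` does not change between
// calls with the same tool list.
const toolSplitCache = new Map<string, { deferred: string[]; eager: string[] }>();

function splitTools(toolNames: string[], nonDeferred: ReadonlySet<string>) {
	const key = toolNames.join('\u0000'); // NUL-joined names as cache key
	let cached = toolSplitCache.get(key);
	if (!cached) {
		cached = {
			deferred: toolNames.filter(n => !nonDeferred.has(n)),
			eager: toolNames.filter(n => nonDeferred.has(n)),
		};
		toolSplitCache.set(key, cached);
	}
	return cached;
}
```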
```typescript
	type: 'function',
	strict: false,
	parameters: (tool.function.parameters || {}) as Record<string, unknown>,
	...(isDeferred ? { defer_loading: true } : {}),
```
Per https://developers.openai.com/api/docs/guides/tools-tool-search, are we grouping these by type as shown below? From the docs:

> For maximum token savings, we recommend grouping deferred functions into namespaces or MCP servers with clear, high-level descriptions that give the model a strong overview of what is contained within them, so it can effectively search and load only the relevant functions. As a best practice, aim to keep each namespace to fewer than 10 functions for better token efficiency and model performance.
```json
{
  "tools": [
    {
      "type": "namespace",
      "name": "crm",
      "description": "CRM tools for customer lookup and order management.",
      "tools": [
        {
          "type": "function",
          "name": "list_open_orders",
```
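A sketch of what such namespace grouping could look like on the client, under the assumption that tool names follow a `prefix_` convention (the grouping heuristic and all names are illustrative, not part of the PR):

```typescript
// Minimal stand-in types for the Responses API tool shapes.
interface DeferredFunctionTool { type: 'function'; name: string; defer_loading?: boolean }
interface NamespaceTool {
	type: 'namespace';
	name: string;
	description: string;
	tools: DeferredFunctionTool[];
}

// Group deferred tools by a 'prefix_' naming convention; tools without a
// prefix stay top-level. The docs recommend keeping each namespace under
// 10 functions, which a real implementation would also enforce.
function groupIntoNamespaces(
	tools: DeferredFunctionTool[],
	describe: (ns: string) => string,
): (DeferredFunctionTool | NamespaceTool)[] {
	const byPrefix = new Map<string, DeferredFunctionTool[]>();
	const ungrouped: DeferredFunctionTool[] = [];
	for (const tool of tools) {
		if (tool.name.includes('_')) {
			const prefix = tool.name.split('_')[0];
			const bucket = byPrefix.get(prefix) ?? [];
			bucket.push(tool);
			byPrefix.set(prefix, bucket);
		} else {
			ungrouped.push(tool);
		}
	}
	const result: (DeferredFunctionTool | NamespaceTool)[] = [...ungrouped];
	for (const [prefix, bucket] of byPrefix) {
		result.push({ type: 'namespace', name: prefix, description: describe(prefix), tools: bucket });
	}
	return result;
}
```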