
Add tool search support for OpenAI Responses API#4751

Open
bhavyaus wants to merge 4 commits into main from
dev/bhavyau/responses-api-tool-search

Conversation

@bhavyaus (Contributor) commented Mar 27, 2026


Implements dual-mode tool search (hosted server-side + client-side embeddings)
for the Responses API, mirroring the existing Anthropic implementation.

- Extract shared nonDeferredToolNames and helper functions into toolSearch.ts
- Add defer_loading support to createResponsesRequestBody() with tool splitting
- Handle tool_search_call/tool_search_output stream events in OpenAIResponsesProcessor
- Add OpenAiToolSearchTool type and config keys (ResponsesApiToolSearchEnabled/Mode)
- Extend ToolSearchTool registration to include gpt-5.4 models
- Add 12 unit tests for tool search request body and stream processing
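As a rough illustration of the tool-splitting step described above (a sketch only: `splitTools` and the tool shape are hypothetical, while `defer_loading` and `nonDeferredToolNames` come from the PR description):

```typescript
// Hypothetical sketch of the defer_loading split in createResponsesRequestBody():
// tools whose names are in nonDeferredToolNames stay eagerly loaded; all
// others are marked defer_loading so the model discovers them via tool_search.
interface FunctionTool {
	type: 'function';
	name: string;
	description: string;
	defer_loading?: boolean;
}

function splitTools(tools: FunctionTool[], nonDeferredToolNames: Set<string>): FunctionTool[] {
	return tools.map(tool =>
		nonDeferredToolNames.has(tool.name)
			? tool
			: { ...tool, defer_loading: true }
	);
}

const example = splitTools(
	[
		{ type: 'function', name: 'read_file', description: 'Read a file' },
		{ type: 'function', name: 'crm_lookup', description: 'Look up a customer' },
	],
	new Set(['read_file'])
);
console.log(example.map(t => `${t.name}:${t.defer_loading ?? false}`).join(','));
// → read_file:false,crm_lookup:true
```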
@bhavyaus bhavyaus force-pushed the dev/bhavyau/responses-api-tool-search branch from 3cdc019 to 68b1cc3 on March 27, 2026 22:08
@bhavyaus bhavyaus marked this pull request as ready for review March 27, 2026 22:12
Copilot AI review requested due to automatic review settings March 27, 2026 22:12
Copilot AI left a comment


Pull request overview

Adds OpenAI Responses API “tool_search” support to enable deferred tool loading (via client-executed tool search) in agent conversations, aligning with the existing tool-search approach used for Anthropic.

Changes:

  • Introduces a new experiment-based config flag to enable Responses API tool search and gates it by model support.
  • Updates Responses API request/stream handling to emit/consume tool_search_call + tool_search_output and to mark deferred tools with defer_loading.
  • Adds unit tests validating request-body tool deferral and stream-event handling for tool search.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.

Per-file summary:

  • src/platform/networking/common/openai.ts: Adds model-gated enablement helper for Responses API tool search.
  • src/platform/networking/common/networking.ts: Extends endpoint body tools typing to include tool_search.
  • src/platform/networking/common/fetch.ts: Defines OpenAiToolSearchTool and extends the tool type guard input union.
  • src/platform/networking/common/anthropic.ts: Refactors/moves Anthropic tool-search constants and enablement helpers.
  • src/platform/endpoint/node/responsesApi.ts: Implements deferred tool loading, tool_search round-tripping, and stream processing.
  • src/platform/endpoint/node/test/responsesApiToolSearch.spec.ts: New tests for tool deferral and Responses stream tool-search events.
  • src/platform/configuration/common/configurationService.ts: Adds the ResponsesApiToolSearchEnabled experiment-based setting.
  • src/extension/tools/node/toolSearchTool.ts: Enables ToolSearchTool for the GPT-5.4 family in addition to Anthropic models.
  • src/extension/prompt/vscode-node/requestLoggerImpl.ts: Improves tool name logging to handle tool_search.
  • package.nls.json: Adds localized description for the new setting.
  • package.json: Registers the new preview/onExp setting in extension contributions.

Comment on lines +308 to +323
import { ConfigKey, IConfigurationService } from '../../configuration/common/configurationService';
import { IExperimentationService } from '../../telemetry/common/nullExperimentationService';
import { IChatEndpoint } from './networking';

/** Model ID prefixes that support Responses API tool search. Per OpenAI docs: "Only gpt-5.4 and later models support tool_search." */
export const OPENAI_TOOL_SEARCH_SUPPORTED_MODELS = [
	'gpt-5.4',
] as const;

export function isResponsesApiToolSearchEnabled(
	endpoint: IChatEndpoint | string,
	configurationService: IConfigurationService,
	experimentationService: IExperimentationService,
): boolean {
	const effectiveModelId = typeof endpoint === 'string' ? endpoint : endpoint.model;
	if (!OPENAI_TOOL_SEARCH_SUPPORTED_MODELS.some(prefix => effectiveModelId.toLowerCase().startsWith(prefix))) {

Copilot AI Mar 27, 2026


isResponsesApiToolSearchEnabled uses endpoint.model for gating, but IChatEndpoint.model can be copilot-base (see docstring in networking.ts), which would incorrectly disable tool search for supported models selected via endpoint.family. Consider switching the check to endpoint.family (or a { family: string } shape) and making the IChatEndpoint import import type (or removing it) to avoid introducing a runtime circular dependency between openai.ts and networking.ts.
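A minimal sketch of the suggested fix (the `{ family: string }` parameter shape and the gating logic are assumptions based on the comment, not the repository's actual code):

```typescript
// Gate on the endpoint family instead of the model id, so an endpoint whose
// model reads "copilot-base" still enables tool search when its family is a
// supported one (per the review comment above).
const OPENAI_TOOL_SEARCH_SUPPORTED_MODELS = ['gpt-5.4'] as const;

function isToolSearchSupportedFamily(endpoint: { family: string } | string): boolean {
	const family = typeof endpoint === 'string' ? endpoint : endpoint.family;
	return OPENAI_TOOL_SEARCH_SUPPORTED_MODELS.some(prefix =>
		family.toLowerCase().startsWith(prefix));
}

console.log(isToolSearchSupportedFamily({ family: 'gpt-5.4' })); // → true
console.log(isToolSearchSupportedFamily('gpt-4.1'));             // → false
```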

type: 'function',
name: tool.function.name,
description: tool.function.description || '',
defer_loading: true,

Copilot AI Mar 27, 2026


buildToolSearchOutputTools marks returned tool definitions with defer_loading: true. For tool_search_output, these tools are being provided as loaded definitions; keeping defer_loading set may cause them to still be treated as deferred (potentially triggering repeated tool_search cycles or preventing invocation). Consider omitting defer_loading (or explicitly setting it false) for tools included in the output payload.

Suggested change
defer_loading: true,

}

function rawMessagesToResponseAPI(modelId: string, messages: readonly Raw.ChatMessage[], ignoreStatefulMarker: boolean): { input: OpenAI.Responses.ResponseInputItem[]; previous_response_id?: string } {
// ── Responses API tool search types ──────────────────────────────────\n// These match the shapes from https://developers.openai.com/api/docs/guides/tools-tool-search

Copilot AI Mar 27, 2026


The section header comment includes a literal \n sequence (// ... ──────────────────────────────────\n// ...), which looks like an accidental escape that will be rendered verbatim in source. Split this into two comment lines without the escaped newline for readability.

Suggested change
// ── Responses API tool search types ──────────────────────────────────\n// These match the shapes from https://developers.openai.com/api/docs/guides/tools-tool-search
// ── Responses API tool search types ──────────────────────────────────
// These match the shapes from https://developers.openai.com/api/docs/guides/tools-tool-search

}
if (entry.chatParams.body?.tools) {
const toolNames = entry.chatParams.body.tools.map(t => isOpenAiFunctionTool(t) ? t.function.name : t.name);
const toolNames = entry.chatParams.body.tools.map((t: any) => isOpenAiFunctionTool(t) ? t.function.name : t.name ?? t.type);

Copilot AI Mar 27, 2026


The new toolNames mapping uses (t: any), which drops type safety in this logger and can hide future schema issues (e.g., distinguishing Responses function tools vs tool_search). Consider avoiding any by widening isOpenAiFunctionTool’s input type (if needed) and using property checks like 'name' in t / discriminating on t.type to derive the display name.

Suggested change
const toolNames = entry.chatParams.body.tools.map((t: any) => isOpenAiFunctionTool(t) ? t.function.name : t.name ?? t.type);
const toolNames = entry.chatParams.body.tools.map(t => {
	if (isOpenAiFunctionTool(t)) {
		return t.function.name;
	}
	if (t && typeof t === 'object') {
		if ('name' in t && typeof (t as { name: unknown }).name === 'string') {
			return (t as { name: string }).name;
		}
		if ('type' in t && typeof (t as { type: unknown }).type === 'string') {
			return (t as { type: string }).type;
		}
	}
	return 'unknown';
});

Comment on lines +34 to +38
tokenizer: 'cl100k_base' as any,
acquireTokenizer: () => { throw new Error('Not implemented'); },
processResponseFromChatEndpoint: () => { throw new Error('Not implemented'); },
makeRequest: () => { throw new Error('Not implemented'); },
} as unknown as IChatEndpoint;

Copilot AI Mar 27, 2026


This test helper relies on as any / as unknown as IChatEndpoint to satisfy types. To keep tests more robust and aligned with existing patterns, consider introducing a small MockChatEndpoint that implements the minimal IChatEndpoint surface (with correctly typed tokenizer, etc.) instead of casting.

Suggested change
tokenizer: 'cl100k_base' as any,
acquireTokenizer: () => { throw new Error('Not implemented'); },
processResponseFromChatEndpoint: () => { throw new Error('Not implemented'); },
makeRequest: () => { throw new Error('Not implemented'); },
} as unknown as IChatEndpoint;
tokenizer: undefined,
acquireTokenizer: () => { throw new Error('Not implemented'); },
processResponseFromChatEndpoint: () => { throw new Error('Not implemented'); },
makeRequest: () => { throw new Error('Not implemented'); },
};

Comment thread on package.nls.json
"github.copilot.config.anthropic.promptOptimization": "Prompt optimization mode for Claude 4.6 models.\n\n- `control`: Uses the current default prompt (no changes).\n- `combined`: Uses a single optimized prompt for both Opus and Sonnet.\n- `split`: Uses separate optimized prompts for Opus (bounded exploration) and Sonnet (full persistence).\n\n**Note**: This is an experimental feature for A/B testing prompt configurations.",
"github.copilot.config.anthropic.toolSearchTool.enabled": "Enable tool search tool for Anthropic models. When enabled, tools are dynamically discovered and loaded on-demand using natural language search, reducing context window usage when many tools are available.",
"github.copilot.config.anthropic.toolSearchTool.mode": "Controls how tool search works for Anthropic models. 'server' uses Anthropic's built-in regex-based tool search. 'client' uses local embeddings-based semantic search for more accurate tool discovery.",
"github.copilot.config.responsesApi.toolSearchTool.enabled": "Enable tool search for OpenAI Responses API models. When enabled, tools are dynamically discovered and loaded on-demand using embeddings-based search, reducing context window usage when many tools are available.",
Contributor

should we also add github.copilot.config.responsesApi.toolSearchTool.mode to control whether we use the server side or client side tool search tool?
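If such a key were added, it might mirror the Anthropic mode description quoted above. A hypothetical package.nls.json entry (the wording is illustrative, not an actual string from this PR):

```json
{
  "github.copilot.config.responsesApi.toolSearchTool.mode": "Controls how tool search works for OpenAI Responses API models. 'server' uses the hosted server-side tool search. 'client' uses local embeddings-based semantic search for more accurate tool discovery."
}
```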

@Giuspepe (Contributor) commented Mar 30, 2026

It finds the tool and calls it, but then displays an error message because of the namespace field:

[screenshot]

[screenshot]

Here in the docs they just set namespace to name when the tool doesn't have a namespace:

[screenshot]

However, we don't set a namespace at the moment:

[screenshot]
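A minimal sketch of the fallback the docs excerpt describes (the `DiscoveredTool` shape and `withNamespace` helper are hypothetical names for illustration): when a tool has no namespace, reuse its name.

```typescript
// Hypothetical normalization step: default a missing namespace to the tool's
// own name, mirroring the docs behavior referenced above.
interface DiscoveredTool {
	name: string;
	namespace?: string;
}

function withNamespace(tool: DiscoveredTool): Required<DiscoveredTool> {
	return { name: tool.name, namespace: tool.namespace ?? tool.name };
}

console.log(withNamespace({ name: 'list_open_orders' }).namespace); // → list_open_orders
```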

}
}

// Build final tools array
Member


Is this going to happen for every request? Can we make this and toolsMap static on this class and reuse them for all tool-search-enabled requests?
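One way to read the suggestion is a lazily built static cache, sketched below (the class and method names are hypothetical, and a real version would need invalidation when the tool set changes):

```typescript
// Sketch: build the name→tool map once and reuse it across
// tool-search-enabled requests instead of rebuilding it per request.
interface NamedTool {
	name: string;
}

class ToolSearchState {
	private static toolsMap: Map<string, NamedTool> | undefined;

	static getToolsMap(tools: NamedTool[]): Map<string, NamedTool> {
		// Built lazily on first use; subsequent calls reuse the cached map.
		if (!ToolSearchState.toolsMap) {
			ToolSearchState.toolsMap = new Map(tools.map(t => [t.name, t]));
		}
		return ToolSearchState.toolsMap;
	}
}
```

Note the trade-off: a static cache assumes the tool list is stable for the session; if tools can differ between requests, the cache needs a key or explicit invalidation.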

type: 'function',
strict: false,
parameters: (tool.function.parameters || {}) as Record<string, unknown>,
...(isDeferred ? { defer_loading: true } : {}),
Member


Per https://developers.openai.com/api/docs/guides/tools-tool-search, are we grouping these by namespace like below?

For maximum token savings, we recommend grouping deferred functions into namespaces or MCP servers with clear, high-level descriptions that give the model a strong overview of what is contained within them, so it can effectively search and load only the relevant functions. As a best practice, aim to keep each namespace to fewer than 10 functions for better token efficiency and model performance.

{
  "tools": [
    {
      "type": "namespace",
      "name": "crm",
      "description": "CRM tools for customer lookup and order management.",
      "tools": [
        {
          "type": "function",
          "name": "list_open_orders",
