feat(core): add PromptReplayCache to cache identical model responses #27650
RockyOmvi wants to merge 3 commits into
Add support for a JSON array format in trustedFolders.json as an alternative to the existing object format. When the file is a list, each entry defaults to TRUST_FOLDER trust level, making it easier to maintain a simple list of trusted directories manually. The object format continues to work unchanged for full control over trust levels (TRUST_FOLDER, TRUST_PARENT, DO_NOT_TRUST). Fixes google-gemini#27647
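The array handling described above could be normalized at load time roughly as follows. This is a minimal sketch, not the code from this PR: `normalizeTrustedFolders` and the string-literal `TrustLevel` type are hypothetical names, and it assumes trust levels are stored as the strings TRUST_FOLDER, TRUST_PARENT, and DO_NOT_TRUST.

```typescript
// Hypothetical type: assumes trust levels are persisted as these strings.
type TrustLevel = 'TRUST_FOLDER' | 'TRUST_PARENT' | 'DO_NOT_TRUST';

/**
 * Accepts either file shape and returns the object form.
 * - Array: every string entry gets the default TRUST_FOLDER level.
 * - Object: returned as-is, so explicit trust levels are preserved.
 */
function normalizeTrustedFolders(parsed: unknown): Record<string, TrustLevel> {
  if (Array.isArray(parsed)) {
    const result: Record<string, TrustLevel> = {};
    for (const entry of parsed) {
      // Skip non-string entries rather than failing the whole file.
      if (typeof entry === 'string') {
        result[entry] = 'TRUST_FOLDER';
      }
    }
    return result;
  }
  if (parsed !== null && typeof parsed === 'object') {
    return parsed as Record<string, TrustLevel>;
  }
  // Anything else (number, string, null) is treated as an empty config.
  return {};
}

const fromList = normalizeTrustedFolders(JSON.parse('["/work/a", "/work/b"]'));
const fromObject = normalizeTrustedFolders(
  JSON.parse('{"/work/c": "DO_NOT_TRUST"}'),
);
console.log(fromList, fromObject);
```

Normalizing once at load time means the rest of the trust logic only ever sees the object form, whichever format the user wrote.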
Introduces CachingContentGenerator as a decorator wrapping ContentGenerator. Cache key is SHA-256 of serialized (model, contents, config). Cache entries are stored at ~/.gemini/tmp/<projectId>/prompt-cache/<hash>.json with configurable TTL (default 3600s). Wired into createContentGenerator() before the LoggingContentGenerator wrapper so both streaming and non-streaming calls are cached.
- CachingContentGenerator decorator (cachingContentGenerator.ts)
- getProjectPromptCacheDir / safe variant in Storage
- promptReplayCache config toggle in ConfigParameters
- Integration in both API-key and OAuth code paths
Fixes google-gemini#21570
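The key derivation and file layout described in this commit might look something like the sketch below. `promptCacheKey` and `cacheFilePath` are hypothetical helper names, and `JSON.stringify` is an assumed serializer; the commit message does not say how the tuple is serialized.

```typescript
import { createHash } from 'node:crypto';
import * as path from 'node:path';

/**
 * Hypothetical helper: hashes the (model, contents, config) tuple.
 * Assumes JSON.stringify as the serializer, so two logically equal
 * objects with different property order would hash differently.
 */
function promptCacheKey(
  model: string,
  contents: unknown,
  config: unknown,
): string {
  const serialized = JSON.stringify({ model, contents, config });
  return createHash('sha256').update(serialized).digest('hex');
}

/** Hypothetical helper: maps a key to <cacheDir>/<hash>.json. */
function cacheFilePath(cacheDir: string, key: string): string {
  return path.join(cacheDir, `${key}.json`);
}

const key = promptCacheKey(
  'example-model', // placeholder model name
  [{ role: 'user', parts: [{ text: 'hello' }] }],
  { temperature: 0 },
);
console.log(cacheFilePath('/tmp/prompt-cache', key));
```

Because the hash covers the config as well as the prompt, changing a setting such as temperature produces a different key and therefore a cache miss.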
📊 PR Size: size/L
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request implements a local caching layer for model responses to optimize development workflows by avoiding redundant API calls. It also includes improvements to the trusted folders configuration logic, allowing for a more flexible list-based input format.
Code Review
This pull request introduces a prompt replay caching mechanism (CachingContentGenerator) to cache Gemini API responses (both standard and streaming) and updates the configuration and storage systems to support it. It also adds support for loading trusted folders from a JSON array format. The review feedback highlights critical issues in the caching implementation: a potential bug where incomplete/failed streams are cached, the blocking use of synchronous fs.existsSync in an asynchronous method, and a unit mismatch where the TTL configuration in seconds is passed directly to a constructor expecting milliseconds.
  private async *cachingStream(
    stream: AsyncGenerator<GenerateContentResponse>,
    key: string,
    model: string,
  ): AsyncGenerator<GenerateContentResponse> {
    const collected: GenerateContentResponse[] = [];
    try {
      for await (const chunk of stream) {
        collected.push(chunk);
        yield chunk;
      }
    } finally {
      if (collected.length > 0) {
        await this.saveToCache(key, model, collected);
      }
    }
  }
If a streaming request fails, is aborted, or is cancelled before completion, the generator's finally block will still execute. Since collected.length > 0 is likely true for any chunks yielded before the failure, the partial/incomplete response will be saved to the cache. Subsequent requests will then hit the cache and return an incomplete stream without any error, leading to silent correctness bugs.
We should track whether the stream completed successfully and only save to the cache if it did.
  private async *cachingStream(
    stream: AsyncGenerator<GenerateContentResponse>,
    key: string,
    model: string,
  ): AsyncGenerator<GenerateContentResponse> {
    const collected: GenerateContentResponse[] = [];
    let completed = false;
    try {
      for await (const chunk of stream) {
        collected.push(chunk);
        yield chunk;
      }
      completed = true;
    } finally {
      if (completed && collected.length > 0) {
        await this.saveToCache(key, model, collected);
      }
    }
  }

  private async loadFromCache(
    key: string,
  ): Promise<GenerateContentResponse[] | null> {
    try {
      const filePath = this.getCacheFilePath(key);
      if (!fs.existsSync(filePath)) return null;
      const raw = await fs.promises.readFile(filePath, 'utf-8');
      // eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion
      const entry = JSON.parse(raw) as CacheEntry;
      const age = Date.now() - entry.createdAt;
      if (age > entry.ttl) {
        await fs.promises.unlink(filePath).catch(() => {});
        return null;
      }
      return entry.responses;
    } catch {
      return null;
    }
  }
Using synchronous file system operations like fs.existsSync inside an asynchronous method blocks the Node.js event loop, which can degrade performance in concurrent environments. Instead, let fs.promises.readFile attempt to read the file directly and handle any missing file errors (ENOENT) gracefully in the catch block.
  private async loadFromCache(
    key: string,
  ): Promise<GenerateContentResponse[] | null> {
    try {
      const filePath = this.getCacheFilePath(key);
      const raw = await fs.promises.readFile(filePath, 'utf-8');
      // eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion
      const entry = JSON.parse(raw) as CacheEntry;
      const age = Date.now() - entry.createdAt;
      if (age > entry.ttl) {
        await fs.promises.unlink(filePath).catch(() => {});
        return null;
      }
      return entry.responses;
    } catch {
      return null;
    }
  }

References
- Use asynchronous file system operations (e.g., fs.promises.readFile) instead of synchronous ones (e.g., fs.readFileSync) to avoid blocking the event loop.
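The suggestion above treats every failure as a cache miss. If one wanted to tell a missing file (ENOENT) apart from other I/O errors, a variant could look like the sketch below. `readCacheFile` is a hypothetical helper, not part of this PR, and rethrowing non-ENOENT errors is a design choice of this sketch rather than something the reviewer asked for.

```typescript
import { promises as fs } from 'node:fs';

/**
 * Hypothetical helper: reads a cache file without a prior existence check.
 * Returns null only when the file is missing; any other error (for example
 * a permissions problem) is rethrown so the caller can log or handle it.
 */
async function readCacheFile(filePath: string): Promise<string | null> {
  try {
    return await fs.readFile(filePath, 'utf-8');
  } catch (err) {
    // Node attaches a string `code` to file system errors.
    if ((err as NodeJS.ErrnoException).code === 'ENOENT') {
      return null;
    }
    throw err;
  }
}

// Usage: a path that does not exist resolves to null instead of throwing.
readCacheFile('/nonexistent-prompt-cache/entry.json').then((raw) => {
  console.log(raw === null ? 'cache miss' : 'cache hit');
});
```

Skipping the existence check also avoids a race in which the file is deleted between the check and the read.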
  if (gcConfig.promptReplayCacheEnabled) {
    const cacheDir = gcConfig.storage.getProjectPromptCacheDirSafe();
    if (cacheDir) {
      inner = new CachingContentGenerator(
        inner,
        cacheDir,
        gcConfig.promptReplayCacheTtl,
        true,
      );
    }
  }
The configuration parameter promptReplayCacheTtl is defined in seconds (defaulting to 3600 seconds / 1 hour), but CachingContentGenerator expects the TTL parameter in milliseconds. Passing the raw value directly causes the cache to expire in 3.6 seconds instead of 1 hour. We should multiply the TTL by 1000 to convert it to milliseconds.
  if (gcConfig.promptReplayCacheEnabled) {
    const cacheDir = gcConfig.storage.getProjectPromptCacheDirSafe();
    if (cacheDir) {
      inner = new CachingContentGenerator(
        inner,
        cacheDir,
-       gcConfig.promptReplayCacheTtl,
+       gcConfig.promptReplayCacheTtl * 1000,
        true,
      );
    }
  }
  if (gcConfig.promptReplayCacheEnabled) {
    const cacheDir = gcConfig.storage.getProjectPromptCacheDirSafe();
    if (cacheDir) {
      models = new CachingContentGenerator(
        models,
        cacheDir,
        gcConfig.promptReplayCacheTtl,
        true,
      );
    }
  }
The configuration parameter promptReplayCacheTtl is defined in seconds (defaulting to 3600 seconds / 1 hour), but CachingContentGenerator expects the TTL parameter in milliseconds. Passing the raw value directly causes the cache to expire in 3.6 seconds instead of 1 hour. We should multiply the TTL by 1000 to convert it to milliseconds.
  if (gcConfig.promptReplayCacheEnabled) {
    const cacheDir = gcConfig.storage.getProjectPromptCacheDirSafe();
    if (cacheDir) {
      models = new CachingContentGenerator(
        models,
        cacheDir,
-       gcConfig.promptReplayCacheTtl,
+       gcConfig.promptReplayCacheTtl * 1000,
        true,
      );
    }
  }
- Convert TTL from seconds to milliseconds before passing to CachingContentGenerator
- Remove synchronous fs.existsSync in favor of async readFile with error handling
- Only cache stream results on successful completion, not on failure/abort
- Add debug logging for cache hit/miss on both sync and stream paths
- Add --prompt-replay-cache / --prompt-replay-cache-ttl CLI flags and settings schema entry
- Fix paidTier return type and variable type issue in contentGenerator.ts
- Fix test file to use LlmRole.MAIN enum instead of string literal
Summary
Adds a Prompt Replay Cache that stores model responses locally and reuses them when the same prompt is issued again within the same project. This reduces latency, API usage, and cost for repeated prompts during development workflows.
Fixes #21570
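The store-and-reuse behavior summarized above follows the decorator pattern. The sketch below illustrates the idea only: `TextGenerator` and `InMemoryCachingGenerator` are hypothetical names, the interface is far simpler than the real `ContentGenerator`, and an in-memory `Map` stands in for the on-disk JSON files the PR uses.

```typescript
/** Hypothetical, simplified stand-in for the real generator interface. */
interface TextGenerator {
  generate(model: string, prompt: string): Promise<string>;
}

interface Entry {
  createdAt: number; // ms since epoch
  value: string;
}

/** Decorator: same interface as the wrapped generator, plus a TTL cache. */
class InMemoryCachingGenerator implements TextGenerator {
  private readonly entries = new Map<string, Entry>();
  private readonly inner: TextGenerator;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(
    inner: TextGenerator,
    ttlMs: number,
    now: () => number = () => Date.now(), // injectable clock for testing
  ) {
    this.inner = inner;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  async generate(model: string, prompt: string): Promise<string> {
    const key = JSON.stringify([model, prompt]);
    const hit = this.entries.get(key);
    if (hit && this.now() - hit.createdAt <= this.ttlMs) {
      return hit.value; // fresh entry: skip the underlying call
    }
    const value = await this.inner.generate(model, prompt);
    this.entries.set(key, { createdAt: this.now(), value });
    return value;
  }
}

// Usage: the second identical call is served from the cache.
let calls = 0;
const fake: TextGenerator = {
  async generate(model, prompt) {
    calls += 1;
    return `${model} says: ${prompt}`;
  },
};
const cached = new InMemoryCachingGenerator(fake, 60_000);
cached
  .generate('example-model', 'hello')
  .then(() => cached.generate('example-model', 'hello'))
  .then(() => console.log(`underlying calls: ${calls}`));
```

Because the wrapper exposes the same interface as the generator it wraps, callers do not need to know whether caching is enabled.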
Changes
New: CachingContentGenerator (packages/core/src/core/cachingContentGenerator.ts)
- Implements the ContentGenerator interface, wrapping the real API generator
- Cache entries stored at ~/.gemini/tmp/<projectId>/prompt-cache/<hash>.json
- Covers both generateContent and generateContentStream
Storage
- getProjectPromptCacheDir() and getProjectPromptCacheDirSafe() methods
Config
- promptReplayCache: { enabled?: boolean; ttl?: number } in ConfigParameters
- promptReplayCacheEnabled and promptReplayCacheTtl on Config class
Integration
- Wired into createContentGenerator() for both API-key and OAuth auth paths
- Sits between LoggingContentGenerator and the underlying API, so caching is transparent to telemetry
Testing