feat(core): add PromptReplayCache to cache identical model responses by RockyOmvi · Pull Request #27650 · google-gemini/gemini-cli

RockyOmvi · 2026-06-03T04:59:58Z

Summary

Adds a Prompt Replay Cache that stores model responses locally and reuses them when the same prompt is issued again within the same project. This reduces latency, API usage, and cost for repeated prompts during development workflows.

Fixes #21570

Changes

New: `CachingContentGenerator` (packages/core/src/core/cachingContentGenerator.ts)

- Decorator implementing the `ContentGenerator` interface, wrapping the real API generator
- Cache key: SHA-256 hash of the serialized (model, contents, config)
- Cache storage: `~/.gemini/tmp/<projectId>/prompt-cache/<hash>.json`
- Configurable TTL (default: 3600 s)
- Supports both `generateContent` and `generateContentStream`
- Cache write failures are non-fatal (errors are silently ignored)
- Expired entries are deleted when they are encountered during lookup
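Here is a minimal sketch of how the cache key and entry path above could be derived. It assumes Node's built-in `node:crypto` and `node:path` modules. `computeCacheKey`, `stableStringify`, and `cacheFilePath` are hypothetical names, not the PR's actual helpers. Sorting object keys before hashing is also an added assumption. Plain `JSON.stringify` follows property insertion order, so two logically identical configs could otherwise hash to different keys.

```typescript
import { createHash } from 'node:crypto';
import * as path from 'node:path';

// Hypothetical input shape: the three values the PR says feed the key.
interface CacheKeyInput {
  model: string;
  contents: unknown;
  config?: unknown;
}

// Serialize with object keys sorted so property order does not change the hash.
// Undefined properties are dropped, matching JSON.stringify.
function stableStringify(value: unknown): string {
  if (value === null || typeof value !== 'object') {
    return JSON.stringify(value) ?? 'null';
  }
  if (Array.isArray(value)) {
    return `[${value.map((item) => stableStringify(item)).join(',')}]`;
  }
  const obj = value as Record<string, unknown>;
  const entries = Object.keys(obj)
    .sort()
    .filter((k) => obj[k] !== undefined)
    .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`);
  return `{${entries.join(',')}}`;
}

// SHA-256 over the serialized (model, contents, config) triple, hex-encoded.
function computeCacheKey(input: CacheKeyInput): string {
  const serialized = stableStringify({
    model: input.model,
    contents: input.contents,
    config: input.config ?? null,
  });
  return createHash('sha256').update(serialized).digest('hex');
}

// One JSON file per key inside the project's prompt-cache directory.
function cacheFilePath(cacheDir: string, key: string): string {
  return path.join(cacheDir, `${key}.json`);
}
```

Any field that changes the model's output has to be part of the key. If a field is left out, two different requests could share one cached answer.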

Storage

- Added getProjectPromptCacheDir() and getProjectPromptCacheDirSafe() methods

Config

- Added `promptReplayCache: { enabled?: boolean; ttl?: number }` to ConfigParameters (`ttl` is in seconds)
- Exposed as `promptReplayCacheEnabled` and `promptReplayCacheTtl` on the Config class
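This config pairs an optional flag with a TTL in seconds. The sketch below shows one way to normalize those values before use, converting seconds to milliseconds once at this boundary. The 3600 s default comes from the PR description. Two parts are assumptions for illustration: treating a missing `enabled` as off, and the `resolvePromptReplayCache` helper itself.

```typescript
// Shape from the PR description: ConfigParameters.promptReplayCache.
interface PromptReplayCacheParams {
  enabled?: boolean;
  ttl?: number; // seconds
}

interface ResolvedPromptReplayCache {
  enabled: boolean;
  ttlMs: number; // milliseconds, ready for Date.now() arithmetic
}

const DEFAULT_TTL_SECONDS = 3600;

// Hypothetical helper: apply defaults and convert seconds to milliseconds once.
function resolvePromptReplayCache(
  params: PromptReplayCacheParams | undefined,
): ResolvedPromptReplayCache {
  const rawTtl = params?.ttl;
  const ttlSeconds =
    typeof rawTtl === 'number' && Number.isFinite(rawTtl) && rawTtl > 0
      ? rawTtl
      : DEFAULT_TTL_SECONDS;
  return {
    enabled: params?.enabled === true, // assumption: off unless explicitly enabled
    ttlMs: ttlSeconds * 1000,
  };
}
```

If the conversion happens in exactly one place, call sites cannot pass seconds where milliseconds are expected.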

Integration

- Wired into createContentGenerator() for both API-key and OAuth auth paths
- Cache sits between LoggingContentGenerator and the underlying API, so caching is transparent to telemetry
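To make this layering concrete, here is a simplified, synchronous sketch; the real interface is asynchronous and takes richer request types. `TextGenerator`, `ApiGenerator`, `CachingGenerator`, and `LoggingGenerator` are hypothetical stand-ins for the real classes. With logging on the outside, every request is still logged, while only cache misses reach the API.

```typescript
// Simplified stand-in for the ContentGenerator interface.
interface TextGenerator {
  generate(prompt: string): string;
}

// Pretend API client that counts real calls.
class ApiGenerator implements TextGenerator {
  calls = 0;
  generate(prompt: string): string {
    this.calls++;
    return `response to: ${prompt}`;
  }
}

// Decorator: answer from an in-memory map when possible, otherwise delegate.
class CachingGenerator implements TextGenerator {
  private readonly cache = new Map<string, string>();
  private readonly inner: TextGenerator;
  constructor(inner: TextGenerator) {
    this.inner = inner;
  }
  generate(prompt: string): string {
    const hit = this.cache.get(prompt);
    if (hit !== undefined) return hit;
    const result = this.inner.generate(prompt);
    this.cache.set(prompt, result);
    return result;
  }
}

// Outer decorator: records every request, hit or miss.
class LoggingGenerator implements TextGenerator {
  readonly log: string[] = [];
  private readonly inner: TextGenerator;
  constructor(inner: TextGenerator) {
    this.inner = inner;
  }
  generate(prompt: string): string {
    this.log.push(prompt);
    return this.inner.generate(prompt);
  }
}

// Logging wraps caching, which wraps the API client.
const api = new ApiGenerator();
const logged = new LoggingGenerator(new CachingGenerator(api));
logged.generate('explain this diff');
logged.generate('explain this diff');
```

One consequence is worth confirming against the real telemetry. Cache hits pass through the logging layer like any other call, so this arrangement would count replayed responses as requests. In this sketch, the log records two requests while the API sees one.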

Testing

- 5 new unit tests covering cache hit, cache miss, disabled cache, TTL expiry, and distinct cache keys
- All 47 existing content generator tests pass
- All 33 storage tests pass

Add support for a JSON array format in trustedFolders.json as an alternative to the existing object format. When the file is a list, each entry defaults to the TRUST_FOLDER trust level, which makes it easier to maintain a simple list of trusted directories by hand. The object format continues to work unchanged and gives full control over trust levels (TRUST_FOLDER, TRUST_PARENT, DO_NOT_TRUST). Fixes google-gemini#27647
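Here is a sketch of the normalization this commit describes. It assumes the parsed file is either an array of paths or an object mapping paths to trust levels. `normalizeTrustedFolders` is a hypothetical name, not the actual `loadTrustedFolders` code. Skipping unknown trust levels in the object form is also an added assumption.

```typescript
type TrustLevel = 'TRUST_FOLDER' | 'TRUST_PARENT' | 'DO_NOT_TRUST';

const TRUST_LEVELS: readonly string[] = ['TRUST_FOLDER', 'TRUST_PARENT', 'DO_NOT_TRUST'];

function isTrustLevel(value: unknown): value is TrustLevel {
  return typeof value === 'string' && TRUST_LEVELS.includes(value);
}

// Accept either ["<path>", ...] or { "<path>": "<TrustLevel>", ... }.
function normalizeTrustedFolders(parsed: unknown): Record<string, TrustLevel> {
  const result: Record<string, TrustLevel> = {};
  if (Array.isArray(parsed)) {
    // List form: every string entry defaults to TRUST_FOLDER.
    for (const entry of parsed) {
      if (typeof entry === 'string') result[entry] = 'TRUST_FOLDER';
    }
    return result;
  }
  if (parsed !== null && typeof parsed === 'object') {
    // Object form: keep explicit levels, skip anything unrecognized.
    for (const [folder, level] of Object.entries(parsed)) {
      if (isTrustLevel(level)) result[folder] = level;
    }
  }
  return result;
}
```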

Introduces CachingContentGenerator as a decorator wrapping ContentGenerator. The cache key is the SHA-256 of the serialized (model, contents, config). Cache entries are stored at ~/.gemini/tmp/<projectId>/prompt-cache/<hash>.json with a configurable TTL (default 3600 s). Wired into createContentGenerator() before the LoggingContentGenerator wrapper so both streaming and non-streaming calls are cached.

- CachingContentGenerator decorator (cachingContentGenerator.ts)
- getProjectPromptCacheDir / safe variant in Storage
- promptReplayCache config toggle in ConfigParameters
- Integration in both API-key and OAuth code paths

Fixes google-gemini#21570

github-actions · 2026-06-03T05:00:10Z

📊 PR Size: size/L

Lines changed: 531
Additions: +517
Deletions: -14
Files changed: 10

gemini-code-assist · 2026-06-03T05:09:10Z

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request implements a local caching layer for model responses to optimize development workflows by avoiding redundant API calls. It also includes improvements to the trusted folders configuration logic, allowing for a more flexible list-based input format.

Highlights

Prompt Replay Cache: Introduced CachingContentGenerator to locally cache and reuse model responses, significantly reducing latency and API costs for repeated prompts.
Configuration: Added promptReplayCache settings to ConfigParameters to allow enabling and configuring the TTL for the new caching mechanism.
Trusted Folders Enhancement: Updated loadTrustedFolders to support a list-based JSON format for trusted folder configurations, improving flexibility for user settings.


gemini-code-assist

Code Review

This pull request introduces a prompt replay caching mechanism (CachingContentGenerator) to cache Gemini API responses (both standard and streaming) and updates the configuration and storage systems to support it. It also adds support for loading trusted folders from a JSON array format. The review feedback highlights critical issues in the caching implementation: a potential bug where incomplete/failed streams are cached, the blocking use of synchronous fs.existsSync in an asynchronous method, and a unit mismatch where the TTL configuration in seconds is passed directly to a constructor expecting milliseconds.

gemini-code-assist · 2026-06-03T05:11:32Z

+  private async *cachingStream(
+    stream: AsyncGenerator<GenerateContentResponse>,
+    key: string,
+    model: string,
+  ): AsyncGenerator<GenerateContentResponse> {
+    const collected: GenerateContentResponse[] = [];
+    try {
+      for await (const chunk of stream) {
+        collected.push(chunk);
+        yield chunk;
+      }
+    } finally {
+      if (collected.length > 0) {
+        await this.saveToCache(key, model, collected);
+      }
+    }
+  }


If a streaming request fails, is aborted, or is cancelled before completion, the generator's finally block still executes. Because collected.length > 0 is true as soon as a single chunk has been yielded, the partial response is saved to the cache. Subsequent identical requests then hit the cache and replay an incomplete stream without any error, which is a silent correctness bug.

We should track whether the stream completed successfully and only save to the cache if it did.

private async *cachingStream(
  stream: AsyncGenerator<GenerateContentResponse>,
  key: string,
  model: string,
): AsyncGenerator<GenerateContentResponse> {
  const collected: GenerateContentResponse[] = [];
  let completed = false;
  try {
    for await (const chunk of stream) {
      collected.push(chunk);
      yield chunk;
    }
    completed = true;
  } finally {
    if (completed && collected.length > 0) {
      await this.saveToCache(key, model, collected);
    }
  }
}
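The `finally` behavior behind this bug can be checked in isolation. The sketch below uses a synchronous generator for brevity. Async generators follow the same `try`/`finally` rules: breaking out of `for await` calls `return()`, which runs `finally`, and an error thrown by the source also runs it. `cachingIterate` and the `saved` array are hypothetical stand-ins for `cachingStream` and `saveToCache`.

```typescript
// Records what would have been written to the cache.
const saved: string[][] = [];

// Same shape as the suggested fix: only persist after the source is fully drained.
function* cachingIterate(source: Iterable<string>): Generator<string> {
  const collected: string[] = [];
  let completed = false;
  try {
    for (const chunk of source) {
      collected.push(chunk);
      yield chunk;
    }
    completed = true;
  } finally {
    if (completed && collected.length > 0) {
      saved.push(collected);
    }
  }
}

// Source that fails after two chunks, like a dropped connection.
function* failingSource(): Generator<string> {
  yield 'a';
  yield 'b';
  throw new Error('stream aborted');
}

// 1. Full consumption: the response is cached.
Array.from(cachingIterate(['x', 'y']));

// 2. Consumer stops early: finally runs, but nothing is cached.
for (const chunk of cachingIterate(['p', 'q', 'r'])) {
  if (chunk === 'p') break;
}

// 3. Source throws mid-stream: the error propagates, nothing is cached.
try {
  Array.from(cachingIterate(failingSource()));
} catch {
  // expected: the consumer still sees the failure
}
```

Without the `completed` flag, cases 2 and 3 would each push a partial array into `saved`.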

gemini-code-assist · 2026-06-03T05:11:32Z

+  private async loadFromCache(
+    key: string,
+  ): Promise<GenerateContentResponse[] | null> {
+    try {
+      const filePath = this.getCacheFilePath(key);
+      if (!fs.existsSync(filePath)) return null;
+      const raw = await fs.promises.readFile(filePath, 'utf-8');
+      // eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion
+      const entry = JSON.parse(raw) as CacheEntry;
+      const age = Date.now() - entry.createdAt;
+      if (age > entry.ttl) {
+        await fs.promises.unlink(filePath).catch(() => {});
+        return null;
+      }
+      return entry.responses;
+    } catch {
+      return null;
+    }
+  }


Using synchronous file system operations like fs.existsSync inside an asynchronous method blocks the Node.js event loop, which can degrade performance in concurrent environments. Instead, let fs.promises.readFile attempt to read the file directly and handle any missing file errors (ENOENT) gracefully in the catch block.

private async loadFromCache(
  key: string,
): Promise<GenerateContentResponse[] | null> {
  try {
    const filePath = this.getCacheFilePath(key);
    const raw = await fs.promises.readFile(filePath, 'utf-8');
    // eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion
    const entry = JSON.parse(raw) as CacheEntry;
    const age = Date.now() - entry.createdAt;
    if (age > entry.ttl) {
      await fs.promises.unlink(filePath).catch(() => {});
      return null;
    }
    return entry.responses;
  } catch {
    return null;
  }
}

References

Use asynchronous file system operations (e.g., fs.promises.readFile) instead of synchronous ones (e.g., fs.readFileSync) to avoid blocking the event loop.
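The sketch below follows this suggestion but keeps ENOENT separate from other read failures. That way a corrupt file or a permissions problem could be reported instead of looking like an ordinary miss. `readCacheEntry`, `isEnoent`, and the `onError` callback are hypothetical. The `CacheEntry` fields mirror the ones `loadFromCache` reads in the diff.

```typescript
import { promises as fsp } from 'node:fs';

// Fields as used by loadFromCache in the diff; responses left generic here.
interface CacheEntry<T> {
  createdAt: number; // epoch milliseconds
  ttl: number; // milliseconds
  responses: T[];
}

// True when a thrown value is Node's "file not found" error.
function isEnoent(err: unknown): boolean {
  return (
    typeof err === 'object' &&
    err !== null &&
    (err as { code?: unknown }).code === 'ENOENT'
  );
}

// Returns cached responses, or null on a miss, an expired entry, or any failure.
async function readCacheEntry<T>(
  filePath: string,
  onError: (err: unknown) => void = () => {},
  now: number = Date.now(),
): Promise<T[] | null> {
  try {
    // No existsSync pre-check: readFile itself reports a missing file.
    const raw = await fsp.readFile(filePath, 'utf-8');
    const entry = JSON.parse(raw) as CacheEntry<T>;
    if (now - entry.createdAt > entry.ttl) {
      // Expired: delete best-effort and ignore unlink failures.
      await fsp.unlink(filePath).catch(() => {});
      return null;
    }
    return entry.responses;
  } catch (err) {
    // A missing file is a normal miss; anything else is reported but still non-fatal.
    if (!isEnoent(err)) onError(err);
    return null;
  }
}
```

Reading directly also removes the check-then-read race. With the original code, the file could be deleted between `existsSync` and `readFile`.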

gemini-code-assist · 2026-06-03T05:11:32Z

+      if (gcConfig.promptReplayCacheEnabled) {
+        const cacheDir = gcConfig.storage.getProjectPromptCacheDirSafe();
+        if (cacheDir) {
+          inner = new CachingContentGenerator(
+            inner,
+            cacheDir,
+            gcConfig.promptReplayCacheTtl,
+            true,
+          );
+        }
+      }


The configuration parameter promptReplayCacheTtl is defined in seconds (defaulting to 3600 seconds / 1 hour), but CachingContentGenerator expects the TTL parameter in milliseconds. Passing the raw value directly causes the cache to expire in 3.6 seconds instead of 1 hour. We should multiply the TTL by 1000 to convert it to milliseconds.

Suggested change

  if (gcConfig.promptReplayCacheEnabled) {
    const cacheDir = gcConfig.storage.getProjectPromptCacheDirSafe();
    if (cacheDir) {
      inner = new CachingContentGenerator(
        inner,
        cacheDir,
-       gcConfig.promptReplayCacheTtl,
+       gcConfig.promptReplayCacheTtl * 1000,
        true,
      );
    }
  }
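Here is the arithmetic behind this fix as a small sketch. The expiry check compares an age in milliseconds, taken from `Date.now()`, against the stored TTL, so the TTL must also be in milliseconds. `isExpired` is a hypothetical helper that mirrors the `age > entry.ttl` check in `loadFromCache`.

```typescript
// Mirrors the `age > entry.ttl` comparison in loadFromCache; all values in ms.
function isExpired(createdAtMs: number, ttlMs: number, nowMs: number): boolean {
  return nowMs - createdAtMs > ttlMs;
}

const ttlSeconds = 3600; // configured default: one hour
const createdAt = 0;
const tenSecondsLater = 10_000; // ms

// Bug: seconds passed where milliseconds are expected (3600 ms = 3.6 s).
const expiredWithBug = isExpired(createdAt, ttlSeconds, tenSecondsLater); // true

// Fix: convert once, as in the suggested change.
const expiredWithFix = isExpired(createdAt, ttlSeconds * 1000, tenSecondsLater); // false
```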

gemini-code-assist · 2026-06-03T05:11:32Z

+      if (gcConfig.promptReplayCacheEnabled) {
+        const cacheDir = gcConfig.storage.getProjectPromptCacheDirSafe();
+        if (cacheDir) {
+          models = new CachingContentGenerator(
+            models,
+            cacheDir,
+            gcConfig.promptReplayCacheTtl,
+            true,
+          );
+        }
+      }


The configuration parameter promptReplayCacheTtl is defined in seconds (defaulting to 3600 seconds / 1 hour), but CachingContentGenerator expects the TTL parameter in milliseconds. Passing the raw value directly causes the cache to expire in 3.6 seconds instead of 1 hour. We should multiply the TTL by 1000 to convert it to milliseconds.

Suggested change

  if (gcConfig.promptReplayCacheEnabled) {
    const cacheDir = gcConfig.storage.getProjectPromptCacheDirSafe();
    if (cacheDir) {
      models = new CachingContentGenerator(
        models,
        cacheDir,
-       gcConfig.promptReplayCacheTtl,
+       gcConfig.promptReplayCacheTtl * 1000,
        true,
      );
    }
  }

- Convert TTL from seconds to milliseconds before passing to CachingContentGenerator
- Remove synchronous fs.existsSync in favor of async readFile with error handling
- Only cache stream results on successful completion, not on failure/abort
- Add debug logging for cache hit/miss on both sync and stream paths
- Add --prompt-replay-cache / --prompt-replay-cache-ttl CLI flags and settings schema entry
- Fix paidTier return type and variable type issue in contentGenerator.ts
- Fix test file to use LlmRole.MAIN enum instead of string literal

RockyOmvi added 2 commits June 3, 2026 09:53

RockyOmvi requested a review from a team as a code owner June 3, 2026 04:59

github-actions Bot added the size/l label Jun 3, 2026

gemini-code-assist Bot reviewed Jun 3, 2026


gemini-cli Bot added the priority/p3, area/agent, and help wanted labels Jun 3, 2026

RockyOmvi wants to merge 3 commits into google-gemini:main from RockyOmvi:feat/prompt-replay-cache
