feat(core): add PromptReplayCache to cache identical model responses#27650

Open
RockyOmvi wants to merge 3 commits into
google-gemini:mainfrom
RockyOmvi:feat/prompt-replay-cache

Conversation

@RockyOmvi

Summary

Adds a Prompt Replay Cache that stores model responses locally and reuses them when the same prompt is issued again within the same project. This reduces latency, API usage, and cost for repeated prompts during development workflows.

Fixes #21570

Changes

New: CachingContentGenerator (packages/core/src/core/cachingContentGenerator.ts)

  • Decorator implementing ContentGenerator interface, wrapping the real API generator
  • Cache key: SHA-256 hash of serialized (model, contents, config)
  • Cache storage: ~/.gemini/tmp/<projectId>/prompt-cache/<hash>.json
  • Configurable TTL (default: 3600s)
  • Supports both generateContent and generateContentStream
  • Cache write failures are non-fatal (silently ignored)
  • Expired entries are automatically cleaned up on lookup
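A minimal sketch of how such a key and file path could be derived, assuming Node's built-in `crypto` and `path` modules. The PR only says the key is a SHA-256 hash of the serialized (model, contents, config); the `stableStringify` helper, its sorted-key serialization, and the names `promptCacheKey` and `cacheFilePath` are illustrative assumptions, not the PR's code.

```typescript
import { createHash } from 'node:crypto';
import { join } from 'node:path';

// Hypothetical request shape; the PR hashes (model, contents, config).
interface CacheKeyInput {
  model: string;
  contents: unknown;
  config?: unknown;
}

// Serialize JSON-like values with object keys sorted, so that property
// order does not change the hash. (Assumption: the PR may serialize differently.)
function stableStringify(value: unknown): string {
  if (value === undefined) {
    return 'null';
  }
  if (value === null || typeof value !== 'object') {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return `[${value.map((item) => stableStringify(item)).join(',')}]`;
  }
  const obj = value as Record<string, unknown>;
  const body = Object.keys(obj)
    .sort()
    .filter((k) => obj[k] !== undefined)
    .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`)
    .join(',');
  return `{${body}}`;
}

// SHA-256 over the serialized triple, hex-encoded (64 characters).
function promptCacheKey(input: CacheKeyInput): string {
  const serialized = stableStringify([
    input.model,
    input.contents,
    input.config ?? null,
  ]);
  return createHash('sha256').update(serialized).digest('hex');
}

// Cache entries live at <cacheDir>/<hash>.json, per the PR description.
function cacheFilePath(cacheDir: string, key: string): string {
  return join(cacheDir, `${key}.json`);
}
```

Sorting keys is one way to keep logically identical requests from producing different hashes; whether the PR does this is not stated.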

Storage

  • Added getProjectPromptCacheDir() and getProjectPromptCacheDirSafe() methods

Config

  • Added promptReplayCache: { enabled?: boolean; ttl?: number } to ConfigParameters
  • Exposed as promptReplayCacheEnabled and promptReplayCacheTtl on Config class
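As a rough illustration of how the optional `promptReplayCache` parameters might map onto the two exposed values, the sketch below resolves defaults. The 3600-second TTL default comes from the description above; the `resolvePromptReplayCache` function and the "disabled unless enabled" default are assumptions made for this example only.

```typescript
// Shape taken from the PR description; everything else is illustrative.
interface PromptReplayCacheParams {
  enabled?: boolean;
  ttl?: number; // seconds
}

// Default TTL stated in the PR description.
const DEFAULT_TTL_SECONDS = 3600;

// Hypothetical resolver. Assumption: the cache is off unless explicitly enabled.
function resolvePromptReplayCache(params?: PromptReplayCacheParams): {
  promptReplayCacheEnabled: boolean;
  promptReplayCacheTtl: number;
} {
  return {
    promptReplayCacheEnabled: params?.enabled ?? false,
    promptReplayCacheTtl: params?.ttl ?? DEFAULT_TTL_SECONDS,
  };
}
```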

Integration

  • Wired into createContentGenerator() for both API-key and OAuth auth paths
  • Cache sits between LoggingContentGenerator and the underlying API, so caching is transparent to telemetry
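The layering described above can be sketched with simplified stand-ins. All class names here (`FakeApiGenerator`, `CachingLayer`, `LoggingLayer`), the single `generate` method, and the in-memory `Map` are hypothetical simplifications; the real `ContentGenerator` interface and the on-disk cache are not reproduced.

```typescript
// Simplified stand-in for the ContentGenerator interface.
interface ContentGeneratorLike {
  generate(prompt: string): Promise<string>;
}

// Pretend API client that counts how often it is actually called.
class FakeApiGenerator implements ContentGeneratorLike {
  calls = 0;
  async generate(prompt: string): Promise<string> {
    this.calls += 1;
    return `response to: ${prompt}`;
  }
}

// Caching decorator; an in-memory Map stands in for the file cache.
class CachingLayer implements ContentGeneratorLike {
  private readonly inner: ContentGeneratorLike;
  private readonly cache = new Map<string, string>();
  constructor(inner: ContentGeneratorLike) {
    this.inner = inner;
  }
  async generate(prompt: string): Promise<string> {
    const hit = this.cache.get(prompt);
    if (hit !== undefined) return hit;
    const fresh = await this.inner.generate(prompt);
    this.cache.set(prompt, fresh);
    return fresh;
  }
}

// Outermost decorator: records every request, cached or not.
class LoggingLayer implements ContentGeneratorLike {
  readonly seen: string[] = [];
  private readonly inner: ContentGeneratorLike;
  constructor(inner: ContentGeneratorLike) {
    this.inner = inner;
  }
  async generate(prompt: string): Promise<string> {
    this.seen.push(prompt);
    return this.inner.generate(prompt);
  }
}

// Logging wraps caching, which wraps the API.
function buildGenerator(
  api: ContentGeneratorLike,
  cacheEnabled: boolean,
): LoggingLayer {
  const inner = cacheEnabled ? new CachingLayer(api) : api;
  return new LoggingLayer(inner);
}
```

With this ordering the logging layer observes every request, while the API stand-in is only reached on a cache miss.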

Testing

  • 5 new unit tests covering: cache hit, cache miss, disabled cache, TTL expiry, distinct cache keys
  • All 47 existing content generator tests pass
  • All 33 storage tests pass

RockyOmvi added 2 commits June 3, 2026 09:53
Add support for a JSON array format in trustedFolders.json as an
alternative to the existing object format. When the file is a list,
each entry defaults to TRUST_FOLDER trust level, making it easier to
maintain a simple list of trusted directories manually.

The object format continues to work unchanged for full control over
trust levels (TRUST_FOLDER, TRUST_PARENT, DO_NOT_TRUST).

Fixes google-gemini#27647

Introduces CachingContentGenerator as a decorator wrapping ContentGenerator.
Cache key is SHA-256 of serialized (model, contents, config). Cache entries
are stored at ~/.gemini/tmp/<projectId>/prompt-cache/<hash>.json with
configurable TTL (default 3600s). Wired into createContentGenerator()
before the LoggingContentGenerator wrapper so both streaming and
non-streaming calls are cached.

- CachingContentGenerator decorator (cachingContentGenerator.ts)
- getProjectPromptCacheDir / safe variant in Storage
- promptReplayCache config toggle in ConfigParameters
- Integration in both API-key and OAuth code paths

Fixes google-gemini#21570

@RockyOmvi requested a review from a team as a code owner June 3, 2026 04:59
@github-actions Bot added the size/l (A large sized PR) label Jun 3, 2026
@github-actions

github-actions Bot commented Jun 3, 2026

📊 PR Size: size/L

  • Lines changed: 531
  • Additions: +517
  • Deletions: -14
  • Files changed: 10

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request implements a local caching layer for model responses to optimize development workflows by avoiding redundant API calls. It also includes improvements to the trusted folders configuration logic, allowing for a more flexible list-based input format.

Highlights

  • Prompt Replay Cache: Introduced CachingContentGenerator to locally cache and reuse model responses, significantly reducing latency and API costs for repeated prompts.
  • Configuration: Added promptReplayCache settings to ConfigParameters to allow enabling and configuring the TTL for the new caching mechanism.
  • Trusted Folders Enhancement: Updated loadTrustedFolders to support a list-based JSON format for trusted folder configurations, improving flexibility for user settings.
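To make the list-based format concrete, here is a hedged sketch of how an array could be normalized into path-to-trust-level pairs. The three trust-level names and the TRUST_FOLDER default for list entries come from the commit message; the `normalizeTrustedFolders` function and the assumption that the object format maps paths to levels are illustrative and may not match `loadTrustedFolders`.

```typescript
type TrustLevel = 'TRUST_FOLDER' | 'TRUST_PARENT' | 'DO_NOT_TRUST';

// Hypothetical normalizer: accepts either a JSON array of paths or an
// object mapping paths to trust levels (the object shape is an assumption).
function normalizeTrustedFolders(parsed: unknown): Record<string, TrustLevel> {
  if (Array.isArray(parsed)) {
    const result: Record<string, TrustLevel> = {};
    for (const entry of parsed) {
      // Each list entry defaults to TRUST_FOLDER; non-string entries are skipped.
      if (typeof entry === 'string') {
        result[entry] = 'TRUST_FOLDER';
      }
    }
    return result;
  }
  if (parsed !== null && typeof parsed === 'object') {
    // Object format passes through unchanged (no validation in this sketch).
    return { ...(parsed as Record<string, TrustLevel>) };
  }
  return {};
}
```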

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a prompt replay caching mechanism (CachingContentGenerator) to cache Gemini API responses (both standard and streaming) and updates the configuration and storage systems to support it. It also adds support for loading trusted folders from a JSON array format. The review feedback highlights critical issues in the caching implementation: a potential bug where incomplete/failed streams are cached, the blocking use of synchronous fs.existsSync in an asynchronous method, and a unit mismatch where the TTL configuration in seconds is passed directly to a constructor expecting milliseconds.

Comment on lines +163 to +179
  private async *cachingStream(
    stream: AsyncGenerator<GenerateContentResponse>,
    key: string,
    model: string,
  ): AsyncGenerator<GenerateContentResponse> {
    const collected: GenerateContentResponse[] = [];
    try {
      for await (const chunk of stream) {
        collected.push(chunk);
        yield chunk;
      }
    } finally {
      if (collected.length > 0) {
        await this.saveToCache(key, model, collected);
      }
    }
  }

high

If a streaming request fails, is aborted, or is cancelled before completion, the generator's finally block will still execute. Since collected.length > 0 is likely true for any chunks yielded before the failure, the partial/incomplete response will be saved to the cache. Subsequent requests will then hit the cache and return an incomplete stream without any error, leading to silent correctness bugs.

We should track whether the stream completed successfully and only save to the cache if it did.

  private async *cachingStream(
    stream: AsyncGenerator<GenerateContentResponse>,
    key: string,
    model: string,
  ): AsyncGenerator<GenerateContentResponse> {
    const collected: GenerateContentResponse[] = [];
    let completed = false;
    try {
      for await (const chunk of stream) {
        collected.push(chunk);
        yield chunk;
      }
      completed = true;
    } finally {
      if (completed && collected.length > 0) {
        await this.saveToCache(key, model, collected);
      }
    }
  }
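The effect of the `completed` flag can be checked in isolation. The following self-contained sketch uses string chunks and an in-memory `saved` array as stand-ins for `GenerateContentResponse` and `saveToCache`; the `source` and `consume` helpers are hypothetical and exist only to simulate a full read, an early exit, and a mid-stream failure.

```typescript
// Stand-in for saveToCache: records what would have been written.
const saved: string[][] = [];

// Fake upstream stream; optionally throws before yielding chunk `failAt`.
async function* source(failAt?: number): AsyncGenerator<string> {
  let index = 0;
  for (const chunk of ['a', 'b', 'c']) {
    if (index === failAt) {
      throw new Error('stream failed');
    }
    index += 1;
    yield chunk;
  }
}

// Same control flow as the suggested fix, with simplified types.
async function* cachingStream(
  stream: AsyncGenerator<string>,
): AsyncGenerator<string> {
  const collected: string[] = [];
  let completed = false;
  try {
    for await (const chunk of stream) {
      collected.push(chunk);
      yield chunk;
    }
    completed = true;
  } finally {
    // Runs on normal completion, on errors, and when the consumer stops
    // early; only the first case should be written to the cache.
    if (completed && collected.length > 0) {
      saved.push(collected);
    }
  }
}

// Reads up to `limit` chunks, then stops (which closes the generator).
async function consume(
  stream: AsyncGenerator<string>,
  limit = Infinity,
): Promise<string[]> {
  const out: string[] = [];
  for await (const chunk of stream) {
    out.push(chunk);
    if (out.length >= limit) break;
  }
  return out;
}
```

Breaking out of a `for await` loop closes the generator, so the `finally` block still runs with `completed` left as `false`; the same holds when the upstream stream throws.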

Comment on lines +71 to +89
  private async loadFromCache(
    key: string,
  ): Promise<GenerateContentResponse[] | null> {
    try {
      const filePath = this.getCacheFilePath(key);
      if (!fs.existsSync(filePath)) return null;
      const raw = await fs.promises.readFile(filePath, 'utf-8');
      // eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion
      const entry = JSON.parse(raw) as CacheEntry;
      const age = Date.now() - entry.createdAt;
      if (age > entry.ttl) {
        await fs.promises.unlink(filePath).catch(() => {});
        return null;
      }
      return entry.responses;
    } catch {
      return null;
    }
  }

high

Using synchronous file system operations like fs.existsSync inside an asynchronous method blocks the Node.js event loop, which can degrade performance in concurrent environments. Instead, let fs.promises.readFile attempt to read the file directly and handle any missing file errors (ENOENT) gracefully in the catch block.

  private async loadFromCache(
    key: string,
  ): Promise<GenerateContentResponse[] | null> {
    try {
      const filePath = this.getCacheFilePath(key);
      const raw = await fs.promises.readFile(filePath, 'utf-8');
      // eslint-disable-next-line @typescript-eslint/no-unsafe-type-assertion
      const entry = JSON.parse(raw) as CacheEntry;
      const age = Date.now() - entry.createdAt;
      if (age > entry.ttl) {
        await fs.promises.unlink(filePath).catch(() => {});
        return null;
      }
      return entry.responses;
    } catch {
      return null;
    }
  }
References
  1. Use asynchronous file system operations (e.g., fs.promises.readFile) instead of synchronous ones (e.g., fs.readFileSync) to avoid blocking the event loop.
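For readers who want the missing-file case made explicit, the sketch below reads a file without a prior existence check and treats only ENOENT as "not cached". The `readIfExists` helper is hypothetical; unlike the suggestion above, which returns `null` for every error, this variant rethrows anything other than ENOENT, which is a design choice of this example rather than something the review asks for.

```typescript
import { promises as fs } from 'node:fs';

// Returns the file's text, or null when the file does not exist.
// Other errors (permissions, I/O) are rethrown so they are not mistaken
// for a cache miss. (Assumption: the PR's version swallows all errors.)
async function readIfExists(filePath: string): Promise<string | null> {
  try {
    return await fs.readFile(filePath, 'utf-8');
  } catch (err) {
    if ((err as { code?: string }).code === 'ENOENT') {
      return null;
    }
    throw err;
  }
}
```

Reading directly also avoids the gap between checking for a file and opening it, during which the file could be removed.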

Comment on lines +291 to +301
  if (gcConfig.promptReplayCacheEnabled) {
    const cacheDir = gcConfig.storage.getProjectPromptCacheDirSafe();
    if (cacheDir) {
      inner = new CachingContentGenerator(
        inner,
        cacheDir,
        gcConfig.promptReplayCacheTtl,
        true,
      );
    }
  }

high

The configuration parameter promptReplayCacheTtl is defined in seconds (defaulting to 3600 seconds / 1 hour), but CachingContentGenerator expects the TTL parameter in milliseconds. Passing the raw value directly causes the cache to expire in 3.6 seconds instead of 1 hour. We should multiply the TTL by 1000 to convert it to milliseconds.

Suggested change
   if (gcConfig.promptReplayCacheEnabled) {
     const cacheDir = gcConfig.storage.getProjectPromptCacheDirSafe();
     if (cacheDir) {
       inner = new CachingContentGenerator(
         inner,
         cacheDir,
-        gcConfig.promptReplayCacheTtl,
+        gcConfig.promptReplayCacheTtl * 1000,
         true,
       );
     }
   }
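The arithmetic behind this finding can be worked through directly. In the sketch below, `isExpired` mirrors the `age > entry.ttl` comparison from `loadFromCache`, where both sides are in milliseconds; the `ttlSecondsToMs` helper and the sample timestamps are hypothetical.

```typescript
const MS_PER_SECOND = 1000;

// Converts a TTL configured in seconds into milliseconds.
function ttlSecondsToMs(ttlSeconds: number): number {
  return ttlSeconds * MS_PER_SECOND;
}

// Same comparison as loadFromCache: createdAt, now, and ttlMs are all
// expressed in milliseconds.
function isExpired(createdAt: number, ttlMs: number, now: number): boolean {
  return now - createdAt > ttlMs;
}

const createdAt = 0;
const tenSecondsLater = 10 * MS_PER_SECOND;

// Raw seconds value passed as milliseconds: 3600 ms is 3.6 s, so an
// entry that is 10 s old already counts as expired.
const withoutConversion = isExpired(createdAt, 3600, tenSecondsLater); // true

// Converted value: 3,600,000 ms is 1 hour, so the same entry is still valid.
const withConversion = isExpired(
  createdAt,
  ttlSecondsToMs(3600),
  tenSecondsLater,
); // false
```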

Comment on lines +389 to +399
  if (gcConfig.promptReplayCacheEnabled) {
    const cacheDir = gcConfig.storage.getProjectPromptCacheDirSafe();
    if (cacheDir) {
      models = new CachingContentGenerator(
        models,
        cacheDir,
        gcConfig.promptReplayCacheTtl,
        true,
      );
    }
  }

high

The configuration parameter promptReplayCacheTtl is defined in seconds (defaulting to 3600 seconds / 1 hour), but CachingContentGenerator expects the TTL parameter in milliseconds. Passing the raw value directly causes the cache to expire in 3.6 seconds instead of 1 hour. We should multiply the TTL by 1000 to convert it to milliseconds.

Suggested change
   if (gcConfig.promptReplayCacheEnabled) {
     const cacheDir = gcConfig.storage.getProjectPromptCacheDirSafe();
     if (cacheDir) {
       models = new CachingContentGenerator(
         models,
         cacheDir,
-        gcConfig.promptReplayCacheTtl,
+        gcConfig.promptReplayCacheTtl * 1000,
         true,
       );
     }
   }

- Convert TTL from seconds to milliseconds before passing to CachingContentGenerator
- Remove synchronous fs.existsSync in favor of async readFile with error handling
- Only cache stream results on successful completion, not on failure/abort
- Add debug logging for cache hit/miss on both sync and stream paths
- Add --prompt-replay-cache / --prompt-replay-cache-ttl CLI flags and settings schema entry
- Fix paidTier return type and variable type issue in contentGenerator.ts
- Fix test file to use LlmRole.MAIN enum instead of string literal
@gemini-cli Bot added labels Jun 3, 2026:
  • priority/p3: Backlog - a good idea but not currently a priority.
  • area/agent: Issues related to Core Agent, Tools, Memory, Sub-Agents, Hooks, Agent Quality
  • help wanted: We will accept PRs from all issues marked as "help wanted". Thanks for your support!

Development

Successfully merging this pull request may close these issues.

Prompt Replay Cache to Reduce Redundant Model Calls

1 participant