Skip to content

[DRAFT] Add cache manager and package cache routing.#1216

Open
Xaenalt wants to merge 4 commits into
python-wheel-build:mainfrom
Xaenalt:feat/cache-manager
Open

[DRAFT] Add cache manager and package cache routing.#1216
Xaenalt wants to merge 4 commits into
python-wheel-build:mainfrom
Xaenalt:feat/cache-manager

Conversation

@Xaenalt

@Xaenalt Xaenalt commented Jun 24, 2026

Copy link
Copy Markdown

Don't bother reviewing till I address at least the AI comments
(Will squash once we're ready, I wanted to split the original 2 in case we want to bisect)

Pull Request Description

What

This adds two features:

  • One is a cache manager, that allows you to specify multiple remote caches.
  • Routes built and cache hit packages to different directories based on if they come from a top level dep or not. This lets us split the accelerated packages into their specific variant collection, and all transitive deps into the shared collection. This does require the user to specify which is which, but wiring it up based on shared libraries felt harder.

Why

This solves two longstanding problems:

  • Separation of caches for shared deps and top level accelerated packages

  • Pulling from multiple remote caches

  • PR follows CONTRIBUTING.md guidelines

AI summaries below:

Cache Manager:

    feat(cache): add unified CacheManager subsystem with remote PEP 503 support
    
    Introduces a new centralized cache management layer that replaces the
    ad-hoc per-directory lookups with a structured, hierarchical system.
    
    Key components:
    - `CacheManager` with collection-based lookup across local and remote backends
    - `RemotePEP503Backend` with lazy per-package fetching and session-scoped index
    - `StoreRouter` for directing artifacts to the correct collection
    - Short-circuit optimization in `_phase_prepare_source` that skips source
      download, build env, and build dep resolution on cache hits
    - `update_wheel_mirror` called after remote cache downloads so wheels are
      indexed for subsequent build dependency resolution via the internal server
    - CLI commands (`fromager cache list/stats/verify/invalidate/gc`)
    - `--use-cache-manager` and `--cache-wheel-server-url` bootstrap options

Cache Routing:

    feat(cache): route built wheels to variant or shared collection
    
    When the CacheManager is active and a non-cpu variant has top-level
    requirements, newly built wheels are routed to the appropriate
    collection directory:
    
    - Packages listed in the variant's requirements file → wheels-repo/variants/<variant>/
    - Unlisted transitive dependencies → wheels-repo/downloads/ (shared default)
    
    This ensures the variant collection contains exactly the packages
    explicitly requested in the requirements file, while common deps
    discovered during dependency resolution stay in the shared collection
    for reuse across variants.
    
    Changes:
    - `_build_cache_manager` accepts `toplevel_reqs` and uses them as the
      variant package set for store routing
    - `StoreRouter` routes based on variant_packages (from requirements
      file), not just pre-built packages
    - `_phase_build` calls `store_wheel` after building (not for cached hits)
    - `LocalDirectoryBackend.store()` copies (not moves) to preserve the
      original in downloads/ for the internal wheel server

Xaenalt and others added 2 commits June 24, 2026 09:23
…upport

Introduces a new centralized cache management layer that replaces the
ad-hoc per-directory lookups with a structured, hierarchical system.

Key components:
- `CacheManager` with collection-based lookup across local and remote backends
- `RemotePEP503Backend` with lazy per-package fetching and session-scoped index
- `StoreRouter` for directing artifacts to the correct collection
- Short-circuit optimization in `_phase_prepare_source` that skips source
  download, build env, and build dep resolution on cache hits
- `update_wheel_mirror` called after remote cache downloads so wheels are
  indexed for subsequent build dependency resolution via the internal server
- CLI commands (`fromager cache list/stats/verify/invalidate/gc`)
- `--use-cache-manager` and `--cache-wheel-server-url` bootstrap options

Co-Authored-By: Claude <claude@anthropic.com>
Signed-off-by: Sean Pryor <spryor@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
When the CacheManager is active and a non-cpu variant has top-level
requirements, newly built wheels are routed to the appropriate
collection directory:

- Packages listed in the variant's requirements file → wheels-repo/variants/<variant>/
- Unlisted transitive dependencies → wheels-repo/downloads/ (shared default)

This ensures the variant collection contains exactly the packages
explicitly requested in the requirements file, while common deps
discovered during dependency resolution stay in the shared collection
for reuse across variants.

Changes:
- `_build_cache_manager` accepts `toplevel_reqs` and uses them as the
  variant package set for store routing
- `StoreRouter` routes based on variant_packages (from requirements
  file), not just pre-built packages
- `_phase_build` calls `store_wheel` after building (not for cached hits)
- `LocalDirectoryBackend.store()` copies (not moves) to preserve the
  original in downloads/ for the internal wheel server

Co-Authored-By: Claude <claude@anthropic.com>
Signed-off-by: Sean Pryor <spryor@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@Xaenalt Xaenalt requested a review from a team as a code owner June 24, 2026 13:53
@Xaenalt Xaenalt added the enhancement New feature or request label Jun 24, 2026
@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR adds a cache subsystem with typed cache keys, backends, routing, stats, and manager orchestration. WorkContext gains an optional cache manager, SourceType gains CACHED, and bootstrap now optionally builds and uses the manager for cached-wheel lookup and storing built wheels. A new cache CLI group adds list, stats, verify, invalidate, and gc subcommands. Tests cover cache primitives, backends, routing, manager behavior, CLI-related flows, and bootstrap integration.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: adding a cache manager and cache routing.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The description matches the changeset by describing the new cache manager, routing, and CLI additions.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@mergify mergify Bot added the ci label Jun 24, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (1)
tests/test_cache.py (1)

1045-1103: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Assert same-run promotion to the local backend here.

This test proves the file was downloaded, but not that the CacheManager actually promoted it into local_backend for subsequent lookups. Adding a second lookup (or a direct local_backend.lookup(...) assertion) would catch regressions where remote hits never become local cache hits.

 Based on learnings and path instructions: "tests/**: Verify test actually tests the intended behavior. Check for missing edge cases."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_cache.py` around lines 1045 - 1103, The remote-hit test currently
only verifies the download result and does not confirm same-run promotion into
the local cache. Update test_remote_hit_downloads_to_local_store in CacheManager
flow to perform a second lookup or directly query local_backend after the first
remote hit, and assert that the item is now served from LocalDirectoryBackend as
a local hit. Use the existing symbols CacheManager.lookup_wheel, local_backend,
and RemotePEP503Backend to keep the test focused on promotion behavior.

Source: Path instructions

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/fromager/bootstrapper.py`:
- Around line 1535-1541: The wheel mirror update is happening before the routed
cache entry is stored, which can leave the internal index out of sync for the
newly copied path. In `bootstrapper.py`, reorder the flow in the cache-hit
handling around `server.update_wheel_mirror`, `self.ctx.package_build_info`, and
`self.ctx.cache.store_wheel` so the wheel is stored in the routed collection
first and the mirror is updated afterward. Keep the existing routing logic
intact, just move the mirror refresh to run after `store_wheel()` completes.

In `@src/fromager/cache.py`:
- Around line 664-667: After backend.fetch in the cache hit path, register the
downloaded artifact with collection.store_backend so its index is updated as a
local artifact instead of remaining remote-only. Use the existing cache-hit flow
around backend.fetch and the collection.store_backend object to add the
downloaded local_path back into the store manager in the same way promoted
artifacts are tracked, so live cache/admin commands can see it.
- Around line 454-463: Verify and enforce the advertised sha256 when downloading
a remote wheel in fetch(). After the session.get/iter_content write completes,
compute the file’s SHA256 for the downloaded target and compare it against
info.sha256 from ArtifactInfo; if it does not match, reject the wheel by
removing the cached file and raising an error. Keep the existing download flow
in fetch() but add the hash validation before returning target so only verified
wheels are accepted.
- Around line 504-535: The _parse_project_page() helper is accepting untrusted
anchor text as ArtifactInfo.filename, which later gets joined by fetch() when
writing to dest, so reject any remote wheel name containing path components or
traversal segments. Update _parse_project_page() to validate the parsed filename
before appending ArtifactInfo, and only allow a plain basename for remote
artifacts while keeping the existing URL and sha256 parsing logic intact.

In `@src/fromager/commands/cache_cmd.py`:
- Around line 92-99: The variant cache setup in cache_cmd.py is reusing the same
prebuilt_backend instance in both the variant and default collections, which
breaks collection scoping and causes double counting/mutating shared files.
Update the CacheCollection construction logic around wkctx.variant so the
variant collection only owns variant-specific backends, while prebuilt_backend
remains attached to the default collection and is still reachable via
search_order. Keep the unique symbols CacheCollection, variant_backends,
collections[wkctx.variant], and search_order aligned so cache list/stats and
invalidate --collection <variant> operate on separate backend instances.

In `@tests/test_bootstrapper_iterative.py`:
- Around line 1319-1368: The cache hit stats test is bypassing the real lookup
path by mocking _find_cached_wheel_via_manager, so CacheManager.lookup_wheel()
never executes and the hit event cannot be verified. Update
test_cache_hit_records_stats to exercise the actual manager lookup path by only
stubbing the unpack step (for example, _unpack_metadata_from_wheel) and then
assert the relevant stats counter/event produced by CacheManager/Bootstrapper.
If this test is only duplicating phase progression, rename or remove it so it
validates the intended stats behavior.

---

Nitpick comments:
In `@tests/test_cache.py`:
- Around line 1045-1103: The remote-hit test currently only verifies the
download result and does not confirm same-run promotion into the local cache.
Update test_remote_hit_downloads_to_local_store in CacheManager flow to perform
a second lookup or directly query local_backend after the first remote hit, and
assert that the item is now served from LocalDirectoryBackend as a local hit.
Use the existing symbols CacheManager.lookup_wheel, local_backend, and
RemotePEP503Backend to keep the test focused on promotion behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a423bd0f-a4ca-4a6b-892c-f98580939f94

📥 Commits

Reviewing files that changed from the base of the PR and between 694f04f and 28b92ad.

📒 Files selected for processing (9)
  • pyproject.toml
  • src/fromager/bootstrapper.py
  • src/fromager/cache.py
  • src/fromager/commands/bootstrap.py
  • src/fromager/commands/cache_cmd.py
  • src/fromager/context.py
  • src/fromager/requirements_file.py
  • tests/test_bootstrapper_iterative.py
  • tests/test_cache.py

Comment thread src/fromager/bootstrapper.py Outdated
Comment thread src/fromager/cache.py
Comment thread src/fromager/cache.py
Comment thread src/fromager/cache.py Outdated
Comment thread src/fromager/commands/cache_cmd.py Outdated
Comment thread tests/test_bootstrapper_iterative.py
- Verify SHA256 hash of remote wheel downloads, rejecting mismatches
- Reject filenames with path traversal components from remote indices
- Register remote cache hits in local store backend index after download
- Reorder mirror update to run after store_wheel in cache hit path
- Remove shared prebuilt_backend from variant collection (avoids double-counting)
- Fix stats test to exercise real CacheManager.lookup_wheel path

Co-Authored-By: Claude <claude@anthropic.com>
Signed-off-by: Sean Pryor <spryor@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@Xaenalt Xaenalt force-pushed the feat/cache-manager branch from 28b92ad to c2e2a8e Compare June 25, 2026 14:48

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/fromager/cache.py`:
- Line 599: The assignment in the cache setup is treating an explicit empty
variant_packages set as missing, so callers can unintentionally fall back to
accelerated_packages. Update the logic in the relevant cache initialization path
around self._variant_packages to distinguish None from an empty set, preserving
an explicitly provided empty variant_packages while still using
accelerated_packages only when variant_packages is not supplied.
- Around line 686-692: The cache lookup path in lookup_wheel/cache fetch
handling treats a fetch failure as fatal instead of falling through to the next
cache or build fallback. Update the logic around backend.fetch and the local
index store in cache.py so exceptions from unusable hits (HTTP errors, hash
mismatch, I/O failures) are caught and treated like a miss, allowing the loop
over backends to continue. Keep the successful-path registration with
collection.store_backend.store intact, but only after a confirmed fetch.
- Around line 366-369: The cache write path in store() currently skips copying
when a same-named artifact already exists, which can leave stale wheels on disk
while still treating the new build as stored. Update the logic around
Cache.store() and the dest check so existing entries are either verified before
reuse or overwritten with the newly built artifact before indexing, ensuring the
cache index always matches the actual file on disk.
- Around line 461-474: The download path in the cache handling logic should not
trust an existing wheel at target without validating it first, and it should not
leave partial files behind after a failed or interrupted write. Update the
cache/download flow around the existing target check and the write-and-hash
block in the cache module so that an existing file is SHA256-verified before
returning it, and any checksum mismatch or write failure removes the target
before raising. Use the existing download helper and the hasher/target logic to
keep validation and cleanup consistent.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9a7ae0f7-3f32-4f7e-b0a2-36b4e2177ca6

📥 Commits

Reviewing files that changed from the base of the PR and between 28b92ad and c2e2a8e.

📒 Files selected for processing (6)
  • src/fromager/bootstrapper.py
  • src/fromager/cache.py
  • src/fromager/commands/bootstrap.py
  • src/fromager/commands/cache_cmd.py
  • tests/test_bootstrapper_iterative.py
  • tests/test_cache.py
🚧 Files skipped from review as they are similar to previous changes (5)
  • src/fromager/commands/bootstrap.py
  • src/fromager/commands/cache_cmd.py
  • tests/test_cache.py
  • tests/test_bootstrapper_iterative.py
  • src/fromager/bootstrapper.py

Comment thread src/fromager/cache.py
Comment thread src/fromager/cache.py Outdated
Comment thread src/fromager/cache.py Outdated
Comment thread src/fromager/cache.py Outdated
- Use atomic tmp+replace for remote downloads to prevent partial files
- Verify SHA256 of existing cached files before reusing them
- Wrap fetch() in try/except so failures fall through to next backend
- Fix store() to overwrite stale same-named files using samefile check
- Fix StoreRouter treating empty variant_packages set as falsy

Co-Authored-By: Claude <claude@anthropic.com>
Signed-off-by: Sean Pryor <spryor@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant