[DRAFT] Add cache manager and package cache routing.#1216
Conversation
…upport Introduces a new centralized cache management layer that replaces the ad-hoc per-directory lookups with a structured, hierarchical system. Key components: - `CacheManager` with collection-based lookup across local and remote backends - `RemotePEP503Backend` with lazy per-package fetching and session-scoped index - `StoreRouter` for directing artifacts to the correct collection - Short-circuit optimization in `_phase_prepare_source` that skips source download, build env, and build dep resolution on cache hits - `update_wheel_mirror` called after remote cache downloads so wheels are indexed for subsequent build dependency resolution via the internal server - CLI commands (`fromager cache list/stats/verify/invalidate/gc`) - `--use-cache-manager` and `--cache-wheel-server-url` bootstrap options Co-Authored-By: Claude <claude@anthropic.com> Signed-off-by: Sean Pryor <spryor@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com>
When the CacheManager is active and a non-cpu variant has top-level requirements, newly built wheels are routed to the appropriate collection directory: - Packages listed in the variant's requirements file → wheels-repo/variants/<variant>/ - Unlisted transitive dependencies → wheels-repo/downloads/ (shared default) This ensures the variant collection contains exactly the packages explicitly requested in the requirements file, while common deps discovered during dependency resolution stay in the shared collection for reuse across variants. Changes: - `_build_cache_manager` accepts `toplevel_reqs` and uses them as the variant package set for store routing - `StoreRouter` routes based on variant_packages (from requirements file), not just pre-built packages - `_phase_build` calls `store_wheel` after building (not for cached hits) - `LocalDirectoryBackend.store()` copies (not moves) to preserve the original in downloads/ for the internal wheel server Co-Authored-By: Claude <claude@anthropic.com> Signed-off-by: Sean Pryor <spryor@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com>
📝 WalkthroughWalkthroughThis PR adds a cache subsystem with typed cache keys, backends, routing, stats, and manager orchestration. Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes 🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🧹 Nitpick comments (1)
tests/test_cache.py (1)
1045-1103: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAssert same-run promotion to the local backend here.
This test proves the file was downloaded, but not that the
CacheManageractually promoted it intolocal_backendfor subsequent lookups. Adding a second lookup (or a directlocal_backend.lookup(...)assertion) would catch regressions where remote hits never become local cache hits.Based on learnings and path instructions: "tests/**: Verify test actually tests the intended behavior. Check for missing edge cases."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_cache.py` around lines 1045 - 1103, The remote-hit test currently only verifies the download result and does not confirm same-run promotion into the local cache. Update test_remote_hit_downloads_to_local_store in CacheManager flow to perform a second lookup or directly query local_backend after the first remote hit, and assert that the item is now served from LocalDirectoryBackend as a local hit. Use the existing symbols CacheManager.lookup_wheel, local_backend, and RemotePEP503Backend to keep the test focused on promotion behavior.Source: Path instructions
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/fromager/bootstrapper.py`:
- Around line 1535-1541: The wheel mirror update is happening before the routed
cache entry is stored, which can leave the internal index out of sync for the
newly copied path. In `bootstrapper.py`, reorder the flow in the cache-hit
handling around `server.update_wheel_mirror`, `self.ctx.package_build_info`, and
`self.ctx.cache.store_wheel` so the wheel is stored in the routed collection
first and the mirror is updated afterward. Keep the existing routing logic
intact, just move the mirror refresh to run after `store_wheel()` completes.
In `@src/fromager/cache.py`:
- Around line 664-667: After backend.fetch in the cache hit path, register the
downloaded artifact with collection.store_backend so its index is updated as a
local artifact instead of remaining remote-only. Use the existing cache-hit flow
around backend.fetch and the collection.store_backend object to add the
downloaded local_path back into the store manager in the same way promoted
artifacts are tracked, so live cache/admin commands can see it.
- Around line 454-463: Verify and enforce the advertised sha256 when downloading
a remote wheel in fetch(). After the session.get/iter_content write completes,
compute the file’s SHA256 for the downloaded target and compare it against
info.sha256 from ArtifactInfo; if it does not match, reject the wheel by
removing the cached file and raising an error. Keep the existing download flow
in fetch() but add the hash validation before returning target so only verified
wheels are accepted.
- Around line 504-535: The _parse_project_page() helper is accepting untrusted
anchor text as ArtifactInfo.filename, which later gets joined by fetch() when
writing to dest, so reject any remote wheel name containing path components or
traversal segments. Update _parse_project_page() to validate the parsed filename
before appending ArtifactInfo, and only allow a plain basename for remote
artifacts while keeping the existing URL and sha256 parsing logic intact.
In `@src/fromager/commands/cache_cmd.py`:
- Around line 92-99: The variant cache setup in cache_cmd.py is reusing the same
prebuilt_backend instance in both the variant and default collections, which
breaks collection scoping and causes double counting/mutating shared files.
Update the CacheCollection construction logic around wkctx.variant so the
variant collection only owns variant-specific backends, while prebuilt_backend
remains attached to the default collection and is still reachable via
search_order. Keep the unique symbols CacheCollection, variant_backends,
collections[wkctx.variant], and search_order aligned so cache list/stats and
invalidate --collection <variant> operate on separate backend instances.
In `@tests/test_bootstrapper_iterative.py`:
- Around line 1319-1368: The cache hit stats test is bypassing the real lookup
path by mocking _find_cached_wheel_via_manager, so CacheManager.lookup_wheel()
never executes and the hit event cannot be verified. Update
test_cache_hit_records_stats to exercise the actual manager lookup path by only
stubbing the unpack step (for example, _unpack_metadata_from_wheel) and then
assert the relevant stats counter/event produced by CacheManager/Bootstrapper.
If this test is only duplicating phase progression, rename or remove it so it
validates the intended stats behavior.
---
Nitpick comments:
In `@tests/test_cache.py`:
- Around line 1045-1103: The remote-hit test currently only verifies the
download result and does not confirm same-run promotion into the local cache.
Update test_remote_hit_downloads_to_local_store in CacheManager flow to perform
a second lookup or directly query local_backend after the first remote hit, and
assert that the item is now served from LocalDirectoryBackend as a local hit.
Use the existing symbols CacheManager.lookup_wheel, local_backend, and
RemotePEP503Backend to keep the test focused on promotion behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a423bd0f-a4ca-4a6b-892c-f98580939f94
📒 Files selected for processing (9)
pyproject.tomlsrc/fromager/bootstrapper.pysrc/fromager/cache.pysrc/fromager/commands/bootstrap.pysrc/fromager/commands/cache_cmd.pysrc/fromager/context.pysrc/fromager/requirements_file.pytests/test_bootstrapper_iterative.pytests/test_cache.py
- Verify SHA256 hash of remote wheel downloads, rejecting mismatches - Reject filenames with path traversal components from remote indices - Register remote cache hits in local store backend index after download - Reorder mirror update to run after store_wheel in cache hit path - Remove shared prebuilt_backend from variant collection (avoids double-counting) - Fix stats test to exercise real CacheManager.lookup_wheel path Co-Authored-By: Claude <claude@anthropic.com> Signed-off-by: Sean Pryor <spryor@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com>
28b92ad to
c2e2a8e
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/fromager/cache.py`:
- Line 599: The assignment in the cache setup is treating an explicit empty
variant_packages set as missing, so callers can unintentionally fall back to
accelerated_packages. Update the logic in the relevant cache initialization path
around self._variant_packages to distinguish None from an empty set, preserving
an explicitly provided empty variant_packages while still using
accelerated_packages only when variant_packages is not supplied.
- Around line 686-692: The cache lookup path in lookup_wheel/cache fetch
handling treats a fetch failure as fatal instead of falling through to the next
cache or build fallback. Update the logic around backend.fetch and the local
index store in cache.py so exceptions from unusable hits (HTTP errors, hash
mismatch, I/O failures) are caught and treated like a miss, allowing the loop
over backends to continue. Keep the successful-path registration with
collection.store_backend.store intact, but only after a confirmed fetch.
- Around line 366-369: The cache write path in store() currently skips copying
when a same-named artifact already exists, which can leave stale wheels on disk
while still treating the new build as stored. Update the logic around
Cache.store() and the dest check so existing entries are either verified before
reuse or overwritten with the newly built artifact before indexing, ensuring the
cache index always matches the actual file on disk.
- Around line 461-474: The download path in the cache handling logic should not
trust an existing wheel at target without validating it first, and it should not
leave partial files behind after a failed or interrupted write. Update the
cache/download flow around the existing target check and the write-and-hash
block in the cache module so that an existing file is SHA256-verified before
returning it, and any checksum mismatch or write failure removes the target
before raising. Use the existing download helper and the hasher/target logic to
keep validation and cleanup consistent.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9a7ae0f7-3f32-4f7e-b0a2-36b4e2177ca6
📒 Files selected for processing (6)
src/fromager/bootstrapper.pysrc/fromager/cache.pysrc/fromager/commands/bootstrap.pysrc/fromager/commands/cache_cmd.pytests/test_bootstrapper_iterative.pytests/test_cache.py
🚧 Files skipped from review as they are similar to previous changes (5)
- src/fromager/commands/bootstrap.py
- src/fromager/commands/cache_cmd.py
- tests/test_cache.py
- tests/test_bootstrapper_iterative.py
- src/fromager/bootstrapper.py
- Use atomic tmp+replace for remote downloads to prevent partial files - Verify SHA256 of existing cached files before reusing them - Wrap fetch() in try/except so failures fall through to next backend - Fix store() to overwrite stale same-named files using samefile check - Fix StoreRouter treating empty variant_packages set as falsy Co-Authored-By: Claude <claude@anthropic.com> Signed-off-by: Sean Pryor <spryor@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com>
Don't bother reviewing till I address at least the AI comments
(Will squash once we're ready, I wanted to split the original 2 in case we want to bisect)
Pull Request Description
What
This adds two features:
Why
This solves two longstanding problems:
Separation of caches for shared deps and top level accelerated packages
Pulling from multiple remote caches
PR follows CONTRIBUTING.md guidelines
AI summaries below:
Cache Manager:
Cache Routing: