fix(node): close under-withholding via full ref scope and reachable-only pin set#46
fix(node): close under-withholding via full ref scope and reachable-only pin set#46beardthelion wants to merge 65 commits into
Conversation
…al clone upload_pack_excluding emitted a v2 packfile section, but info_refs advertises v0, so real clients negotiated v0 and rejected the response with 'expected ACK/NAK, got packfile'. Frame the v0 stateless-rpc shape instead (NAK, then the pack via side-band-64k when offered). Add an end-to-end test that stands up info_refs + upload_pack_excluding and runs a real git partial clone, asserting the withheld blob's bytes never reach the client while its tree entry and SHA stay visible. A stock full clone cannot consume the pack (it is not closed under reachability, so fetch fails the connectivity check); a partial clone is required.
…tion choice Add a real-git test that partial-clones, pushes a new commit server-side, then fetches: the new object arrives and the withheld blob stays absent. This pins down that ignoring have/want negotiation (always sending a self-contained pack of all refs minus withheld, with NAK) is correct for both clone and fetch; the only cost is a fetch re-sends the full object set. Refactor the real-git tests onto a shared server harness and document the negotiation decision in code and in the plan's follow-ups.
Move the two blocking git shell-outs in the filtered upload-pack path off the async worker thread, matching the tokio::process / spawn_blocking usage already in this file: build_filtered_pack (rev-list + pack-objects) and withheld_blob_oids (per-ref ls-tree) now run inside spawn_blocking so a large repo cannot stall the tokio runtime. Behavior is unchanged. Also fix the Task 0 findings block in the Phase 3 plan: it still recorded v2 packfile framing, which is the exact path that failed against a real client and was corrected to v0. The block now documents the shipped v0 contract. Drop a stray trailing code fence flagged by markdownlint (MD040). The speculative ls-tree timeout and the public/no-rules fast-path from the review are intentionally left out: the timeout guards against adversarial repos we do not yet host, and the fast-path is a micro-optimization not worth the extra branch right now.
kevincodex1 asked to keep the superpowers planning docs out of the repo. The Phase 3 plan was scaffolding for this change, not something the project needs to carry. Removing it leaves only the code and tests in the PR.
Three fixes from the PR Gitlawb#33 review: - withheld_paths now applies the whole-repo "/" read gate (returns repo-not-found when the caller cannot read the root), matching the git read endpoints. Without it the endpoint disclosed a private repo's existence and path layout to unauthorized callers. The withheld_globs doc already assumed this gate existed; now it does. - A nested allow under a denied parent (e.g. "/secret/public/**" allowed, "/secret/**" denied) was over-withheld: the client sparse-excluded the whole parent and hid paths the caller may read. The endpoint now also returns a "reinclude" list (allowed globs strictly under a denied one) and gl clone re-includes them in the sparse spec after the excludes. - Wildcard-free globs like "/docs/private" match both the exact path and a subtree (per glob_matches), but the client only emitted the subtree exclude. sparse_patterns now emits both "/docs/private" and "/docs/private/". Verified the exclude-then-reinclude sparse ordering checks out cleanly with real git, plus unit tests for reincluded_globs, the nested re-include, the exact-path exclude, and sparse_patterns.
…eld-path errors
split_once('/') accepted owner/name/extra, smuggling a path segment into
the repo name that then flowed into the API path and clone URL; reject it.
fetch_withheld swallowed every network/auth/5xx/JSON error into an empty
result, dropping to a stock clone that the node refuses once blobs are
withheld. Now only 404/501 (endpoint unsupported) fall back to empty; the
rest propagate so the real cause surfaces.
The receive-pack replication chokepoint called withheld_blob_oids directly on the tokio worker, where its blocking git ls-tree walk can stall the runtime for repos with many refs. Wrap it in spawn_blocking to match the upload-pack serve path.
Add a regression test for deny /secret, allow /secret/public, deny /secret/public/admin and clarify the sparse-checkout comment. git does not re-traverse an explicitly excluded directory, so emitting all excludes before re-includes keeps the deepest deny in force; the broader parent re-include does not resurrect it.
…hema Read the default branch from the local origin/HEAD symref clone already set, instead of parsing the localized, network-dependent output of git remote show origin. Deserialize the withheld-paths body into a typed struct so a missing or mistyped withheld/reinclude field is a hard error rather than silently becoming an empty list, which would mask a server regression behind a later clone failure.
…or present-check Add two hermetic integration tests for recover_from_arweave that drive the full read path over mocked Arweave GraphQL + IPFS gateways: discover the manifest, fetch it, fetch the envelope, decrypt, and install the withheld blob. One covers an authorized recipient (blob installed), the other a non-recipient (nothing recovered). Both simulate origin death by removing the promisor remote and enable uploadpack.allowFilter so the blob is truly withheld over file://. Also harden the local presence check in recover_from_arweave with GIT_NO_LAZY_FETCH=1 and .output() so the expected 'missing object' case does not trigger a wasted promisor fetch or leak git stderr to the user.
db: parse_recipients now propagates a descriptive error instead of defaulting corrupt recipients JSON to an empty list, which would have denied authorized readers and handed peers incomplete metadata. gl: clone recovery now warns when the sparse-checkout file cannot be read or written, or when the post-recovery checkout fails, instead of silently discarding those errors and claiming files were recovered.
Envelope v2 drops the cleartext recipient public key (kid) from each wrapped-key header entry, so a party holding the public IPFS/Arweave copy can no longer enumerate who is authorized to decrypt a withheld blob. The version is bumped to 2 and v1 envelopes are rejected. open_blob now selects the reader's entry by trial decryption (the AEAD tag authenticates exactly one entry) instead of matching on the public key. Recipient count is still visible; identity is not. Scope is envelope-only, the node DB recipients column and peer replication metadata are unchanged. Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
open_blob fed attacker-controlled envelopes to GenericArray::from_slice and XNonce::from_slice, which panic on a wrong-length input. Validate both the per-recipient box nonce and the body nonce to 24 bytes before use: skip a recipient entry whose nonce is malformed, and return an error for a malformed body nonce, so the public recovery path surfaces an error rather than panicking.
The encrypted-blobs/replicate endpoint shipped every blob's full recipient
DID list to mirrors, which persisted it, so any mirroring peer learned the
reader set. The v2 envelope blinding already removed recipient public keys
from the pinned envelopes, so the comment justifying this (DIDs are already
public) was no longer true.
/replicate now returns {oid, cid} only. Mirrors detect a re-seal by the CID
changing (the OID is stable across re-seals) instead of comparing recipient
sets, and store no recipient identities. Origin-side authz and the at-rest
recipients column are unchanged; this blinds only the peer-facing surface.
The B3 encrypted-blob manifest anchored {oid, cid, recipients} to Arweave, a
permanent public record of every blob's reader set. The v2 envelope blinding
already removed recipient keys from the pinned envelopes, so the comment
justifying this (recipient DIDs are already public) was no longer true. The
manifest now anchors {oid, cid} only. The gl reader already ignores the
recipients field (it recovers by trial decryption), so no reader changes.
…adability
The origin no longer stores recipient DIDs. Migration v5 replaces the
encrypted_blobs.recipients column with an opaque, node-keyed recipients_tag
used only to detect a recipient-set change for re-seal. B1 discovery and fetch
are now gated by the same repo-readability check the git read path uses, not by
per-recipient matching; decryption is gated by the envelope crypto, so a
non-recipient who can read the repo sees a blob's {oid, cid} but cannot open it.
encrypt_and_pin keys the tag from the node seed and returns {oid, cid}; the
Arweave manifest tuple drops the now-unused recipient vec.
A DB compromise no longer reveals the reader set; recovering it would require
brute-forcing candidate DID sets against the keyed tag with the node key.
…rror
Address review on the at-rest blinding change:
- The encrypted-blobs/replicate listing returned {oid, cid} with no visibility
check, so a non-readable repo's blob index was reachable by an unauthenticated
caller who guessed {owner}/{repo}. Gate it by the same repo-readability check
discovery and fetch use. For the intended case (a public repo with withheld
subtrees) the public root keeps this open to peers; only fully non-readable
repos are withheld, which is the desired behavior.
- encrypt_and_pin treated a recipients_tag DB read error as a cache miss and
resealed, causing avoidable IPFS writes during a partial outage; skip and retry
on the next push instead.
- Correct the get_encrypted_blob doc comment to describe repo-readability access.
…nly pin set blob_paths walked only refs/heads/* and refs/tags/* and skipped silently on a failed git ls-tree, so a blob reachable only through another namespace, or a ref that failed to traverse, could fall out of the withheld set and ship in cleartext. Walk every ref and fail closed on traversal error. The pin enumerators (ipfs_pin, pinata) used git cat-file --batch-all-objects, which includes unreachable/dangling objects that have no path and cannot be classified for withholding. Switch them to git rev-list --objects --all so the pin set matches the reachable graph blob_paths evaluates.
📝 WalkthroughWalkthroughThis PR implements end-to-end encryption and withholding for Phase 3 confidentiality: withheld blobs are sealed to recipient keys via ephemeral X25519 envelopes, encrypted with XChaCha20Poly1305, pinned to IPFS with opaque recipient tags, anchored to Arweave, replicated across mirrors using promisor-based filtering, and recovered locally via a new ChangesCore encryption and visibility foundations
Server-side filtered pack serving and encrypted blob management
Mirror mode classification and encrypted blob replication
Client-side clone command with recovery paths
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 8
🧹 Nitpick comments (4)
crates/gl/src/clone.rs (1)
262-267: ⚡ Quick winParse
/encrypted-blobsinto a typed response instead ofValue.A missing or mistyped
blobsfield currently degrades to[]and silently disables node-side recovery. Making this schema-strict would catch server/client contract drift the same wayWithheldPathsResponsealready does.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gl/src/clone.rs` around lines 262 - 267, Replace the loose serde_json::Value parsing for resp in clone.rs with a strongly typed EncryptedBlobsResponse struct so schema mismatches surface as errors: define EncryptedBlobsResponse { blobs: Vec<EncryptedBlobOrString> } (matching the server shape, analogous to WithheldPathsResponse), then call resp.json::<EncryptedBlobsResponse>().await.context("parsing encrypted-blobs")? and use the returned .blobs directly instead of body.get(...).and_then(...).cloned().unwrap_or_default(); this ensures missing or mistyped "blobs" no longer silently becomes [] and will fail fast (update the EncryptedBlobOrString type to the actual blob element type used).crates/gitlawb-node/src/api/encrypted.rs (1)
109-123: ⚡ Quick winAdd a handler test for the 404/404 access-control contract.
The new test only locks the wire shape. A small handler-level test that exercises both “caller denied” and “OID missing” on
get_encrypted_blobwould protect the deliberate non-oracle behavior from future refactors.Based on learnings, this endpoint should keep “not found” and “not authorized” externally indistinguishable.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-node/src/api/encrypted.rs` around lines 109 - 123, Add a handler-level test for the access-control contract of get_encrypted_blob that verifies the external response is indistinguishable when the caller is denied versus when the OID is missing: write two tests (or one parameterized test) invoking the get_encrypted_blob handler — one scenario where the auth check fails (simulate "caller denied") and one where the blob store returns NotFound (simulate "OID missing") — and assert both responses have the same HTTP status and body (e.g., both return 404/not-found) to ensure the endpoint does not leak authorization state; reference the get_encrypted_blob handler in your test and reuse existing test scaffolding in the encrypted.rs tests module.Source: Learnings
crates/gitlawb-core/src/encrypt.rs (1)
66-88: ⚡ Quick winZeroize transient key material in
seal_blobandopen_blob.
content_key, the seed-derived X25519 secret, and the unwrapped content key all live in plain stack/heap buffers until drop. In this confidentiality path that is avoidable exposure; keep them inZeroizing(or explicitly callzeroize()) once the ciphers are initialized.Hardening sketch
+use zeroize::Zeroizing; + - let mut content_key = [0u8; 32]; - OsRng.fill_bytes(&mut content_key); + let mut content_key = Zeroizing::new([0u8; 32]); + OsRng.fill_bytes(&mut *content_key); let body_cipher = XChaCha20Poly1305::new_from_slice(&content_key) .map_err(|e| anyhow::anyhow!("content key: {e}"))?; ... - let my_x = XSecret::from(x25519_secret_from_seed(&keypair.seed_bytes())); + let seed = keypair.to_seed(); + let my_x = XSecret::from(x25519_secret_from_seed(&seed)); ... - let content_key = content_key.context("not a recipient of this envelope")?; + let content_key = Zeroizing::new(content_key.context("not a recipient of this envelope")?);Also applies to: 126-167
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-core/src/encrypt.rs` around lines 66 - 88, In seal_blob and open_blob, sensitive key material (the 32-byte content_key, the ephemeral XSecret `eph`, the seed-derived X25519 secret, and any unwrapped content key buffers) must be zeroed after use: wrap stack/heap buffers in zeroize::Zeroizing (e.g., Zeroizing<[u8;32]>, Zeroizing<Vec<u8>>) or explicitly call zeroize() on types that implement Zeroize immediately after the ciphers are initialized and after encryption/decryption completes; ensure you zero `content_key`, the ephemeral secret `eph` (or call its zeroize method), the seed-derived X25519 secret produced by x25519_private/related variable, and the unwrapped content key before returning or pushing into Recipient so no plaintext key remains in memory.crates/gitlawb-node/src/git/visibility_pack.rs (1)
101-130: 🏗️ Heavy liftAvoid re-traversing the repo to derive recipients.
withheld_blob_recipients()does one fullblob_paths()sweep throughwithheld_blob_oids()and then a second fullblob_paths()sweep for the recipient pass. On the push path,git_receive_pack()has already computed the withheld set before it gets here, so repos with visibility rules now pay up to three completels-treetraversals per push. Threading the precomputed withheld/blob-path data into this API would keep recipient derivation single-pass.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-node/src/git/visibility_pack.rs` around lines 101 - 130, withheld_blob_recipients currently re-traverses the repo by calling withheld_blob_oids(...) and then blob_paths(...), causing multiple ls-tree sweeps; change withheld_blob_recipients signature to accept precomputed withheld data (e.g., Option<&HashSet<String>> or Option<&HashMap<String,String>> representing withheld oids or oid->path map) so callers like git_receive_pack can pass the already-computed set/map, skip the internal withheld_blob_oids call when provided, and iterate the provided oid->path mapping (or filter blob_paths once) to build recipients; update callers (notably git_receive_pack) to supply the precomputed data and keep behavior identical when None is passed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/gitlawb-core/src/identity.rs`:
- Around line 55-59: The new seed_bytes() method leaks a non-zeroizing copy of
the Ed25519 seed; replace its usage with the existing to_seed() which returns a
Zeroizing<[u8;32]>, or change seed_bytes() to return a zeroizing wrapper (e.g.
Zeroizing<[u8;32]>) instead of plain [u8;32]; update callers such as the code
paths that call seed_bytes() (including open_blob and the encrypted-pin task in
crates/gitlawb-node/src/api/repos.rs) to accept and use the Zeroizing seed or to
call to_seed() directly so the raw seed is never held in plain memory.
In `@crates/gitlawb-node/src/api/repos.rs`:
- Around line 398-425: Short-circuit the expensive withheld walk by detecting
when the repo's rules cannot withhold any path-scoped blobs (e.g., rules is
empty or only applies to root) before calling
visibility_pack::withheld_blob_oids; if that check shows no possible withheld
blobs, skip the tokio::task::spawn_blocking(...) and call
smart_http::upload_pack(&disk_path, body).await directly, otherwise proceed to
spawn_blocking and then choose between smart_http::upload_pack and
smart_http::upload_pack_excluding based on the resulting withheld set; use the
existing variables and functions (rules, visibility_pack::withheld_blob_oids,
withheld, smart_http::upload_pack, smart_http::upload_pack_excluding) to
implement the early return.
In `@crates/gitlawb-node/src/encrypted_pin.rs`:
- Around line 77-80: Replace silent `continue` branches when
`crate::git::store::read_object(repo_path, oid)` and the IPFS pin operation fail
with warning logs that include contextual details (repo_path, oid, and the
error/result) so failures are observable; in the match arms around `let data =
match crate::git::store::read_object(repo_path, oid) { ... }` and the subsequent
IPFS pin match, call the module’s logger (e.g., process_logger.warn or
log::warn) with a concise message and the error/result, then continue as before
to preserve behavior.
- Around line 72-76: The current code builds `keys: Vec<VerifyingKey>` by
`filter_map(|d| did_to_key(d))`, which silently drops unresolved DIDs and
proceeds to encrypt to a partial recipient set; change this so any unresolved
DID causes the blob to be skipped: instead of `filter_map`, resolve all DIDs and
detect failures (e.g., collect Options or compare `keys.len()` to `dids.len()`
after mapping via `did_to_key`), and when a mismatch is found log a warning
referencing `oid` (similar to the existing log) and `continue` so you do not
seal to a partial set; ensure you keep the `VerifyingKey` handling consistent
with downstream code that expects `keys`.
In `@crates/gitlawb-node/src/ipfs_pin.rs`:
- Around line 75-85: The cat function creates a reqwest::Client with no timeout
(reqwest::Client::new()) which can hang get_encrypted_blob and the sync worker;
modify cat (or callers) to use a client with a bounded timeout (e.g., build a
Client via reqwest::Client::builder().timeout(Duration::from_secs(...)) and
reuse it) or apply a per-request timeout (e.g., wrap the send/bytes await with
tokio::time::timeout) so the POST to {}/api/v0/cat?arg={} fails fast on
slow/stalled Kubo; update the cat signature or module to accept/reuse a Client
or enforce a timeout around resp = client.post(&url).send().await and
resp.bytes().await to avoid indefinite hangs.
In `@crates/gitlawb-node/src/sync.rs`:
- Around line 181-188: The code currently treats a None result from
fetch_withheld as MirrorMode::Plain which incorrectly downgrades unknown/error
states; instead, check the Option returned by fetch_withheld and only set
MirrorMode::Plain when you get Some(empty); if fetch_withheld returns None
preserve the repository's existing promisor/partialclone state (do not call
classify_mirror on None). Concretely, change the branch around
fetch_withheld/classify_mirror to: if withheld.is_some() { let mode =
classify_mirror(withheld); } else { read the current repo promisor/partialclone
setting from the local repo at local_path (or otherwise keep the current
MirrorMode) and use that mode for fetch_repo/clone_repo }, so
fetch_repo/clone_repo are invoked with the preserved mode rather than
unconditionally using MirrorMode::Plain.
In `@crates/gl/src/clone.rs`:
- Around line 428-459: The current GraphQL fetch (the local `query` +
`gql_body`, use of `client.post(...).json(&gql_body).send().await`, and
subsequent `parse_tx_ids(&gql)`) only requests first:100 and stops, causing
missing manifests; change the logic to paginate: rewrite the GraphQL query to
take variables for "first" and "after" (cursor), then loop POSTing the query
(building `gql_body` with `{ "repo": slug, "first": batch_size, "after": cursor
}`), call `parse_tx_ids` (or adjust it to return edges + pageInfo) to collect tx
ids across pages, update `cursor` from pageInfo.nextCursor/hasNextPage until no
more pages, then proceed to fetch manifests and call
`merge_manifests(manifests)` as before; ensure the loop uses the same
`client.post(...).json(&gql_body).send().await` error handling as existing code
and retains the existing behavior when responses fail.
- Around line 415-416: The slug normalization and Arweave discovery/parsing need
tightening: ensure owner short-form uses the same split logic as the node by
keeping owner_short = owner.split(':').next_back().unwrap_or(owner) and forming
slug = format!("{owner_short}/{name}"); update the GraphQL discovery for
encrypted-manifest anchors in discover/recovery code to paginate instead of
hard-coding first:100 (iterate queries until no more edges or timestamps
covered) so recover_encrypted_blobs sees all anchors; change
recover_encrypted_blobs to deserialize the /encrypted-blobs response into a
strict typed struct (like the existing withheld-paths type) and fail on
missing/wrong fields instead of using Value with unwrap_or_default(), ensuring
schema-strict parsing and surfaced errors.
---
Nitpick comments:
In `@crates/gitlawb-core/src/encrypt.rs`:
- Around line 66-88: In seal_blob and open_blob, sensitive key material (the
32-byte content_key, the ephemeral XSecret `eph`, the seed-derived X25519
secret, and any unwrapped content key buffers) must be zeroed after use: wrap
stack/heap buffers in zeroize::Zeroizing (e.g., Zeroizing<[u8;32]>,
Zeroizing<Vec<u8>>) or explicitly call zeroize() on types that implement Zeroize
immediately after the ciphers are initialized and after encryption/decryption
completes; ensure you zero `content_key`, the ephemeral secret `eph` (or call
its zeroize method), the seed-derived X25519 secret produced by
x25519_private/related variable, and the unwrapped content key before returning
or pushing into Recipient so no plaintext key remains in memory.
In `@crates/gitlawb-node/src/api/encrypted.rs`:
- Around line 109-123: Add a handler-level test for the access-control contract
of get_encrypted_blob that verifies the external response is indistinguishable
when the caller is denied versus when the OID is missing: write two tests (or
one parameterized test) invoking the get_encrypted_blob handler — one scenario
where the auth check fails (simulate "caller denied") and one where the blob
store returns NotFound (simulate "OID missing") — and assert both responses have
the same HTTP status and body (e.g., both return 404/not-found) to ensure the
endpoint does not leak authorization state; reference the get_encrypted_blob
handler in your test and reuse existing test scaffolding in the encrypted.rs
tests module.
In `@crates/gitlawb-node/src/git/visibility_pack.rs`:
- Around line 101-130: withheld_blob_recipients currently re-traverses the repo
by calling withheld_blob_oids(...) and then blob_paths(...), causing multiple
ls-tree sweeps; change withheld_blob_recipients signature to accept precomputed
withheld data (e.g., Option<&HashSet<String>> or Option<&HashMap<String,String>>
representing withheld oids or oid->path map) so callers like git_receive_pack
can pass the already-computed set/map, skip the internal withheld_blob_oids call
when provided, and iterate the provided oid->path mapping (or filter blob_paths
once) to build recipients; update callers (notably git_receive_pack) to supply
the precomputed data and keep behavior identical when None is passed.
In `@crates/gl/src/clone.rs`:
- Around line 262-267: Replace the loose serde_json::Value parsing for resp in
clone.rs with a strongly typed EncryptedBlobsResponse struct so schema
mismatches surface as errors: define EncryptedBlobsResponse { blobs:
Vec<EncryptedBlobOrString> } (matching the server shape, analogous to
WithheldPathsResponse), then call
resp.json::<EncryptedBlobsResponse>().await.context("parsing encrypted-blobs")?
and use the returned .blobs directly instead of
body.get(...).and_then(...).cloned().unwrap_or_default(); this ensures missing
or mistyped "blobs" no longer silently becomes [] and will fail fast (update the
EncryptedBlobOrString type to the actual blob element type used).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: fab81d81-dc64-46fe-b486-709131c8a108
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (23)
crates/gitlawb-core/Cargo.tomlcrates/gitlawb-core/src/encrypt.rscrates/gitlawb-core/src/identity.rscrates/gitlawb-core/src/lib.rscrates/gitlawb-node/Cargo.tomlcrates/gitlawb-node/src/api/encrypted.rscrates/gitlawb-node/src/api/mod.rscrates/gitlawb-node/src/api/repos.rscrates/gitlawb-node/src/api/visibility.rscrates/gitlawb-node/src/arweave.rscrates/gitlawb-node/src/db/mod.rscrates/gitlawb-node/src/encrypted_pin.rscrates/gitlawb-node/src/git/mod.rscrates/gitlawb-node/src/git/smart_http.rscrates/gitlawb-node/src/git/visibility_pack.rscrates/gitlawb-node/src/ipfs_pin.rscrates/gitlawb-node/src/main.rscrates/gitlawb-node/src/pinata.rscrates/gitlawb-node/src/server.rscrates/gitlawb-node/src/sync.rscrates/gitlawb-node/src/visibility.rscrates/gl/src/clone.rscrates/gl/src/main.rs
| /// The raw 32-byte Ed25519 seed. Used to derive the X25519 secret for | ||
| /// envelope decryption (see `crate::encrypt`). | ||
| pub fn seed_bytes(&self) -> [u8; 32] { | ||
| self.signing_key.to_bytes() | ||
| } |
There was a problem hiding this comment.
Avoid adding a non-zeroizing raw-seed accessor.
to_seed() already exposes this material behind Zeroizing; seed_bytes() reintroduces the same long-lived signing seed as a plain [u8; 32]. That copy now flows into open_blob and the async encrypted-pin path, which widens private-key lifetime without adding any new capability. Prefer reusing to_seed() here, or make this helper return a zeroizing wrapper instead of raw bytes. This already shows up in crates/gitlawb-node/src/api/repos.rs:659-707, where the seed is moved into the spawned encrypted-pin task.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/gitlawb-core/src/identity.rs` around lines 55 - 59, The new
seed_bytes() method leaks a non-zeroizing copy of the Ed25519 seed; replace its
usage with the existing to_seed() which returns a Zeroizing<[u8;32]>, or change
seed_bytes() to return a zeroizing wrapper (e.g. Zeroizing<[u8;32]>) instead of
plain [u8;32]; update callers such as the code paths that call seed_bytes()
(including open_blob and the encrypted-pin task in
crates/gitlawb-node/src/api/repos.rs) to accept and use the Zeroizing seed or to
call to_seed() directly so the raw seed is never held in plain memory.
| // withheld_blob_oids walks every ref with blocking `git ls-tree`; keep that | ||
| // off the async worker thread. | ||
| let withheld = { | ||
| let path = disk_path.clone(); | ||
| let rules = rules.clone(); | ||
| let owner_did = record.owner_did.clone(); | ||
| let caller_owned = caller.map(str::to_string); | ||
| let is_public = record.is_public; | ||
| tokio::task::spawn_blocking(move || { | ||
| visibility_pack::withheld_blob_oids( | ||
| &path, | ||
| &rules, | ||
| is_public, | ||
| &owner_did, | ||
| caller_owned.as_deref(), | ||
| ) | ||
| }) | ||
| .await | ||
| .map_err(|e| { | ||
| let msg = e.to_string(); | ||
| if msg.contains("bad line length") || msg.contains("protocol error") { | ||
| tracing::warn!(repo = %name, err = %msg, "git-upload-pack: bad client request"); | ||
| AppError::BadRequest(msg) | ||
| } else { | ||
| tracing::error!(repo = %name, err = %msg, "git-upload-pack failed"); | ||
| AppError::Git(msg) | ||
| } | ||
| })?; | ||
| .map_err(|e| AppError::Git(e.to_string()))? | ||
| .map_err(|e| AppError::Git(e.to_string()))? | ||
| }; | ||
|
|
||
| let resp = if withheld.is_empty() { | ||
| smart_http::upload_pack(&disk_path, body).await | ||
| } else { | ||
| tracing::info!(repo = %name, caller = ?caller, withheld = withheld.len(), "serving filtered pack"); | ||
| smart_http::upload_pack_excluding(&disk_path, body, &withheld).await | ||
| } |
There was a problem hiding this comment.
Skip the withheld walk when no path-scoped rule can withhold anything.
After the "/" gate above, repos whose rule set is empty or root-only cannot produce a non-empty withheld blob set, but this branch still runs the full withheld_blob_oids() ref walk before falling back to the ordinary upload_pack(). That adds an extra full repo scan to every clone/fetch on the common public path.
💡 Minimal short-circuit
- let withheld = {
- let path = disk_path.clone();
- let rules = rules.clone();
- let owner_did = record.owner_did.clone();
- let caller_owned = caller.map(str::to_string);
- let is_public = record.is_public;
- tokio::task::spawn_blocking(move || {
- visibility_pack::withheld_blob_oids(
- &path,
- &rules,
- is_public,
- &owner_did,
- caller_owned.as_deref(),
- )
- })
- .await
- .map_err(|e| AppError::Git(e.to_string()))?
- .map_err(|e| AppError::Git(e.to_string()))?
- };
-
- let resp = if withheld.is_empty() {
+ let has_path_scoped_rules = rules.iter().any(|r| r.path_glob != "/");
+ let resp = if !has_path_scoped_rules {
smart_http::upload_pack(&disk_path, body).await
} else {
- tracing::info!(repo = %name, caller = ?caller, withheld = withheld.len(), "serving filtered pack");
- smart_http::upload_pack_excluding(&disk_path, body, &withheld).await
+ let withheld = {
+ let path = disk_path.clone();
+ let rules = rules.clone();
+ let owner_did = record.owner_did.clone();
+ let caller_owned = caller.map(str::to_string);
+ let is_public = record.is_public;
+ tokio::task::spawn_blocking(move || {
+ visibility_pack::withheld_blob_oids(
+ &path,
+ &rules,
+ is_public,
+ &owner_did,
+ caller_owned.as_deref(),
+ )
+ })
+ .await
+ .map_err(|e| AppError::Git(e.to_string()))?
+ .map_err(|e| AppError::Git(e.to_string()))?
+ };
+ if withheld.is_empty() {
+ smart_http::upload_pack(&disk_path, body).await
+ } else {
+ tracing::info!(repo = %name, caller = ?caller, withheld = withheld.len(), "serving filtered pack");
+ smart_http::upload_pack_excluding(&disk_path, body, &withheld).await
+ }
}
.map_err(|e| {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/gitlawb-node/src/api/repos.rs` around lines 398 - 425, Short-circuit
the expensive withheld walk by detecting when the repo's rules cannot withhold
any path-scoped blobs (e.g., rules is empty or only applies to root) before
calling visibility_pack::withheld_blob_oids; if that check shows no possible
withheld blobs, skip the tokio::task::spawn_blocking(...) and call
smart_http::upload_pack(&disk_path, body).await directly, otherwise proceed to
spawn_blocking and then choose between smart_http::upload_pack and
smart_http::upload_pack_excluding based on the resulting withheld set; use the
existing variables and functions (rules, visibility_pack::withheld_blob_oids,
withheld, smart_http::upload_pack, smart_http::upload_pack_excluding) to
implement the early return.
| let keys: Vec<VerifyingKey> = dids.iter().filter_map(|d| did_to_key(d)).collect(); | ||
| if keys.is_empty() { | ||
| tracing::warn!(oid = %oid, "no resolvable recipient keys; skipping encrypted pin"); | ||
| continue; | ||
| } |
There was a problem hiding this comment.
Do not encrypt to a partial recipient set when some DIDs fail key resolution.
At Line 72, filter_map drops unresolved DIDs and still seals for the remaining keys. That can make authorized readers unrecoverable even though the blob is recorded as successfully encrypted/pinned. Treat any unresolved DID as a per-blob failure and skip instead of partially sealing.
💡 Suggested fix
- let keys: Vec<VerifyingKey> = dids.iter().filter_map(|d| did_to_key(d)).collect();
- if keys.is_empty() {
- tracing::warn!(oid = %oid, "no resolvable recipient keys; skipping encrypted pin");
+ let mut keys = Vec::with_capacity(dids.len());
+ let mut unresolved = Vec::new();
+ for did in dids {
+ match did_to_key(did) {
+ Some(k) => keys.push(k),
+ None => unresolved.push(did.clone()),
+ }
+ }
+ if !unresolved.is_empty() {
+ tracing::warn!(oid = %oid, unresolved = ?unresolved, "some recipient DIDs could not be resolved; skipping encrypted pin");
+ continue;
+ }
+ if keys.is_empty() {
+ tracing::warn!(oid = %oid, "empty recipient set; skipping encrypted pin");
continue;
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/gitlawb-node/src/encrypted_pin.rs` around lines 72 - 76, The current
code builds `keys: Vec<VerifyingKey>` by `filter_map(|d| did_to_key(d))`, which
silently drops unresolved DIDs and proceeds to encrypt to a partial recipient
set; change this so any unresolved DID causes the blob to be skipped: instead of
`filter_map`, resolve all DIDs and detect failures (e.g., collect Options or
compare `keys.len()` to `dids.len()` after mapping via `did_to_key`), and when a
mismatch is found log a warning referencing `oid` (similar to the existing log)
and `continue` so you do not seal to a partial set; ensure you keep the
`VerifyingKey` handling consistent with downstream code that expects `keys`.
| let data = match crate::git::store::read_object(repo_path, oid) { | ||
| Ok(Some((_t, bytes))) => bytes, | ||
| _ => continue, | ||
| }; |
There was a problem hiding this comment.
Add warnings for object-read and IPFS pin failures instead of silent skips.
At Line 77 and Line 88, failures currently continue without logs. This makes production retry/debugging hard and weakens observability during partial outages.
💡 Suggested fix
- let data = match crate::git::store::read_object(repo_path, oid) {
- Ok(Some((_t, bytes))) => bytes,
- _ => continue,
- };
+ let data = match crate::git::store::read_object(repo_path, oid) {
+ Ok(Some((_t, bytes))) => bytes,
+ Ok(None) => {
+ tracing::warn!(oid = %oid, "blob object not found locally; skipping encrypted pin");
+ continue;
+ }
+ Err(e) => {
+ tracing::warn!(oid = %oid, err = %e, "failed reading blob object; skipping encrypted pin");
+ continue;
+ }
+ };
@@
- let cid = match crate::ipfs_pin::pin_git_object(ipfs_api, oid, &envelope).await {
- Ok(c) if !c.is_empty() => c,
- _ => continue,
- };
+ let cid = match crate::ipfs_pin::pin_git_object(ipfs_api, oid, &envelope).await {
+ Ok(c) if !c.is_empty() => c,
+ Ok(_) => {
+ tracing::warn!(oid = %oid, "pin_git_object returned empty CID; skipping record");
+ continue;
+ }
+ Err(e) => {
+ tracing::warn!(oid = %oid, err = %e, "pin_git_object failed; skipping encrypted pin");
+ continue;
+ }
+ };Also applies to: 88-90
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/gitlawb-node/src/encrypted_pin.rs` around lines 77 - 80, Replace
silent `continue` branches when `crate::git::store::read_object(repo_path, oid)`
and the IPFS pin operation fail with warning logs that include contextual
details (repo_path, oid, and the error/result) so failures are observable; in
the match arms around `let data = match
crate::git::store::read_object(repo_path, oid) { ... }` and the subsequent IPFS
pin match, call the module’s logger (e.g., process_logger.warn or log::warn)
with a concise message and the error/result, then continue as before to preserve
behavior.
| /// Fetch raw bytes for a CID from the local Kubo node (`/api/v0/cat`). | ||
| pub async fn cat(ipfs_api: &str, cid: &str) -> Result<Vec<u8>> { | ||
| if ipfs_api.is_empty() { | ||
| return Err(anyhow::anyhow!("IPFS not configured")); | ||
| } | ||
| let url = format!("{}/api/v0/cat?arg={}", ipfs_api.trim_end_matches('/'), cid); | ||
| let resp = reqwest::Client::new().post(&url).send().await?; | ||
| if !resp.status().is_success() { | ||
| return Err(anyhow::anyhow!("ipfs cat {cid}: {}", resp.status())); | ||
| } | ||
| Ok(resp.bytes().await?.to_vec()) |
There was a problem hiding this comment.
Bound the Kubo /cat request.
This new helper creates a fresh reqwest::Client with no timeout, so a slow or stalled Kubo fetch can hang both get_encrypted_blob and the sync worker indefinitely. Please reuse a client with a bounded timeout, or set a per-request timeout here.
⏱️ Suggested fix
pub async fn cat(ipfs_api: &str, cid: &str) -> Result<Vec<u8>> {
if ipfs_api.is_empty() {
return Err(anyhow::anyhow!("IPFS not configured"));
}
let url = format!("{}/api/v0/cat?arg={}", ipfs_api.trim_end_matches('/'), cid);
- let resp = reqwest::Client::new().post(&url).send().await?;
+ let resp = reqwest::Client::new()
+ .post(&url)
+ .timeout(std::time::Duration::from_secs(30))
+ .send()
+ .await?;
if !resp.status().is_success() {
return Err(anyhow::anyhow!("ipfs cat {cid}: {}", resp.status()));
}
Ok(resp.bytes().await?.to_vec())
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/gitlawb-node/src/ipfs_pin.rs` around lines 75 - 85, The cat function
creates a reqwest::Client with no timeout (reqwest::Client::new()) which can
hang get_encrypted_blob and the sync worker; modify cat (or callers) to use a
client with a bounded timeout (e.g., build a Client via
reqwest::Client::builder().timeout(Duration::from_secs(...)) and reuse it) or
apply a per-request timeout (e.g., wrap the send/bytes await with
tokio::time::timeout) so the POST to {}/api/v0/cat?arg={} fails fast on
slow/stalled Kubo; update the cat signature or module to accept/reuse a Client
or enforce a timeout around resp = client.post(&url).send().await and
resp.bytes().await to avoid indefinite hangs.
| let withheld = fetch_withheld(client, &origin_url, owner_short, repo_name).await; | ||
| let mode = classify_mirror(withheld); | ||
|
|
||
| let result = if local_path.exists() { | ||
| fetch_repo(&local_path, &remote_url).await | ||
| fetch_repo(&local_path, &remote_url, mode).await | ||
| } else { | ||
| clone_repo(&remote_url, &local_path).await | ||
| clone_repo(&remote_url, &local_path, mode).await | ||
| }; |
There was a problem hiding this comment.
Don’t downgrade an unknown mirror state to Plain.
fetch_withheld collapses 404, 5xx, network, and parse failures into None, and this branch always turns that into MirrorMode::Plain. For an existing promisor mirror, that flows into fetch_repo(..., MirrorMode::Plain), which clears remote.origin.promisor and remote.origin.partialclonefilter before refetching. A transient withheld-paths outage on a still-withheld repo will therefore start failing syncs until the endpoint recovers. Only downgrade on Some(empty), and preserve the current promisor mode when the lookup is unknown.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/gitlawb-node/src/sync.rs` around lines 181 - 188, The code currently
treats a None result from fetch_withheld as MirrorMode::Plain which incorrectly
downgrades unknown/error states; instead, check the Option returned by
fetch_withheld and only set MirrorMode::Plain when you get Some(empty); if
fetch_withheld returns None preserve the repository's existing
promisor/partialclone state (do not call classify_mirror on None). Concretely,
change the branch around fetch_withheld/classify_mirror to: if
withheld.is_some() { let mode = classify_mirror(withheld); } else { read the
current repo promisor/partialclone setting from the local repo at local_path (or
otherwise keep the current MirrorMode) and use that mode for
fetch_repo/clone_repo }, so fetch_repo/clone_repo are invoked with the preserved
mode rather than unconditionally using MirrorMode::Plain.
| let owner_short = owner.split(':').next_back().unwrap_or(owner); | ||
| let slug = format!("{owner_short}/{name}"); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Client-side Arweave lookup normalization:"
rg -n -C2 'owner_short|slug = format!|graphql' crates/gl/src/clone.rs
echo
echo "Server-side encrypted-manifest tag construction:"
rg -n -C4 'encrypted-manifest|Schema|Repo|App-Name' crates/gitlawb-node/src/arweave.rs crates/gitlawb-node/src/api/repos.rsRepository: Gitlawb/node
Length of output: 20700
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Find EncryptedManifest and where `manifest.repo` is set:"
rg -n "struct EncryptedManifest|manifest\.repo|EncryptedManifest<'_>" crates/gitlawb-node/src/arweave.rs
echo
echo "Search for `repo:` field construction / slug derivation in node code:"
rg -n "Repo\s*[:=]|repo\s*:\s*|format!\(.*owner.*name|owner.*split\(':'\)|next_back\(\)" crates/gitlawb-node/src/arweave.rs crates/gitlawb-node/src -SRepository: Gitlawb/node
Length of output: 10904
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== crates/gitlawb-node/src/arweave.rs (top + EncryptedManifest + tag header) ==="
nl -ba crates/gitlawb-node/src/arweave.rs | sed -n '1,260p'
echo
echo "=== crates/gl/src/clone.rs (Arweave fallback slug + GraphQL variables) ==="
nl -ba crates/gl/src/clone.rs | sed -n '360,520p'Repository: Gitlawb/node
Length of output: 199
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "arweave.rs lines:"
wc -l crates/gitlawb-node/src/arweave.rs
echo
echo "crates/gitlawb-node/src/arweave.rs: EncryptedManifest + manifest_json payload + build_manifest_tags_header + any construction sites:"
cat -n crates/gitlawb-node/src/arweave.rs | sed -n '1,260p'
echo
echo "clone.rs: Arweave fallback slug + GraphQL body:"
wc -l crates/gl/src/clone.rs
echo
cat -n crates/gl/src/clone.rs | sed -n '380,480p'Repository: Gitlawb/node
Length of output: 15653
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Call sites for anchor_encrypted_manifest / EncryptedManifest construction ==="
rg -n "anchor_encrypted_manifest|EncryptedManifest\s*\{" crates/gitlawb-node/src
echo
echo "=== Inspect around likely repo_slug construction (owner_short + repo_name) in api/repos.rs ==="
rg -n "owner_short\s*=.*split\(':'\)\.next_back\(\)\.unwrap_or|repo_slug\s*=\s*format!" crates/gitlawb-node/src/api/repos.rs -n
sed -n '650,820p' crates/gitlawb-node/src/api/repos.rs | cat -nRepository: Gitlawb/node
Length of output: 9929
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== clone.rs: encrypted-blobs parsing ==="
rg -n "/encrypted-blobs|encrypted-blobs|recover_encrypted_blobs|serde_json::Value" crates/gl/src/clone.rs
echo
echo "=== clone.rs: recover_encrypted_blobs implementation ==="
# Print the function body area (best-effort) around its definition
rg -n "async fn recover_encrypted_blobs" -n crates/gl/src/clone.rs | head -n 5
# Use line ranges around likely area
sed -n '1,220p' crates/gl/src/clone.rs | head -n 220
echo
echo "=== clone.rs: fetch_withheld + schema strictness for withheld-paths ==="
rg -n "withheld-paths|fetch_withheld|recover_from_arweave" crates/gl/src/clone.rsRepository: Gitlawb/node
Length of output: 9912
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# encrypted-blobs + parsing behavior
echo "=== clone.rs: recover_encrypted_blobs + parse_tx_ids ==="
sed -n '220,420p' crates/gl/src/clone.rs
echo
echo "=== clone.rs: WithheldPathsResponse parsing (schema strictness / fail-open) ==="
sed -n '180,260p' crates/gl/src/clone.rs
echo
echo "=== clone.rs: tests relevant to strictness / arweave recovery ==="
rg -n "withheld-paths|encrypted-blobs|recover_from_arweave|unauthorized|happy|schema" crates/gl/src/clone.rsRepository: Gitlawb/node
Length of output: 11990
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '740,820p' crates/gl/src/clone.rsRepository: Gitlawb/node
Length of output: 3031
Fix Arweave manifest discovery pagination; encrypted-blobs parsing should be schema-strict
- Arweave
Reposlug normalization matches: the client uses{last-':' owner segment}/{name}, and the node anchorsEncryptedManifest.repowith the sameowner_did.split(':').next_back()short-form, taggingRepo:{manifest.repo}. - Arweave GraphQL discovery hard-codes
first:100; if a repo has >100 encrypted-manifest anchors under the sameRepotag, recovery can miss the newest per-blob manifests and leave still-live blobs unrecovered—needs pagination/iteration until coverage is complete. /encrypted-blobsparsing is fail-open on schema drift:recover_encrypted_blobsusesValueand defaultsblobsto[]viaunwrap_or_default(), unlike/withheld-pathswhich deserializes a typed struct and errors on missing/wrong fields; switch/encrypted-blobsto a strict typed response and fail on missing/wrong types.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/gl/src/clone.rs` around lines 415 - 416, The slug normalization and
Arweave discovery/parsing need tightening: ensure owner short-form uses the same
split logic as the node by keeping owner_short =
owner.split(':').next_back().unwrap_or(owner) and forming slug =
format!("{owner_short}/{name}"); update the GraphQL discovery for
encrypted-manifest anchors in discover/recovery code to paginate instead of
hard-coding first:100 (iterate queries until no more edges or timestamps
covered) so recover_encrypted_blobs sees all anchors; change
recover_encrypted_blobs to deserialize the /encrypted-blobs response into a
strict typed struct (like the existing withheld-paths type) and fail on
missing/wrong fields instead of using Value with unwrap_or_default(), ensuring
schema-strict parsing and surfaced errors.
| let query = r#"query($repo:String!){transactions(tags:[{name:"App-Name",values:["gitlawb"]},{name:"Schema",values:["gitlawb/encrypted-manifest/v1"]},{name:"Repo",values:[$repo]}],first:100){edges{node{id}}}}"#; | ||
| let gql_body = serde_json::json!({ "query": query, "variables": { "repo": slug } }); | ||
| let resp = match client | ||
| .post(format!("{ag}/graphql")) | ||
| .json(&gql_body) | ||
| .send() | ||
| .await | ||
| { | ||
| Ok(r) if r.status().is_success() => r, | ||
| _ => return Ok(vec![]), | ||
| }; | ||
| let gql: serde_json::Value = match resp.json().await { | ||
| Ok(v) => v, | ||
| Err(_) => return Ok(vec![]), | ||
| }; | ||
| let tx_ids = parse_tx_ids(&gql); | ||
| if tx_ids.is_empty() { | ||
| return Ok(vec![]); | ||
| } | ||
|
|
||
| // 2. Fetch and parse each manifest body, then merge latest-wins per oid. | ||
| let mut manifests = Vec::new(); | ||
| for tx in tx_ids { | ||
| let m = match client.get(format!("{ag}/{tx}")).send().await { | ||
| Ok(r) if r.status().is_success() => r, | ||
| _ => continue, | ||
| }; | ||
| if let Ok(parsed) = m.json::<Manifest>().await { | ||
| manifests.push(parsed); | ||
| } | ||
| } | ||
| let oid_cid = merge_manifests(manifests); |
There was a problem hiding this comment.
Paginate manifest discovery instead of truncating at 100 transactions.
This fallback only inspects the first 100 manifest txs, then merges per-push manifests to find the latest CID per OID. Once a repo has more than 100 pushes, any private blob whose last seal is older than that window disappears from recovery even if it is still live at HEAD.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/gl/src/clone.rs` around lines 428 - 459, The current GraphQL fetch
(the local `query` + `gql_body`, use of
`client.post(...).json(&gql_body).send().await`, and subsequent
`parse_tx_ids(&gql)`) only requests first:100 and stops, causing missing
manifests; change the logic to paginate: rewrite the GraphQL query to take
variables for "first" and "after" (cursor), then loop POSTing the query
(building `gql_body` with `{ "repo": slug, "first": batch_size, "after": cursor
}`), call `parse_tx_ids` (or adjust it to return edges + pageInfo) to collect tx
ids across pages, update `cursor` from pageInfo.nextCursor/hasNextPage until no
more pages, then proceed to fetch manifests and call
`merge_manifests(manifests)` as before; ensure the loop uses the same
`client.post(...).json(&gql_body).send().await` error handling as existing code
and retains the existing behavior when responses fail.
|
hello @beardthelion do you want to be a maintainer of this node repository? I would like to add you as one. let me know if you are interested. |
|
Thanks @kevincodex1, I'd be glad to. I've enjoyed working on this and I'm happy to take on more responsibility for it. Couple of things so I know what I'm signing up for: what's the expectation for the role, just merge access, or also triage/releases and general upkeep? And who else maintains it currently? Either way, count me in. |
Closes #42.
blob_pathsinvisibility_pack.rsbuilds the blob→path map that every withholding decision rests on. Two flaws let it produce an incomplete withheld set, which downstream code then treats as public, leaking withheld content in cleartext:Ref-namespace gap. It walked only
refs/heads/*andrefs/tags/*, but the serve path enumeratesgit rev-list --objects --all(every object reachable from any ref). A blob reachable only through another namespace (e.g.refs/notes/*) never entered the withheld set and shipped in cleartext. Fix: drop the heads/tags filter and walk every ref, still per-ref viagit ls-tree -rso all paths a blob lives at are seen (a blob is withheld only if denied at every path; if it is public at some allowed path it must still be sent). Note this is deliberately notrev-list --objects, which reports one representative path per object and would regress that correctness.Fail-open ref walk. A non-zero
git ls-treeexit was silently skipped, yielding a partial set. Fix: fail closed (bail!). All three callers already abort on error (serve propagates, push/encrypt-pin skip the block), so no path serves unfiltered.It also narrows the two pin enumerators (
ipfs_pin,pinata) fromgit cat-file --batch-all-objectstogit rev-list --objects --all. The old call included unreachable/dangling objects, which have no ref and no tree path, so visibility rules cannot classify them; a dangling copy of a secret blob would bypass withholding straight into a permanent IPFS pin. Reachable-only keeps the pin set aligned with whatblob_pathscan evaluate.Four tests added: a blob reachable only via a non-head/tag ref is still withheld; an untraversable ref makes the withheld computation return
Err; and a dangling blob is absent from each pin enumerator while a committed blob is present.Stacks on #40 (base=main), carrying the cumulative Phase-2/3 prereq diff since these files exist only in the unmerged stack.
Summary by CodeRabbit
gl clonecommand supporting partial clones with content filtering and automatic blob recovery