Skip to content

feat(node): blind recipient identities at rest and gate B1 by repo readability#40

Open
beardthelion wants to merge 64 commits into
Gitlawb:mainfrom
beardthelion:feat/recipient-blinding-at-rest
Open

feat(node): blind recipient identities at rest and gate B1 by repo readability#40
beardthelion wants to merge 64 commits into
Gitlawb:mainfrom
beardthelion:feat/recipient-blinding-at-rest

Conversation

@beardthelion

@beardthelion beardthelion commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Completes recipient-set blinding for encrypted replication: the reader set of a withheld blob is no longer stored or published anywhere in cleartext, only in the encrypted envelope key-wraps.

This stacks on #36 (encrypted replication B1/B2/B3, which already blinds the envelope, the peer-replication wire, and the Arweave manifest). Since the upstream repo only has main, the diff here carries that stack as prerequisites; the new work in this PR is the at-rest change.

What changed

  • The encrypted_blobs.recipients column (a cleartext DID list) is replaced (migration v5) with an opaque, node-keyed recipients_tag (HMAC-SHA256 over the sorted DID set). The tag is used only to detect a recipient-set change so an unchanged blob is not re-sealed; it is never the DID list.
  • B1 discovery (/encrypted-blobs) and fetch (/encrypted-blob/{oid}) are now authorized by repo readability (the same visibility_check the git read path uses) rather than per-recipient matching. Decryption is gated by the envelope crypto: only a real recipient can open an envelope.
  • encrypt_and_pin keys the tag from the node seed and returns {oid, cid}; the Arweave manifest tuple drops the now-unused recipient vec; the mirror path stores no recipient data.

Why

An origin that enforces per-recipient authorization must be able to test whether a caller is a recipient, which means it can always recover the reader set. Moving authorization onto repo readability plus envelope crypto takes the origin out of that loop, so a DB compromise no longer reveals who can read a blob.

Accepted trade-offs

  • A repo-reader who is not a recipient of a given blob can see that the blob exists ({oid, cid}). The OID is already in the tree they clone and the envelope bytes are already public on IPFS, so no new secret is exposed.
  • A full origin compromise (DB plus the node key) could still brute-force the reader set from the keyed tag. The tag is only a re-seal change-detector; authorization no longer depends on it.

Tests

recipients_tag is covered by unit tests (order-insensitive, set-sensitive, node-keyed). The handler authorization change is wiring around the existing, tested visibility_check.

Summary by CodeRabbit

  • New Features

    • End-to-end encryption for private blobs with per-recipient decryption keys
    • Visibility-aware partial clone excluding withheld content and recovering encrypted blobs
    • Encrypted blob replication and distributed persistence across nodes
    • Encrypted manifest anchoring to Arweave for recovery
    • Withheld paths visibility query endpoint
    • Smart-HTTP protocol optimization for filtered git operations
  • Chores

    • Database schema and migrations for encrypted blob storage and tracking

…al clone

upload_pack_excluding emitted a v2 packfile section, but info_refs
advertises v0, so real clients negotiated v0 and rejected the response
with 'expected ACK/NAK, got packfile'. Frame the v0 stateless-rpc shape
instead (NAK, then the pack via side-band-64k when offered).

Add an end-to-end test that stands up info_refs + upload_pack_excluding
and runs a real git partial clone, asserting the withheld blob's bytes
never reach the client while its tree entry and SHA stay visible. A stock
full clone cannot consume the pack (it is not closed under reachability,
so fetch fails the connectivity check); a partial clone is required.
…tion choice

Add a real-git test that partial-clones, pushes a new commit server-side,
then fetches: the new object arrives and the withheld blob stays absent.
This pins down that ignoring have/want negotiation (always sending a
self-contained pack of all refs minus withheld, with NAK) is correct for
both clone and fetch; the only cost is a fetch re-sends the full object
set. Refactor the real-git tests onto a shared server harness and document
the negotiation decision in code and in the plan's follow-ups.
Move the two blocking git shell-outs in the filtered upload-pack path off
the async worker thread, matching the tokio::process / spawn_blocking usage
already in this file: build_filtered_pack (rev-list + pack-objects) and
withheld_blob_oids (per-ref ls-tree) now run inside spawn_blocking so a large
repo cannot stall the tokio runtime. Behavior is unchanged.

Also fix the Task 0 findings block in the Phase 3 plan: it still recorded v2
packfile framing, which is the exact path that failed against a real client
and was corrected to v0. The block now documents the shipped v0 contract.
Drop a stray trailing code fence flagged by markdownlint (MD040).

The speculative ls-tree timeout and the public/no-rules fast-path from the
review are intentionally left out: the timeout guards against adversarial
repos we do not yet host, and the fast-path is a micro-optimization not worth
the extra branch right now.
kevincodex1 asked to keep the superpowers planning docs out of the repo. The
Phase 3 plan was scaffolding for this change, not something the project needs
to carry. Removing it leaves only the code and tests in the PR.
Three fixes from the PR Gitlawb#33 review:

- withheld_paths now applies the whole-repo "/" read gate (returns
  repo-not-found when the caller cannot read the root), matching the git read
  endpoints. Without it the endpoint disclosed a private repo's existence and
  path layout to unauthorized callers. The withheld_globs doc already assumed
  this gate existed; now it does.

- A nested allow under a denied parent (e.g. "/secret/public/**" allowed,
  "/secret/**" denied) was over-withheld: the client sparse-excluded the whole
  parent and hid paths the caller may read. The endpoint now also returns a
  "reinclude" list (allowed globs strictly under a denied one) and gl clone
  re-includes them in the sparse spec after the excludes.

- Wildcard-free globs like "/docs/private" match both the exact path and a
  subtree (per glob_matches), but the client only emitted the subtree exclude.
  sparse_patterns now emits both "/docs/private" and "/docs/private/".

Verified the exclude-then-reinclude sparse ordering checks out cleanly with
real git, plus unit tests for reincluded_globs, the nested re-include, the
exact-path exclude, and sparse_patterns.
…eld-path errors

split_once('/') accepted owner/name/extra, smuggling a path segment into
the repo name that then flowed into the API path and clone URL; reject it.

fetch_withheld swallowed every network/auth/5xx/JSON error into an empty
result, dropping to a stock clone that the node refuses once blobs are
withheld. Now only 404/501 (endpoint unsupported) fall back to empty; the
rest propagate so the real cause surfaces.
The receive-pack replication chokepoint called withheld_blob_oids
directly on the tokio worker, where its blocking git ls-tree walk can
stall the runtime for repos with many refs. Wrap it in spawn_blocking
to match the upload-pack serve path.
Add a regression test for deny /secret, allow /secret/public, deny
/secret/public/admin and clarify the sparse-checkout comment. git does
not re-traverse an explicitly excluded directory, so emitting all
excludes before re-includes keeps the deepest deny in force; the broader
parent re-include does not resurrect it.
…hema

Read the default branch from the local origin/HEAD symref clone already
set, instead of parsing the localized, network-dependent output of
git remote show origin. Deserialize the withheld-paths body into a typed
struct so a missing or mistyped withheld/reinclude field is a hard error
rather than silently becoming an empty list, which would mask a server
regression behind a later clone failure.
beardthelion and others added 20 commits June 10, 2026 18:52
…or present-check

Add two hermetic integration tests for recover_from_arweave that drive the
full read path over mocked Arweave GraphQL + IPFS gateways: discover the
manifest, fetch it, fetch the envelope, decrypt, and install the withheld
blob. One covers an authorized recipient (blob installed), the other a
non-recipient (nothing recovered). Both simulate origin death by removing
the promisor remote and enable uploadpack.allowFilter so the blob is truly
withheld over file://.

Also harden the local presence check in recover_from_arweave with
GIT_NO_LAZY_FETCH=1 and .output() so the expected 'missing object' case does
not trigger a wasted promisor fetch or leak git stderr to the user.
db: parse_recipients now propagates a descriptive error instead of
defaulting corrupt recipients JSON to an empty list, which would have
denied authorized readers and handed peers incomplete metadata.

gl: clone recovery now warns when the sparse-checkout file cannot be read
or written, or when the post-recovery checkout fails, instead of silently
discarding those errors and claiming files were recovered.
Envelope v2 drops the cleartext recipient public key (kid) from each
wrapped-key header entry, so a party holding the public IPFS/Arweave copy
can no longer enumerate who is authorized to decrypt a withheld blob. The
version is bumped to 2 and v1 envelopes are rejected. open_blob now selects
the reader's entry by trial decryption (the AEAD tag authenticates exactly
one entry) instead of matching on the public key.

Recipient count is still visible; identity is not. Scope is envelope-only,
the node DB recipients column and peer replication metadata are unchanged.

Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
open_blob fed attacker-controlled envelopes to GenericArray::from_slice and
XNonce::from_slice, which panic on a wrong-length input. Validate both the
per-recipient box nonce and the body nonce to 24 bytes before use: skip a
recipient entry whose nonce is malformed, and return an error for a malformed
body nonce, so the public recovery path surfaces an error rather than panicking.
The encrypted-blobs/replicate endpoint shipped every blob's full recipient
DID list to mirrors, which persisted it, so any mirroring peer learned the
reader set. The v2 envelope blinding already removed recipient public keys
from the pinned envelopes, so the comment justifying this (DIDs are already
public) was no longer true.

/replicate now returns {oid, cid} only. Mirrors detect a re-seal by the CID
changing (the OID is stable across re-seals) instead of comparing recipient
sets, and store no recipient identities. Origin-side authz and the at-rest
recipients column are unchanged; this blinds only the peer-facing surface.
The B3 encrypted-blob manifest anchored {oid, cid, recipients} to Arweave, a
permanent public record of every blob's reader set. The v2 envelope blinding
already removed recipient keys from the pinned envelopes, so the comment
justifying this (recipient DIDs are already public) was no longer true. The
manifest now anchors {oid, cid} only. The gl reader already ignores the
recipients field (it recovers by trial decryption), so no reader changes.
…adability

The origin no longer stores recipient DIDs. Migration v5 replaces the
encrypted_blobs.recipients column with an opaque, node-keyed recipients_tag
used only to detect a recipient-set change for re-seal. B1 discovery and fetch
are now gated by the same repo-readability check the git read path uses, not by
per-recipient matching; decryption is gated by the envelope crypto, so a
non-recipient who can read the repo sees a blob's {oid, cid} but cannot open it.
encrypt_and_pin keys the tag from the node seed and returns {oid, cid}; the
Arweave manifest tuple drops the now-unused recipient vec.

A DB compromise no longer reveals the reader set; recovering it would require
brute-forcing candidate DID sets against the keyed tag with the node key.
@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 9b15af22-5014-40e8-b81f-14db0fd95198

📥 Commits

Reviewing files that changed from the base of the PR and between e715c57 and f5864fc.

📒 Files selected for processing (2)
  • crates/gitlawb-node/src/api/encrypted.rs
  • crates/gitlawb-node/src/encrypted_pin.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • crates/gitlawb-node/src/encrypted_pin.rs
  • crates/gitlawb-node/src/api/encrypted.rs

📝 Walkthrough

Walkthrough

This PR implements Option B encrypted withheld blob handling, allowing owners to withhold sensitive blobs while enabling authorized recipients to decrypt and recover them. It spans core cryptographic envelope sealing/opening, visibility-driven blob filtering, filtered Git smart-HTTP responses, encrypted blob persistence with recipients tagging, peer replication with IPFS/Pinata, and client-side sparse clone with multi-stage blob recovery from node APIs and Arweave.

Changes

Encrypted withheld blob support

Layer / File(s) Summary
Envelope crypto and key derivation
crates/gitlawb-core/Cargo.toml, crates/gitlawb-core/src/encrypt.rs, crates/gitlawb-core/src/identity.rs, crates/gitlawb-core/src/lib.rs, crates/gitlawb-node/Cargo.toml
seal_blob and open_blob implement XChaCha20-Poly1305 envelope encryption with per-recipient ChaChaBox key wrapping, blinded headers omitting recipient public keys, and X25519 derivation from Ed25519 DIDs. Keypair exposes seed_bytes() for decryption. Crypto dependencies (curve25519-dalek, crypto_box, chacha20poly1305, ed25519-dalek) are added.
Visibility-derived withheld blob filtering
crates/gitlawb-node/src/git/visibility_pack.rs, crates/gitlawb-node/src/visibility.rs, crates/gitlawb-node/src/git/mod.rs
withheld_blob_oids determines per-blob denial by aggregating visibility checks across all paths a blob inhabits, withheld_blob_recipients derives allowed reader DIDs for each withheld blob, and replicable_objects filters objects by withheld set. withheld_globs and reincluded_globs compute caller-visible exclude/include patterns for sparse checkout.
Filtered upload-pack responses
crates/gitlawb-node/src/git/smart_http.rs, crates/gitlawb-node/src/api/repos.rs
build_filtered_pack and upload_pack_excluding serve Git packs excluding withheld blobs with side-band-64k v0 framing support. git_upload_pack computes withheld set and conditionally serves filtered pack. End-to-end tests confirm partial clones omit private blobs while preserving tree visibility.
Encrypted blob persistence and sealing
crates/gitlawb-node/src/db/mod.rs, crates/gitlawb-node/src/encrypted_pin.rs, crates/gitlawb-node/src/main.rs
Schema migrations v4/v5 add encrypted_blobs table with (repo_id, oid) primary key, cid, and recipients_tag for tracking envelope receipts. recipients_tag computes an order-insensitive node-seed-keyed HMAC over recipient DIDs, and encrypt_and_pin orchestrates sealing, IPFS pinning, and DB recording with per-blob best-effort idempotence.
Replicable object pinning
crates/gitlawb-node/src/ipfs_pin.rs, crates/gitlawb-node/src/pinata.rs
pin_new_objects in both IPFS and Pinata modules now filters enumerated objects through replicable_objects(withheld) before pinning. IPFS module adds cat function to retrieve raw envelope bytes from Kubo by CID.
Encrypted manifest anchoring
crates/gitlawb-node/src/arweave.rs
EncryptedManifest and anchor_encrypted_manifest upload per-push OID/CID manifest JSONs to Irys with manifest-specific tags, conditionally skipping when no blobs or empty URL. Tests validate no-ops and successful anchoring paths.
Read APIs and HTTP routes
crates/gitlawb-node/src/api/visibility.rs, crates/gitlawb-node/src/api/encrypted.rs, crates/gitlawb-node/src/api/mod.rs, crates/gitlawb-node/src/server.rs
withheld_paths endpoint returns caller-scoped withheld and reinclude globs; list_encrypted_blobs, get_encrypted_blob, and replicate_encrypted_blobs expose encrypted envelopes via REST, omitting recipient data from replication payloads. Routes registered on git_read_routes with optional authentication.
Push-time replication enforcement
crates/gitlawb-node/src/api/repos.rs
git_receive_pack now computes announce (public readability) and conditional withheld set, gating IPFS/encrypt-then-pin to when withheld is Some, and gating peer notifications and Arweave anchoring to when announce is true. Pinata pinning skips entirely when withheld is None.
Mirror sync with encrypted replication
crates/gitlawb-node/src/sync.rs
Sync worker fetches origin withheld-paths and classifies mirror as Promisor (with blob filter) or Plain mode. After mirroring, calls origin encrypted-blobs/replicate, determines missing blobs by OID/CID, fetches and pins envelopes, and records encrypted_blobs rows. Git helper refactoring includes error stderr capture and promisor/partial-clone config transitions.
CLI clone with sparse and recovery
crates/gl/src/clone.rs, crates/gl/src/main.rs
gl clone parses repo, fetches withheld patterns, performs sparse partial clone excluding withheld globs, then performs node-based encrypted blob recovery (fetch list, download envelopes, decrypt via open_blob, install via git hash-object), with fallback to Arweave/IPFS manifest-based recovery using latest-wins merge and per-blob OID validation. Comprehensive tests cover sparse patterns, manifest parsing, authorized/unauthorized recovery cases.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

This PR introduces a substantial, multi-layered encrypted blob handling system spanning cryptographic primitives, visibility-driven filtering, Git protocol modifications, database schema, replication orchestration, and client recovery logic. The changes are highly interconnected—each layer depends on earlier ones—with dense logic in pack filtering, encryption/pinning orchestration, mirror replication decisions, and manifest-based blob recovery. The breadth (15+ files), heterogeneity (crypto, DB, HTTP APIs, CLI, Git internals), and critical nature (security-sensitive encryption/decryption paths) demand careful review of correctness, error handling, and integration across subsystems.

Possibly related PRs

  • Gitlawb/node#25: Extends the existing path-scoped visibility subsystem by adding withheld-paths API and glob helpers that build on visibility rules used in this PR.

Suggested reviewers

  • kevincodex1

Poem

🐰 Whiskers twitching with delight,
Blobs now whisper, hidden tight!
Crypto seals what owners choose,
Sparse clones show what readers use. 🔐✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately describes the main change: implementing blind recipient identities at rest (via recipients_tag) and gating B1 encrypted blob discovery/fetch by repo readability instead of per-recipient matching.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/gitlawb-node/src/api/repos.rs (1)

327-340: ⚠️ Potential issue | 🔴 Critical

Gate service=git-receive-pack info/refs with the same visibility/RepoNotFound behavior as git-upload-pack

  • crates/gitlawb-node/src/server.rs mounts GET /{owner}/{repo}/info/refs with auth::optional_signature, so unauthenticated callers can hit this handler.
  • In crates/gitlawb-node/src/api/repos.rs (git_info_refs), the visibility_check(...)->Decision::Deny path that maps to AppError::RepoNotFound(...) runs only for service == "git-upload-pack".
  • For service == "git-receive-pack", the handler skips visibility_check, still acquires the repo (acquire_fresh) and returns smart_http::info_refs(...), which can leak private-repo existence and advertised refs.

Add the same read-visibility gate for service == "git-receive-pack" and keep mapping deny/unauthorized to RepoNotFound to avoid an existence oracle and ref disclosure.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/api/repos.rs` around lines 327 - 340, The handler
currently only enforces read visibility for service == "git-upload-pack"; update
the branch handling of service == "git-receive-pack" in git_info_refs (where
service, record, auth, and name are in scope) to call
state.db.list_visibility_rules(&record.id).await?, compute caller the same way
(auth.as_ref().map(|e| e.0 .0.as_str())), and run visibility_check(&rules,
record.is_public, &record.owner_did, caller, "/") and if it returns
Decision::Deny return Err(AppError::RepoNotFound(format!("{owner}/{name}"))),
mirroring the existing git-upload-pack logic so unauthenticated/unauthorized
callers do not observe private repo existence or refs.

Source: Learnings

🧹 Nitpick comments (1)
crates/gitlawb-node/src/sync.rs (1)

310-324: 💤 Low value

Pinned envelope with mismatched CID is not unpinned.

When pin_git_object returns a CID that differs from the expected one, the code logs a warning and skips recording, but the incorrectly-pinned content remains in IPFS storage. This is a minor storage leak rather than a correctness issue (the DB row is not created, so the next sync will re-attempt), but over many syncs with a misbehaving IPFS node it could accumulate garbage.

Consider unpinning the mismatched CID before continuing, or document this as an accepted trade-off.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/sync.rs` around lines 310 - 324, The mismatched CID
path currently logs and continues but leaves the wrong content pinned in IPFS;
update the match arm handling Ok(cid) if cid != blob.cid to unpin the unexpected
CID before continuing (call the appropriate unpin helper in crate::ipfs_pin,
e.g., crate::ipfs_pin::unpin_git_object(ipfs_api, &cid) or use ipfs_api
directly), then log any unpin error and continue; keep the existing call to
db.record_encrypted_blob only in the successful CID-equals branch and ensure
references to pin_git_object, blob.oid, blob.cid, db.record_encrypted_blob,
ipfs_api, and envelope are used to locate where to add the unpin/cleanup call.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/gitlawb-core/src/identity.rs`:
- Around line 55-59: The method seed_bytes() exposes raw 32-byte private seed as
[u8; 32]; change it to follow the existing to_seed() safety pattern by returning
a zeroizing/secret wrapper instead of a plain array (e.g., use the same type
returned by to_seed() or Zeroizing<[u8;32]>), and update call sites that use
seed_bytes() (e.g., where x25519_secret_from_seed and XSecret::from are invoked)
to accept the secret wrapper or to extract the inner bytes temporarily in a
zeroized scope; keep signing_key and to_seed() as the canonical source of seed
material and remove plain [u8;32] returns to ensure automatic zeroization.

In `@crates/gitlawb-node/src/api/encrypted.rs`:
- Around line 40-42: Update the endpoint doc comment for GET
/api/v1/repos/{owner}/{repo}/encrypted-blob/{oid} (function get_encrypted_blob)
to reflect current authorization: state that the handler returns raw envelope
bytes when the repository is readable by the caller (repo read permission),
rather than claiming recipient-based access; keep any note about additional
envelope-level checks if still applicable but remove the outdated "recipient"
wording so clients and future maintainers see the correct contract.
- Around line 75-90: The replicate_encrypted_blobs handler currently calls
state.db.list_all_encrypted_blobs(&record.id) without verifying read access;
before listing blobs, perform the repo-readability gate using the app's
authorization helper (e.g. call a method like
state.ensure_repo_readable(&record) or state.auth.ensure_repo_readable(&owner,
&repo, &record.id)), return an authorization error if the caller cannot read the
repo, and only then call state.db.list_all_encrypted_blobs(&record.id); ensure
the check happens in replicate_encrypted_blobs immediately after get_repo(...)
and before constructing blobs so private repo metadata is never exposed to
unauthorized callers.

In `@crates/gitlawb-node/src/encrypted_pin.rs`:
- Around line 71-74: The loop silently skips when
crate::git::store::read_object(repo_path, oid) returns Err or Ok(None); change
those silent continues to log the failure before continuing: capture the Err
detail and log it (including repo_path and oid) and also log a warning when
Ok(None) (missing blob) so missing encrypted blobs are diagnosable; apply the
same change to the similar branch later (the second read_object check at lines
~82-85) so both read_object failure paths emit contextual logs rather than
silently continuing.
- Around line 61-65: The current code treats errors from
db.encrypted_blob_recipients_tag(repo_id, oid).await as cache-misses and
proceeds to reseal/pin, causing external writes during DB failures; change the
logic to match on the Result explicitly: handle Ok(Some(stored_tag)) -> compare
to tag and continue if equal, Ok(None) -> proceed with reseal, and Err(e) ->
propagate or return the error (do not continue to reseal/pin). Use
encrypted_blob_recipients_tag, repo_id, oid and tag in the match and propagate
the DB error via the function's error return (or use the ? operator) instead of
swallowing it.

In `@crates/gitlawb-node/src/git/visibility_pack.rs`:
- Around line 21-24: The current loop in visibility_pack.rs (the refname check
inside the for (refname, _oid) in refs) only keeps refs under refs/heads/* and
refs/tags/*, which mismatches the broader object enumeration used elsewhere
(smart_http.rs uses rev-list --objects --all and ipfs_pin.rs filters all repo
objects), so update visibility_pack.rs to use the same ref/object enumerator as
the pack/pin paths: remove the restrictive
starts_with("refs/heads/")/starts_with("refs/tags/") checks and instead iterate
the full ref namespace (e.g., all refs under refs/* or use the same
rev-list-based enumeration method), so the withheld set includes any object
reachable from all refs just like blob_paths, smart_http.rs, and ipfs_pin.rs do.
Ensure you reference the same mechanism (function or iterator used for rev-list
traversal) so both packing/pinning and withholding share identical ref/object
enumeration.
- Around line 30-31: The code currently continues on a non-zero `listing.status`
which silently skips a failing `git ls-tree` and can leak data; in the function
in visibility_pack.rs where `listing` is handled (the block checking `if
!listing.status.success()`), replace the `continue` with an error return that
fails closed (e.g., return an Err with context indicating the ref and the
`listing.status` and `listing.stderr`/output). Ensure you propagate a failure up
so callers abort fetch/replication instead of treating missing blobs as public.

In `@crates/gitlawb-node/src/ipfs_pin.rs`:
- Around line 75-86: The cat function currently issues
reqwest::Client::new().post(...).send().await? with no timeout; update cat to
use a bounded request timeout (e.g., via
reqwest::Client::builder().timeout(Duration::from_secs(<N>)).build() and then
.post(...).send().await?) or wrap the send/bytes await in
tokio::time::timeout(Duration::from_secs(<N>), ...).await and convert timeout
errors to anyhow errors; reference the cat function and ipfs_api/cid parameters
when making the change and pick a configurable constant or env-driven value for
the timeout so production worst-case envelope/read latency can be tuned.

---

Outside diff comments:
In `@crates/gitlawb-node/src/api/repos.rs`:
- Around line 327-340: The handler currently only enforces read visibility for
service == "git-upload-pack"; update the branch handling of service ==
"git-receive-pack" in git_info_refs (where service, record, auth, and name are
in scope) to call state.db.list_visibility_rules(&record.id).await?, compute
caller the same way (auth.as_ref().map(|e| e.0 .0.as_str())), and run
visibility_check(&rules, record.is_public, &record.owner_did, caller, "/") and
if it returns Decision::Deny return
Err(AppError::RepoNotFound(format!("{owner}/{name}"))), mirroring the existing
git-upload-pack logic so unauthenticated/unauthorized callers do not observe
private repo existence or refs.

---

Nitpick comments:
In `@crates/gitlawb-node/src/sync.rs`:
- Around line 310-324: The mismatched CID path currently logs and continues but
leaves the wrong content pinned in IPFS; update the match arm handling Ok(cid)
if cid != blob.cid to unpin the unexpected CID before continuing (call the
appropriate unpin helper in crate::ipfs_pin, e.g.,
crate::ipfs_pin::unpin_git_object(ipfs_api, &cid) or use ipfs_api directly),
then log any unpin error and continue; keep the existing call to
db.record_encrypted_blob only in the successful CID-equals branch and ensure
references to pin_git_object, blob.oid, blob.cid, db.record_encrypted_blob,
ipfs_api, and envelope are used to locate where to add the unpin/cleanup call.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 774f892c-6ba4-481c-acdb-4008c3174443

📥 Commits

Reviewing files that changed from the base of the PR and between a2217bf and e715c57.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (24)
  • .gitignore
  • crates/gitlawb-core/Cargo.toml
  • crates/gitlawb-core/src/encrypt.rs
  • crates/gitlawb-core/src/identity.rs
  • crates/gitlawb-core/src/lib.rs
  • crates/gitlawb-node/Cargo.toml
  • crates/gitlawb-node/src/api/encrypted.rs
  • crates/gitlawb-node/src/api/mod.rs
  • crates/gitlawb-node/src/api/repos.rs
  • crates/gitlawb-node/src/api/visibility.rs
  • crates/gitlawb-node/src/arweave.rs
  • crates/gitlawb-node/src/db/mod.rs
  • crates/gitlawb-node/src/encrypted_pin.rs
  • crates/gitlawb-node/src/git/mod.rs
  • crates/gitlawb-node/src/git/smart_http.rs
  • crates/gitlawb-node/src/git/visibility_pack.rs
  • crates/gitlawb-node/src/ipfs_pin.rs
  • crates/gitlawb-node/src/main.rs
  • crates/gitlawb-node/src/pinata.rs
  • crates/gitlawb-node/src/server.rs
  • crates/gitlawb-node/src/sync.rs
  • crates/gitlawb-node/src/visibility.rs
  • crates/gl/src/clone.rs
  • crates/gl/src/main.rs

Comment on lines +55 to +59
/// The raw 32-byte Ed25519 seed. Used to derive the X25519 secret for
/// envelope decryption (see `crate::encrypt`).
pub fn seed_bytes(&self) -> [u8; 32] {
self.signing_key.to_bytes()
}

@coderabbitai coderabbitai Bot Jun 12, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid exposing raw seed bytes as a plain return type.

Line 57 returns [u8; 32], which duplicates private key material without zeroization and bypasses the existing to_seed() safety pattern.

Suggested fix
-    pub fn seed_bytes(&self) -> [u8; 32] {
-        self.signing_key.to_bytes()
-    }
+    pub(crate) fn seed_bytes(&self) -> Zeroizing<[u8; 32]> {
+        self.to_seed()
+    }
// crates/gitlawb-core/src/encrypt.rs (call site)
let seed = keypair.seed_bytes();
let my_x = XSecret::from(x25519_secret_from_seed(&seed));
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-core/src/identity.rs` around lines 55 - 59, The method
seed_bytes() exposes raw 32-byte private seed as [u8; 32]; change it to follow
the existing to_seed() safety pattern by returning a zeroizing/secret wrapper
instead of a plain array (e.g., use the same type returned by to_seed() or
Zeroizing<[u8;32]>), and update call sites that use seed_bytes() (e.g., where
x25519_secret_from_seed and XSecret::from are invoked) to accept the secret
wrapper or to extract the inner bytes temporarily in a zeroized scope; keep
signing_key and to_seed() as the canonical source of seed material and remove
plain [u8;32] returns to ensure automatic zeroization.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tracked as #41. This is a pre-existing API (the PR only adds a caller); routing seed access through the existing to_seed() zeroizing wrapper touches other call sites, so handling it separately to keep this PR scoped to recipient-set blinding.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@beardthelion — understood, thanks for the context and for opening #41. The scope boundary makes sense; the fix is straightforward once that issue is picked up. I'll leave the concern open here as a reference.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tracked in #41 to keep this PR scoped to the blinding change. Will route seed access through the zeroizing wrapper there.

Comment thread crates/gitlawb-node/src/api/encrypted.rs
Comment thread crates/gitlawb-node/src/api/encrypted.rs
Comment thread crates/gitlawb-node/src/encrypted_pin.rs Outdated
Comment thread crates/gitlawb-node/src/encrypted_pin.rs
Comment thread crates/gitlawb-node/src/git/visibility_pack.rs
Comment thread crates/gitlawb-node/src/git/visibility_pack.rs
Comment on lines +75 to +86
/// Fetch raw bytes for a CID from the local Kubo node (`/api/v0/cat`).
pub async fn cat(ipfs_api: &str, cid: &str) -> Result<Vec<u8>> {
if ipfs_api.is_empty() {
return Err(anyhow::anyhow!("IPFS not configured"));
}
let url = format!("{}/api/v0/cat?arg={}", ipfs_api.trim_end_matches('/'), cid);
let resp = reqwest::Client::new().post(&url).send().await?;
if !resp.status().is_success() {
return Err(anyhow::anyhow!("ipfs cat {cid}: {}", resp.status()));
}
Ok(resp.bytes().await?.to_vec())
}

@coderabbitai coderabbitai Bot Jun 12, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Locate cat implementation =="
ls -la crates/gitlawb-node/src || true
rg -n "pub async fn cat\\(" crates/gitlawb-node/src/ipfs_pin.rs
sed -n '60,110p' crates/gitlawb-node/src/ipfs_pin.rs

echo
echo "== Check for timeouts in this file =="
rg -n "\\.(timeout|connect_timeout|read_timeout|write_timeout)\\(" crates/gitlawb-node/src/ipfs_pin.rs || true
rg -n "tokio::time::timeout\\(" crates/gitlawb-node/src/ipfs_pin.rs || true

echo
echo "== Find call sites of cat() =="
rg -n "ipfs_pin::cat\\(|\\bcat\\(ipfs_api, cid\\)|\\bcat\\([^\\)]*ipfs_api" crates/gitlawb-node/src || true

echo
echo "== Search repo for reqwest client timeout usage =="
rg -n "reqwest::Client::new\\(|Client::builder\\(|\\.timeout\\(Duration::|tokio::time::timeout\\(" crates || true

Repository: Gitlawb/node

Length of output: 6214


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Inspect api/encrypted.rs around cat usage =="
sed -n '1,120p' crates/gitlawb-node/src/api/encrypted.rs
echo
rg -n "ipfs_pin::cat\\(|crate::ipfs_pin::cat\\(|cat\\(" crates/gitlawb-node/src/api/encrypted.rs

echo
echo "== Inspect sync.rs around cat usage =="
sed -n '250,360p' crates/gitlawb-node/src/sync.rs
echo
rg -n "ipfs_pin::cat\\(|crate::ipfs_pin::cat\\(" crates/gitlawb-node/src/sync.rs

echo
echo "== Check for tokio::time::timeout usage near these areas =="
rg -n "tokio::time::timeout\\(" crates/gitlawb-node/src/sync.rs

Repository: Gitlawb/node

Length of output: 8969


🏁 Script executed:

#!/bin/bash
set -euo pipefail
sed -n '1,120p' crates/gitlawb-node/src/ipfs_pin.rs

Repository: Gitlawb/node

Length of output: 4233


🏁 Script executed:

#!/bin/bash
set -euo pipefail

rg -n "reqwest\s*=" -S Cargo.toml crates -g "Cargo.toml"
rg -n "name = \"reqwest\"" Cargo.lock || true
rg -n "reqwest.*version" Cargo.lock || true

Repository: Gitlawb/node

Length of output: 400


🌐 Web query:

reqwest 0.12 RequestBuilder .timeout(Duration) API

💡 Result:

In reqwest 0.12, the RequestBuilder::timeout(Duration) method is used to configure a timeout for an individual request [1][2]. When this method is called, the provided Duration sets a timeout that begins when the request starts the connection process and ends once the response body has been fully received [1][2]. This per-request timeout overrides any timeout that may have been globally configured on the Client via ClientBuilder::timeout [1][2]. The method takes a std::time::Duration as an argument and returns the modified RequestBuilder, allowing for a fluent builder pattern [1]. This API is available in both the asynchronous reqwest::RequestBuilder and the blocking reqwest::blocking::RequestBuilder [1][2].

Citations:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

rg -n "tokio::time::timeout\\(" crates/gitlawb-node/src/api/encrypted.rs crates/gitlawb-node/src/sync.rs || true
rg -n "ipfs_pin::cat\\(" crates/gitlawb-node/src/api/encrypted.rs crates/gitlawb-node/src/sync.rs

Repository: Gitlawb/node

Length of output: 282


Add a bounded timeout to IPFS cat requests

crates/gitlawb-node/src/ipfs_pin.rs’s cat does a reqwest::Client::new().post(...).send().await? without any explicit request timeout, and the ipfs_pin::cat(...) call sites aren’t wrapped in a tokio::time::timeout(...), so a stalled Kubo can hang the handler indefinitely.

Suggested patch
+use std::time::Duration;
+
 pub async fn cat(ipfs_api: &str, cid: &str) -> Result<Vec<u8>> {
@@
-    let resp = reqwest::Client::new().post(&url).send().await?;
+    let resp = reqwest::Client::new()
+        .post(&url)
+        .timeout(Duration::from_secs(15))
+        .send()
+        .await?;

Set the timeout value to match the worst-case envelope size / read latency expected in production.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/ipfs_pin.rs` around lines 75 - 86, The cat function
currently issues reqwest::Client::new().post(...).send().await? with no timeout;
update cat to use a bounded request timeout (e.g., via
reqwest::Client::builder().timeout(Duration::from_secs(<N>)).build() and then
.post(...).send().await?) or wrap the send/bytes await in
tokio::time::timeout(Duration::from_secs(<N>), ...).await and convert timeout
errors to anyhow errors; reference the cat function and ipfs_api/cid parameters
when making the change and pick a configurable constant or env-driven value for
the timeout so production worst-case envelope/read latency can be tuned.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pre-existing pinning code, not touched by this PR; out of scope for the recipient-set blinding change here.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@beardthelion, the cat function is actually new code introduced in this PR — it appears in the diff with + lines and does not exist in the base branch. So the missing timeout does fall within the scope of this change.

That said, if you'd prefer to keep this PR focused on the recipient-set blinding and address the timeout in a follow-up, that's a reasonable call. Would you like me to open a GitHub issue to track adding a bounded timeout to ipfs_pin::cat?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I was wrong to call this pre-existing: cat() is new in this stack (introduced in the B1 fetch work, 74b7a40), so the missing timeout is in scope for the encrypted-replication change. Valid robustness issue; will address it in that lane rather than here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed as #45.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tracked in #45 (bounded timeout for ipfs_pin::cat). Handling it there rather than in this PR.

…rror

Address review on the at-rest blinding change:
- The encrypted-blobs/replicate listing returned {oid, cid} with no visibility
  check, so a non-readable repo's blob index was reachable by an unauthenticated
  caller who guessed {owner}/{repo}. Gate it by the same repo-readability check
  discovery and fetch use. For the intended case (a public repo with withheld
  subtrees) the public root keeps this open to peers; only fully non-readable
  repos are withheld, which is the desired behavior.
- encrypt_and_pin treated a recipients_tag DB read error as a cache miss and
  resealed, causing avoidable IPFS writes during a partial outage; skip and retry
  on the next push instead.
- Correct the get_encrypted_blob doc comment to describe repo-readability access.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant