feat(node): blind recipient identities at rest and gate B1 by repo readability#40
feat(node): blind recipient identities at rest and gate B1 by repo readability#40beardthelion wants to merge 64 commits into
Conversation
…al clone upload_pack_excluding emitted a v2 packfile section, but info_refs advertises v0, so real clients negotiated v0 and rejected the response with 'expected ACK/NAK, got packfile'. Frame the v0 stateless-rpc shape instead (NAK, then the pack via side-band-64k when offered). Add an end-to-end test that stands up info_refs + upload_pack_excluding and runs a real git partial clone, asserting the withheld blob's bytes never reach the client while its tree entry and SHA stay visible. A stock full clone cannot consume the pack (it is not closed under reachability, so fetch fails the connectivity check); a partial clone is required.
…tion choice Add a real-git test that partial-clones, pushes a new commit server-side, then fetches: the new object arrives and the withheld blob stays absent. This pins down that ignoring have/want negotiation (always sending a self-contained pack of all refs minus withheld, with NAK) is correct for both clone and fetch; the only cost is a fetch re-sends the full object set. Refactor the real-git tests onto a shared server harness and document the negotiation decision in code and in the plan's follow-ups.
Move the two blocking git shell-outs in the filtered upload-pack path off the async worker thread, matching the tokio::process / spawn_blocking usage already in this file: build_filtered_pack (rev-list + pack-objects) and withheld_blob_oids (per-ref ls-tree) now run inside spawn_blocking so a large repo cannot stall the tokio runtime. Behavior is unchanged. Also fix the Task 0 findings block in the Phase 3 plan: it still recorded v2 packfile framing, which is the exact path that failed against a real client and was corrected to v0. The block now documents the shipped v0 contract. Drop a stray trailing code fence flagged by markdownlint (MD040). The speculative ls-tree timeout and the public/no-rules fast-path from the review are intentionally left out: the timeout guards against adversarial repos we do not yet host, and the fast-path is a micro-optimization not worth the extra branch right now.
kevincodex1 asked to keep the superpowers planning docs out of the repo. The Phase 3 plan was scaffolding for this change, not something the project needs to carry. Removing it leaves only the code and tests in the PR.
Three fixes from the PR Gitlawb#33 review: - withheld_paths now applies the whole-repo "/" read gate (returns repo-not-found when the caller cannot read the root), matching the git read endpoints. Without it the endpoint disclosed a private repo's existence and path layout to unauthorized callers. The withheld_globs doc already assumed this gate existed; now it does. - A nested allow under a denied parent (e.g. "/secret/public/**" allowed, "/secret/**" denied) was over-withheld: the client sparse-excluded the whole parent and hid paths the caller may read. The endpoint now also returns a "reinclude" list (allowed globs strictly under a denied one) and gl clone re-includes them in the sparse spec after the excludes. - Wildcard-free globs like "/docs/private" match both the exact path and a subtree (per glob_matches), but the client only emitted the subtree exclude. sparse_patterns now emits both "/docs/private" and "/docs/private/". Verified the exclude-then-reinclude sparse ordering checks out cleanly with real git, plus unit tests for reincluded_globs, the nested re-include, the exact-path exclude, and sparse_patterns.
…eld-path errors
split_once('/') accepted owner/name/extra, smuggling a path segment into
the repo name that then flowed into the API path and clone URL; reject it.
fetch_withheld swallowed every network/auth/5xx/JSON error into an empty
result, dropping to a stock clone that the node refuses once blobs are
withheld. Now only 404/501 (endpoint unsupported) fall back to empty; the
rest propagate so the real cause surfaces.
The receive-pack replication chokepoint called withheld_blob_oids directly on the tokio worker, where its blocking git ls-tree walk can stall the runtime for repos with many refs. Wrap it in spawn_blocking to match the upload-pack serve path.
Add a regression test for deny /secret, allow /secret/public, deny /secret/public/admin and clarify the sparse-checkout comment. git does not re-traverse an explicitly excluded directory, so emitting all excludes before re-includes keeps the deepest deny in force; the broader parent re-include does not resurrect it.
…hema Read the default branch from the local origin/HEAD symref clone already set, instead of parsing the localized, network-dependent output of git remote show origin. Deserialize the withheld-paths body into a typed struct so a missing or mistyped withheld/reinclude field is a hard error rather than silently becoming an empty list, which would mask a server regression behind a later clone failure.
…or present-check Add two hermetic integration tests for recover_from_arweave that drive the full read path over mocked Arweave GraphQL + IPFS gateways: discover the manifest, fetch it, fetch the envelope, decrypt, and install the withheld blob. One covers an authorized recipient (blob installed), the other a non-recipient (nothing recovered). Both simulate origin death by removing the promisor remote and enable uploadpack.allowFilter so the blob is truly withheld over file://. Also harden the local presence check in recover_from_arweave with GIT_NO_LAZY_FETCH=1 and .output() so the expected 'missing object' case does not trigger a wasted promisor fetch or leak git stderr to the user.
db: parse_recipients now propagates a descriptive error instead of defaulting corrupt recipients JSON to an empty list, which would have denied authorized readers and handed peers incomplete metadata. gl: clone recovery now warns when the sparse-checkout file cannot be read or written, or when the post-recovery checkout fails, instead of silently discarding those errors and claiming files were recovered.
Envelope v2 drops the cleartext recipient public key (kid) from each wrapped-key header entry, so a party holding the public IPFS/Arweave copy can no longer enumerate who is authorized to decrypt a withheld blob. The version is bumped to 2 and v1 envelopes are rejected. open_blob now selects the reader's entry by trial decryption (the AEAD tag authenticates exactly one entry) instead of matching on the public key. Recipient count is still visible; identity is not. Scope is envelope-only, the node DB recipients column and peer replication metadata are unchanged. Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
open_blob fed attacker-controlled envelopes to GenericArray::from_slice and XNonce::from_slice, which panic on a wrong-length input. Validate both the per-recipient box nonce and the body nonce to 24 bytes before use: skip a recipient entry whose nonce is malformed, and return an error for a malformed body nonce, so the public recovery path surfaces an error rather than panicking.
The encrypted-blobs/replicate endpoint shipped every blob's full recipient
DID list to mirrors, which persisted it, so any mirroring peer learned the
reader set. The v2 envelope blinding already removed recipient public keys
from the pinned envelopes, so the comment justifying this (DIDs are already
public) was no longer true.
/replicate now returns {oid, cid} only. Mirrors detect a re-seal by the CID
changing (the OID is stable across re-seals) instead of comparing recipient
sets, and store no recipient identities. Origin-side authz and the at-rest
recipients column are unchanged; this blinds only the peer-facing surface.
The B3 encrypted-blob manifest anchored {oid, cid, recipients} to Arweave, a
permanent public record of every blob's reader set. The v2 envelope blinding
already removed recipient keys from the pinned envelopes, so the comment
justifying this (recipient DIDs are already public) was no longer true. The
manifest now anchors {oid, cid} only. The gl reader already ignores the
recipients field (it recovers by trial decryption), so no reader changes.
…adability
The origin no longer stores recipient DIDs. Migration v5 replaces the
encrypted_blobs.recipients column with an opaque, node-keyed recipients_tag
used only to detect a recipient-set change for re-seal. B1 discovery and fetch
are now gated by the same repo-readability check the git read path uses, not by
per-recipient matching; decryption is gated by the envelope crypto, so a
non-recipient who can read the repo sees a blob's {oid, cid} but cannot open it.
encrypt_and_pin keys the tag from the node seed and returns {oid, cid}; the
Arweave manifest tuple drops the now-unused recipient vec.
A DB compromise no longer reveals the reader set; recovering it would require
brute-forcing candidate DID sets against the keyed tag with the node key.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughThis PR implements Option B encrypted withheld blob handling, allowing owners to withhold sensitive blobs while enabling authorized recipients to decrypt and recover them. It spans core cryptographic envelope sealing/opening, visibility-driven blob filtering, filtered Git smart-HTTP responses, encrypted blob persistence with recipients tagging, peer replication with IPFS/Pinata, and client-side sparse clone with multi-stage blob recovery from node APIs and Arweave. ChangesEncrypted withheld blob support
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes This PR introduces a substantial, multi-layered encrypted blob handling system spanning cryptographic primitives, visibility-driven filtering, Git protocol modifications, database schema, replication orchestration, and client recovery logic. The changes are highly interconnected—each layer depends on earlier ones—with dense logic in pack filtering, encryption/pinning orchestration, mirror replication decisions, and manifest-based blob recovery. The breadth (15+ files), heterogeneity (crypto, DB, HTTP APIs, CLI, Git internals), and critical nature (security-sensitive encryption/decryption paths) demand careful review of correctness, error handling, and integration across subsystems. Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/gitlawb-node/src/api/repos.rs (1)
327-340:⚠️ Potential issue | 🔴 CriticalGate
service=git-receive-packinfo/refswith the same visibility/RepoNotFound behavior asgit-upload-pack
crates/gitlawb-node/src/server.rsmountsGET /{owner}/{repo}/info/refswithauth::optional_signature, so unauthenticated callers can hit this handler.- In
crates/gitlawb-node/src/api/repos.rs(git_info_refs), thevisibility_check(...)->Decision::Denypath that maps toAppError::RepoNotFound(...)runs only forservice == "git-upload-pack".- For
service == "git-receive-pack", the handler skipsvisibility_check, still acquires the repo (acquire_fresh) and returnssmart_http::info_refs(...), which can leak private-repo existence and advertised refs.Add the same read-visibility gate for
service == "git-receive-pack"and keep mapping deny/unauthorized toRepoNotFoundto avoid an existence oracle and ref disclosure.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-node/src/api/repos.rs` around lines 327 - 340, The handler currently only enforces read visibility for service == "git-upload-pack"; update the branch handling of service == "git-receive-pack" in git_info_refs (where service, record, auth, and name are in scope) to call state.db.list_visibility_rules(&record.id).await?, compute caller the same way (auth.as_ref().map(|e| e.0 .0.as_str())), and run visibility_check(&rules, record.is_public, &record.owner_did, caller, "/") and if it returns Decision::Deny return Err(AppError::RepoNotFound(format!("{owner}/{name}"))), mirroring the existing git-upload-pack logic so unauthenticated/unauthorized callers do not observe private repo existence or refs.Source: Learnings
🧹 Nitpick comments (1)
crates/gitlawb-node/src/sync.rs (1)
310-324: 💤 Low valuePinned envelope with mismatched CID is not unpinned.
When
pin_git_objectreturns a CID that differs from the expected one, the code logs a warning and skips recording, but the incorrectly-pinned content remains in IPFS storage. This is a minor storage leak rather than a correctness issue (the DB row is not created, so the next sync will re-attempt), but over many syncs with a misbehaving IPFS node it could accumulate garbage.Consider unpinning the mismatched CID before continuing, or document this as an accepted trade-off.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-node/src/sync.rs` around lines 310 - 324, The mismatched CID path currently logs and continues but leaves the wrong content pinned in IPFS; update the match arm handling Ok(cid) if cid != blob.cid to unpin the unexpected CID before continuing (call the appropriate unpin helper in crate::ipfs_pin, e.g., crate::ipfs_pin::unpin_git_object(ipfs_api, &cid) or use ipfs_api directly), then log any unpin error and continue; keep the existing call to db.record_encrypted_blob only in the successful CID-equals branch and ensure references to pin_git_object, blob.oid, blob.cid, db.record_encrypted_blob, ipfs_api, and envelope are used to locate where to add the unpin/cleanup call.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/gitlawb-core/src/identity.rs`:
- Around line 55-59: The method seed_bytes() exposes raw 32-byte private seed as
[u8; 32]; change it to follow the existing to_seed() safety pattern by returning
a zeroizing/secret wrapper instead of a plain array (e.g., use the same type
returned by to_seed() or Zeroizing<[u8;32]>), and update call sites that use
seed_bytes() (e.g., where x25519_secret_from_seed and XSecret::from are invoked)
to accept the secret wrapper or to extract the inner bytes temporarily in a
zeroized scope; keep signing_key and to_seed() as the canonical source of seed
material and remove plain [u8;32] returns to ensure automatic zeroization.
In `@crates/gitlawb-node/src/api/encrypted.rs`:
- Around line 40-42: Update the endpoint doc comment for GET
/api/v1/repos/{owner}/{repo}/encrypted-blob/{oid} (function get_encrypted_blob)
to reflect current authorization: state that the handler returns raw envelope
bytes when the repository is readable by the caller (repo read permission),
rather than claiming recipient-based access; keep any note about additional
envelope-level checks if still applicable but remove the outdated "recipient"
wording so clients and future maintainers see the correct contract.
- Around line 75-90: The replicate_encrypted_blobs handler currently calls
state.db.list_all_encrypted_blobs(&record.id) without verifying read access;
before listing blobs, perform the repo-readability gate using the app's
authorization helper (e.g. call a method like
state.ensure_repo_readable(&record) or state.auth.ensure_repo_readable(&owner,
&repo, &record.id)), return an authorization error if the caller cannot read the
repo, and only then call state.db.list_all_encrypted_blobs(&record.id); ensure
the check happens in replicate_encrypted_blobs immediately after get_repo(...)
and before constructing blobs so private repo metadata is never exposed to
unauthorized callers.
In `@crates/gitlawb-node/src/encrypted_pin.rs`:
- Around line 71-74: The loop silently skips when
crate::git::store::read_object(repo_path, oid) returns Err or Ok(None); change
those silent continues to log the failure before continuing: capture the Err
detail and log it (including repo_path and oid) and also log a warning when
Ok(None) (missing blob) so missing encrypted blobs are diagnosable; apply the
same change to the similar branch later (the second read_object check at lines
~82-85) so both read_object failure paths emit contextual logs rather than
silently continuing.
- Around line 61-65: The current code treats errors from
db.encrypted_blob_recipients_tag(repo_id, oid).await as cache-misses and
proceeds to reseal/pin, causing external writes during DB failures; change the
logic to match on the Result explicitly: handle Ok(Some(stored_tag)) -> compare
to tag and continue if equal, Ok(None) -> proceed with reseal, and Err(e) ->
propagate or return the error (do not continue to reseal/pin). Use
encrypted_blob_recipients_tag, repo_id, oid and tag in the match and propagate
the DB error via the function's error return (or use the ? operator) instead of
swallowing it.
In `@crates/gitlawb-node/src/git/visibility_pack.rs`:
- Around line 21-24: The current loop in visibility_pack.rs (the refname check
inside the for (refname, _oid) in refs) only keeps refs under refs/heads/* and
refs/tags/*, which mismatches the broader object enumeration used elsewhere
(smart_http.rs uses rev-list --objects --all and ipfs_pin.rs filters all repo
objects), so update visibility_pack.rs to use the same ref/object enumerator as
the pack/pin paths: remove the restrictive
starts_with("refs/heads/")/starts_with("refs/tags/") checks and instead iterate
the full ref namespace (e.g., all refs under refs/* or use the same
rev-list-based enumeration method), so the withheld set includes any object
reachable from all refs just like blob_paths, smart_http.rs, and ipfs_pin.rs do.
Ensure you reference the same mechanism (function or iterator used for rev-list
traversal) so both packing/pinning and withholding share identical ref/object
enumeration.
- Around line 30-31: The code currently continues on a non-zero `listing.status`
which silently skips a failing `git ls-tree` and can leak data; in the function
in visibility_pack.rs where `listing` is handled (the block checking `if
!listing.status.success()`), replace the `continue` with an error return that
fails closed (e.g., return an Err with context indicating the ref and the
`listing.status` and `listing.stderr`/output). Ensure you propagate a failure up
so callers abort fetch/replication instead of treating missing blobs as public.
In `@crates/gitlawb-node/src/ipfs_pin.rs`:
- Around line 75-86: The cat function currently issues
reqwest::Client::new().post(...).send().await? with no timeout; update cat to
use a bounded request timeout (e.g., via
reqwest::Client::builder().timeout(Duration::from_secs(<N>)).build() and then
.post(...).send().await?) or wrap the send/bytes await in
tokio::time::timeout(Duration::from_secs(<N>), ...).await and convert timeout
errors to anyhow errors; reference the cat function and ipfs_api/cid parameters
when making the change and pick a configurable constant or env-driven value for
the timeout so production worst-case envelope/read latency can be tuned.
---
Outside diff comments:
In `@crates/gitlawb-node/src/api/repos.rs`:
- Around line 327-340: The handler currently only enforces read visibility for
service == "git-upload-pack"; update the branch handling of service ==
"git-receive-pack" in git_info_refs (where service, record, auth, and name are
in scope) to call state.db.list_visibility_rules(&record.id).await?, compute
caller the same way (auth.as_ref().map(|e| e.0 .0.as_str())), and run
visibility_check(&rules, record.is_public, &record.owner_did, caller, "/") and
if it returns Decision::Deny return
Err(AppError::RepoNotFound(format!("{owner}/{name}"))), mirroring the existing
git-upload-pack logic so unauthenticated/unauthorized callers do not observe
private repo existence or refs.
---
Nitpick comments:
In `@crates/gitlawb-node/src/sync.rs`:
- Around line 310-324: The mismatched CID path currently logs and continues but
leaves the wrong content pinned in IPFS; update the match arm handling Ok(cid)
if cid != blob.cid to unpin the unexpected CID before continuing (call the
appropriate unpin helper in crate::ipfs_pin, e.g.,
crate::ipfs_pin::unpin_git_object(ipfs_api, &cid) or use ipfs_api directly),
then log any unpin error and continue; keep the existing call to
db.record_encrypted_blob only in the successful CID-equals branch and ensure
references to pin_git_object, blob.oid, blob.cid, db.record_encrypted_blob,
ipfs_api, and envelope are used to locate where to add the unpin/cleanup call.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 774f892c-6ba4-481c-acdb-4008c3174443
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (24)
.gitignorecrates/gitlawb-core/Cargo.tomlcrates/gitlawb-core/src/encrypt.rscrates/gitlawb-core/src/identity.rscrates/gitlawb-core/src/lib.rscrates/gitlawb-node/Cargo.tomlcrates/gitlawb-node/src/api/encrypted.rscrates/gitlawb-node/src/api/mod.rscrates/gitlawb-node/src/api/repos.rscrates/gitlawb-node/src/api/visibility.rscrates/gitlawb-node/src/arweave.rscrates/gitlawb-node/src/db/mod.rscrates/gitlawb-node/src/encrypted_pin.rscrates/gitlawb-node/src/git/mod.rscrates/gitlawb-node/src/git/smart_http.rscrates/gitlawb-node/src/git/visibility_pack.rscrates/gitlawb-node/src/ipfs_pin.rscrates/gitlawb-node/src/main.rscrates/gitlawb-node/src/pinata.rscrates/gitlawb-node/src/server.rscrates/gitlawb-node/src/sync.rscrates/gitlawb-node/src/visibility.rscrates/gl/src/clone.rscrates/gl/src/main.rs
| /// The raw 32-byte Ed25519 seed. Used to derive the X25519 secret for | ||
| /// envelope decryption (see `crate::encrypt`). | ||
| pub fn seed_bytes(&self) -> [u8; 32] { | ||
| self.signing_key.to_bytes() | ||
| } |
There was a problem hiding this comment.
Avoid exposing raw seed bytes as a plain return type.
Line 57 returns [u8; 32], which duplicates private key material without zeroization and bypasses the existing to_seed() safety pattern.
Suggested fix
- pub fn seed_bytes(&self) -> [u8; 32] {
- self.signing_key.to_bytes()
- }
+ pub(crate) fn seed_bytes(&self) -> Zeroizing<[u8; 32]> {
+ self.to_seed()
+ }// crates/gitlawb-core/src/encrypt.rs (call site)
let seed = keypair.seed_bytes();
let my_x = XSecret::from(x25519_secret_from_seed(&seed));🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/gitlawb-core/src/identity.rs` around lines 55 - 59, The method
seed_bytes() exposes raw 32-byte private seed as [u8; 32]; change it to follow
the existing to_seed() safety pattern by returning a zeroizing/secret wrapper
instead of a plain array (e.g., use the same type returned by to_seed() or
Zeroizing<[u8;32]>), and update call sites that use seed_bytes() (e.g., where
x25519_secret_from_seed and XSecret::from are invoked) to accept the secret
wrapper or to extract the inner bytes temporarily in a zeroized scope; keep
signing_key and to_seed() as the canonical source of seed material and remove
plain [u8;32] returns to ensure automatic zeroization.
There was a problem hiding this comment.
Tracked as #41. This is a pre-existing API (the PR only adds a caller); routing seed access through the existing to_seed() zeroizing wrapper touches other call sites, so handling it separately to keep this PR scoped to recipient-set blinding.
There was a problem hiding this comment.
@beardthelion — understood, thanks for the context and for opening #41. The scope boundary makes sense; the fix is straightforward once that issue is picked up. I'll leave the concern open here as a reference.
There was a problem hiding this comment.
Tracked in #41 to keep this PR scoped to the blinding change. Will route seed access through the zeroizing wrapper there.
| /// Fetch raw bytes for a CID from the local Kubo node (`/api/v0/cat`). | ||
| pub async fn cat(ipfs_api: &str, cid: &str) -> Result<Vec<u8>> { | ||
| if ipfs_api.is_empty() { | ||
| return Err(anyhow::anyhow!("IPFS not configured")); | ||
| } | ||
| let url = format!("{}/api/v0/cat?arg={}", ipfs_api.trim_end_matches('/'), cid); | ||
| let resp = reqwest::Client::new().post(&url).send().await?; | ||
| if !resp.status().is_success() { | ||
| return Err(anyhow::anyhow!("ipfs cat {cid}: {}", resp.status())); | ||
| } | ||
| Ok(resp.bytes().await?.to_vec()) | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Locate cat implementation =="
ls -la crates/gitlawb-node/src || true
rg -n "pub async fn cat\\(" crates/gitlawb-node/src/ipfs_pin.rs
sed -n '60,110p' crates/gitlawb-node/src/ipfs_pin.rs
echo
echo "== Check for timeouts in this file =="
rg -n "\\.(timeout|connect_timeout|read_timeout|write_timeout)\\(" crates/gitlawb-node/src/ipfs_pin.rs || true
rg -n "tokio::time::timeout\\(" crates/gitlawb-node/src/ipfs_pin.rs || true
echo
echo "== Find call sites of cat() =="
rg -n "ipfs_pin::cat\\(|\\bcat\\(ipfs_api, cid\\)|\\bcat\\([^\\)]*ipfs_api" crates/gitlawb-node/src || true
echo
echo "== Search repo for reqwest client timeout usage =="
rg -n "reqwest::Client::new\\(|Client::builder\\(|\\.timeout\\(Duration::|tokio::time::timeout\\(" crates || trueRepository: Gitlawb/node
Length of output: 6214
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Inspect api/encrypted.rs around cat usage =="
sed -n '1,120p' crates/gitlawb-node/src/api/encrypted.rs
echo
rg -n "ipfs_pin::cat\\(|crate::ipfs_pin::cat\\(|cat\\(" crates/gitlawb-node/src/api/encrypted.rs
echo
echo "== Inspect sync.rs around cat usage =="
sed -n '250,360p' crates/gitlawb-node/src/sync.rs
echo
rg -n "ipfs_pin::cat\\(|crate::ipfs_pin::cat\\(" crates/gitlawb-node/src/sync.rs
echo
echo "== Check for tokio::time::timeout usage near these areas =="
rg -n "tokio::time::timeout\\(" crates/gitlawb-node/src/sync.rsRepository: Gitlawb/node
Length of output: 8969
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '1,120p' crates/gitlawb-node/src/ipfs_pin.rsRepository: Gitlawb/node
Length of output: 4233
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "reqwest\s*=" -S Cargo.toml crates -g "Cargo.toml"
rg -n "name = \"reqwest\"" Cargo.lock || true
rg -n "reqwest.*version" Cargo.lock || trueRepository: Gitlawb/node
Length of output: 400
🌐 Web query:
reqwest 0.12 RequestBuilder .timeout(Duration) API
💡 Result:
In reqwest 0.12, the RequestBuilder::timeout(Duration) method is used to configure a timeout for an individual request [1][2]. When this method is called, the provided Duration sets a timeout that begins when the request starts the connection process and ends once the response body has been fully received [1][2]. This per-request timeout overrides any timeout that may have been globally configured on the Client via ClientBuilder::timeout [1][2]. The method takes a std::time::Duration as an argument and returns the modified RequestBuilder, allowing for a fluent builder pattern [1]. This API is available in both the asynchronous reqwest::RequestBuilder and the blocking reqwest::blocking::RequestBuilder [1][2].
Citations:
- 1: https://docs.rs/reqwest/latest/reqwest/struct.RequestBuilder.html
- 2: https://docs.rs/reqwest/latest/reqwest/blocking/struct.RequestBuilder.html
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "tokio::time::timeout\\(" crates/gitlawb-node/src/api/encrypted.rs crates/gitlawb-node/src/sync.rs || true
rg -n "ipfs_pin::cat\\(" crates/gitlawb-node/src/api/encrypted.rs crates/gitlawb-node/src/sync.rsRepository: Gitlawb/node
Length of output: 282
Add a bounded timeout to IPFS cat requests
crates/gitlawb-node/src/ipfs_pin.rs’s cat does a reqwest::Client::new().post(...).send().await? without any explicit request timeout, and the ipfs_pin::cat(...) call sites aren’t wrapped in a tokio::time::timeout(...), so a stalled Kubo can hang the handler indefinitely.
Suggested patch
+use std::time::Duration;
+
pub async fn cat(ipfs_api: &str, cid: &str) -> Result<Vec<u8>> {
@@
- let resp = reqwest::Client::new().post(&url).send().await?;
+ let resp = reqwest::Client::new()
+ .post(&url)
+ .timeout(Duration::from_secs(15))
+ .send()
+ .await?;Set the timeout value to match the worst-case envelope size / read latency expected in production.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/gitlawb-node/src/ipfs_pin.rs` around lines 75 - 86, The cat function
currently issues reqwest::Client::new().post(...).send().await? with no timeout;
update cat to use a bounded request timeout (e.g., via
reqwest::Client::builder().timeout(Duration::from_secs(<N>)).build() and then
.post(...).send().await?) or wrap the send/bytes await in
tokio::time::timeout(Duration::from_secs(<N>), ...).await and convert timeout
errors to anyhow errors; reference the cat function and ipfs_api/cid parameters
when making the change and pick a configurable constant or env-driven value for
the timeout so production worst-case envelope/read latency can be tuned.
There was a problem hiding this comment.
Pre-existing pinning code, not touched by this PR; out of scope for the recipient-set blinding change here.
There was a problem hiding this comment.
@beardthelion, the cat function is actually new code introduced in this PR — it appears in the diff with + lines and does not exist in the base branch. So the missing timeout does fall within the scope of this change.
That said, if you'd prefer to keep this PR focused on the recipient-set blinding and address the timeout in a follow-up, that's a reasonable call. Would you like me to open a GitHub issue to track adding a bounded timeout to ipfs_pin::cat?
There was a problem hiding this comment.
You're right, I was wrong to call this pre-existing: cat() is new in this stack (introduced in the B1 fetch work, 74b7a40), so the missing timeout is in scope for the encrypted-replication change. Valid robustness issue; will address it in that lane rather than here.
There was a problem hiding this comment.
Tracked in #45 (bounded timeout for ipfs_pin::cat). Handling it there rather than in this PR.
…rror
Address review on the at-rest blinding change:
- The encrypted-blobs/replicate listing returned {oid, cid} with no visibility
check, so a non-readable repo's blob index was reachable by an unauthenticated
caller who guessed {owner}/{repo}. Gate it by the same repo-readability check
discovery and fetch use. For the intended case (a public repo with withheld
subtrees) the public root keeps this open to peers; only fully non-readable
repos are withheld, which is the desired behavior.
- encrypt_and_pin treated a recipients_tag DB read error as a cache miss and
resealed, causing avoidable IPFS writes during a partial outage; skip and retry
on the next push instead.
- Correct the get_encrypted_blob doc comment to describe repo-readability access.
Completes recipient-set blinding for encrypted replication: the reader set of a withheld blob is no longer stored or published anywhere in cleartext, only in the encrypted envelope key-wraps.
This stacks on #36 (encrypted replication B1/B2/B3, which already blinds the envelope, the peer-replication wire, and the Arweave manifest). Since the upstream repo only has
main, the diff here carries that stack as prerequisites; the new work in this PR is the at-rest change.What changed
encrypted_blobs.recipientscolumn (a cleartext DID list) is replaced (migration v5) with an opaque, node-keyedrecipients_tag(HMAC-SHA256 over the sorted DID set). The tag is used only to detect a recipient-set change so an unchanged blob is not re-sealed; it is never the DID list./encrypted-blobs) and fetch (/encrypted-blob/{oid}) are now authorized by repo readability (the samevisibility_checkthe git read path uses) rather than per-recipient matching. Decryption is gated by the envelope crypto: only a real recipient can open an envelope.encrypt_and_pinkeys the tag from the node seed and returns{oid, cid}; the Arweave manifest tuple drops the now-unused recipient vec; the mirror path stores no recipient data.Why
An origin that enforces per-recipient authorization must be able to test whether a caller is a recipient, which means it can always recover the reader set. Moving authorization onto repo readability plus envelope crypto takes the origin out of that loop, so a DB compromise no longer reveals who can read a blob.
Accepted trade-offs
{oid, cid}). The OID is already in the tree they clone and the envelope bytes are already public on IPFS, so no new secret is exposed.Tests
recipients_tagis covered by unit tests (order-insensitive, set-sensitive, node-keyed). The handler authorization change is wiring around the existing, testedvisibility_check.Summary by CodeRabbit
New Features
Chores