EXP-22322: Support GH_USER/GH_TOKEN env vars to clone GitHub subrepos over HTTPS#91
EXP-22322: Support GH_USER/GH_TOKEN env vars to clone GitHub subrepos over HTTPS#91Piranja wants to merge 1 commit into
Conversation
kvoskovskiy
left a comment
There was a problem hiding this comment.
Reviewed the PAT auth change. The approach is solid — centralizing at the single runGitWithArguments: chokepoint, using insteadOf so the token never lands in .git/config, and the pure-builder unit tests are genuinely good. My findings below all concern the token's blast radius beyond .git/config (the process table and git's own output), which the current masking only partially covers. Neither blocks the core approach.
🔴 High — token exposed in the process argument list
The creds are passed as -c url.https://USER:TOKEN@github.com/... args, so anything that can read the process table (ps -ef, /proc/<pid>/cmdline, monitoring agents, crash reporters) sees the token in cleartext on every git call. On a shared CI runner this is a real leak vector. Git ≥ 2.31 lets you pass the same config via env, which is not visible in ps:
GIT_CONFIG_COUNT=2
GIT_CONFIG_KEY_0=url.https://USER:TOKEN@github.com/.insteadOf GIT_CONFIG_VALUE_0=git@github.com:
GIT_CONFIG_KEY_1=url.https://USER:TOKEN@github.com/.insteadOf GIT_CONFIG_VALUE_1=ssh://git@github.com/
Same behavior, same clean-.git/config property, but the token stays out of argv. Worth confirming the CI git is ≥ 2.31 (almost certainly yes).
🟡 Medium — masking covers the command line but not git's captured output
maskedTraceLineForArguments: masks the command we print, but git's stdout/stderr is streamed verbatim to the trace and accumulated into errorData/outputData returned to callers. With insteadOf the rewritten URL contains the token; modern git redacts the password in most unable to access messages, but GIT_CURL_VERBOSE/GIT_TRACE and some transport error paths do not. So the more likely place a token surfaces on a failed clone is the unmasked path. Cheapest fix: run maskedTraceLine:forToken: over the accumulated *ppStdErrOutput before tracing/returning.
🟢 Minor (no change required)
- HTTPS subrepo URLs aren't rewritten — only
git@github.com:andssh://git@github.com/. A future/nestedhttps://github.com/...entry in.s7substatewould silently bypass the PAT. You've verified all current URLs are SSH; maybe note the assumption in a comment. dispatch_onceenv caching is correct here — each s7 / nested-hook invocation is a fresh process, so no staleness across recursive subrepos. 👍- Test gap: clone is covered end-to-end, but no assertion that the rewrite applies to
fetch/push(the ongoing-op path on an already-cloned subrepo). Low value given the single chokepoint, but a one-liner would lock in the "every network op" claim.
| { | ||
| NSArray<NSString *> *const credentialArguments = [self gitHubTokenInsteadOfArguments]; | ||
| NSArray<NSString *> *const finalArguments = credentialArguments.count > 0 | ||
| ? [credentialArguments arrayByAddingObjectsFromArray:arguments] |
There was a problem hiding this comment.
🔴 Token in the process argument list. These -c url.…:TOKEN@github.com/... args are visible via ps//proc/<pid>/cmdline to anything on the runner. Consider passing the same insteadOf config through GIT_CONFIG_COUNT + GIT_CONFIG_KEY_n/GIT_CONFIG_VALUE_n env vars (git ≥ 2.31) on task.environment instead — identical behavior, but keeps the token out of the process table. The trace-masking handles the s7-printed line, not ps.
| } | ||
|
|
||
| s7TraceGit(@"s7: git %@\n", [task.arguments componentsJoinedByString:@" "]); | ||
| s7TraceGit(@"s7: git %@\n", [self maskedTraceLineForArguments:task.arguments]); |
There was a problem hiding this comment.
🟡 This masks the command line, but git's own stdout/stderr is streamed unmasked in the readability handler at L402 and accumulated into errorData/outputData returned to callers. With insteadOf, a failed clone / verbose transport output can echo the rewritten URL incl. the token. Consider also masking the accumulated *ppStdErrOutput via maskedTraceLine:forToken: so the masking isn't limited to the command line.
| [self urlEscapeUserinfo:token]]; | ||
| return @[ | ||
| @"-c", [NSString stringWithFormat:@"url.%@.insteadOf=git@github.com:", httpsBase], | ||
| @"-c", [NSString stringWithFormat:@"url.%@.insteadOf=ssh://git@github.com/", httpsBase], |
There was a problem hiding this comment.
🟢 Only the two SSH URL shapes are rewritten — a https://github.com/... entry (e.g. in a future or nested .s7substate) would silently bypass the PAT. Fine given all current URLs are SSH; worth a one-line comment stating that assumption so a future https entry isn't a surprise.
kvoskovskiy
left a comment
There was a problem hiding this comment.
Thank you for the changes!
Ticket link
EXP-22322
PR description
In CI we use a GitHub PAT that only works over HTTPS, but
.s7substatelists every subrepo with an SSH URL (
git@github.com:readdle/...)because the same file is used locally with developers' personal SSH
keys. We do not want to rewrite
.s7substate.This PR adds an auto-activated auth mode: when both
GH_USERandGH_TOKENare set in the environment, s7 transparently uses HTTPS+PATfor every git network operation, while leaving
.s7substateand thecloned subrepos'
.git/configuntouched. Works recursively throughnested s7 subrepos (e.g., RDPDFKit → TesseractOCR) since they inherit
the env vars.
Implementation uses git's own
url.<base>.insteadOfconfig injectedvia
-cflags on every git invocation. Concretely, s7 prepends:-c url.https://USER:TOKEN@github.com/.insteadOf=git@github.com:
-c url.https://USER:TOKEN@github.com/.insteadOf=ssh://git@github.com/
at the single chokepoint
+[GitRepository runGitWithArguments:stdOutOutput:stdErrOutput:currentDirectoryPath:]in
Git.m. Both SSH URL shapes (git@github.com:…andssh://git@github.com/…) are covered. Git applies the rewrite only atnetwork-resolution time — the URL stored in
.git/configafterclonestays as the original SSH form, so the token never lands on disk and
existing URL-comparison logic (
S7PostCheckoutHook.m:checkSubrepoUrlChanged)is unaffected.
Why centralize at the chokepoint rather than rewrite in
.s7substateor at the clone call sites:
.s7substatemust keep SSH URLs for local SSH-key users.+runGitWithArguments:..., so one injection covers everything.+executeCommand:(used only forgit initand localgit merge --no-edit) doesn't touch remote URLs, so it doesn't needthe rewrite.
Other design choices worth flagging:
+envGitHubTokenAuthEnabled,currently auto-on when both env vars are set. Swapping to require an
explicit
S7_USE_GH_TOKEN=1opt-in is a one-line change there..s7substate(and the nested RDPDFKit.s7substate) and theGH_*naming.
GH_USER/GH_TOKENso tokenswith
/,@,:,%, etc. don't break URL parsing.S7_TRACE_GIT=1, the token is replaced with***in the s7-printed command line. A one-line startup banners7: GitHub token auth: enabled=YES|NOmakes it easy to confirm thegate from CI logs.
Tests:
system7-tests/gitGitHubTokenAuthTests.m(12 cases)covering the argv builder, percent-encoding of special chars in both
user and token, and trace-line masking.
system7-tests/integration/case-cloneViaGitHubTokenAuth.shexercises the env-unset / env-set paths end-to-end via
S7_TRACE_GIT=1, verifies the token never appears in the clonedsubrepo's
.git/, and confirms non-github URLs are untouched.pre-existing
case-cloneS7andLFSmixedRepo.sh, which fails on thesame git-lfs hook-template drift seen before this branch (unrelated
to PAT auth).