Skip to content

EXP-22322: Support GH_USER/GH_TOKEN env vars to clone GitHub subrepos over HTTPS#91

Open
Piranja wants to merge 1 commit into
mainfrom
task/EXP-22322-PAT-token
Open

EXP-22322: Support GH_USER/GH_TOKEN env vars to clone GitHub subrepos over HTTPS#91
Piranja wants to merge 1 commit into
mainfrom
task/EXP-22322-PAT-token

Conversation

@Piranja
Copy link
Copy Markdown
Contributor

@Piranja Piranja commented May 29, 2026

Ticket link

EXP-22322

PR description

  • What is the context for this PR?

In CI we use a GitHub PAT that only works over HTTPS, but .s7substate
lists every subrepo with an SSH URL (git@github.com:readdle/...)
because the same file is used locally with developers' personal SSH
keys. We do not want to rewrite .s7substate.

This PR adds an auto-activated auth mode: when both GH_USER and
GH_TOKEN are set in the environment, s7 transparently uses HTTPS+PAT
for every git network operation, while leaving .s7substate and the
cloned subrepos' .git/config untouched. Works recursively through
nested s7 subrepos (e.g., RDPDFKit → TesseractOCR) since they inherit
the env vars.

  • Why did you take the approach you did?

Implementation uses git's own url.<base>.insteadOf config injected
via -c flags on every git invocation. Concretely, s7 prepends:

-c url.https://USER:TOKEN@github.com/.insteadOf=git@github.com:
-c url.https://USER:TOKEN@github.com/.insteadOf=ssh://git@github.com/

at the single chokepoint
+[GitRepository runGitWithArguments:stdOutOutput:stdErrOutput:currentDirectoryPath:]
in Git.m. Both SSH URL shapes (git@github.com:… and
ssh://git@github.com/…) are covered. Git applies the rewrite only at
network-resolution time — the URL stored in .git/config after clone
stays as the original SSH form, so the token never lands on disk and
existing URL-comparison logic (S7PostCheckoutHook.m:checkSubrepoUrlChanged)
is unaffected.

Why centralize at the chokepoint rather than rewrite in .s7substate
or at the clone call sites:

  • .s7substate must keep SSH URLs for local SSH-key users.
  • Every git command (clone/fetch/push/pull) funnels through
    +runGitWithArguments:..., so one injection covers everything.
  • +executeCommand: (used only for git init and local
    git merge --no-edit) doesn't touch remote URLs, so it doesn't need
    the rewrite.

Other design choices worth flagging:

  • Activation gate is encapsulated in +envGitHubTokenAuthEnabled,
    currently auto-on when both env vars are set. Swapping to require an
    explicit S7_USE_GH_TOKEN=1 opt-in is a one-line change there.
  • Host scope is github.com only — matches every URL in our
    .s7substate (and the nested RDPDFKit .s7substate) and the GH_*
    naming.
  • Userinfo percent-encoding for GH_USER / GH_TOKEN so tokens
    with /, @, :, %, etc. don't break URL parsing.
  • Trace masking: when S7_TRACE_GIT=1, the token is replaced with
    *** in the s7-printed command line. A one-line startup banner
    s7: GitHub token auth: enabled=YES|NO makes it easy to confirm the
    gate from CI logs.

Tests:

  • New unit-test file system7-tests/gitGitHubTokenAuthTests.m (12 cases)
    covering the argv builder, percent-encoding of special chars in both
    user and token, and trace-line masking.
  • New integration case
    system7-tests/integration/case-cloneViaGitHubTokenAuth.sh
    exercises the env-unset / env-set paths end-to-end via
    S7_TRACE_GIT=1, verifies the token never appears in the cloned
    subrepo's .git/, and confirms non-github URLs are untouched.
  • All existing unit tests pass. All integration cases pass except the
    pre-existing case-cloneS7andLFSmixedRepo.sh, which fails on the
    same git-lfs hook-template drift seen before this branch (unrelated
    to PAT auth).

@Piranja Piranja requested a review from lyeskin May 29, 2026 14:12
Copy link
Copy Markdown
Contributor

@kvoskovskiy kvoskovskiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the PAT auth change. The approach is solid — centralizing at the single runGitWithArguments: chokepoint, using insteadOf so the token never lands in .git/config, and the pure-builder unit tests are genuinely good. My findings below all concern the token's blast radius beyond .git/config (the process table and git's own output), which the current masking only partially covers. Neither blocks the core approach.

🔴 High — token exposed in the process argument list

The creds are passed as -c url.https://USER:TOKEN@github.com/... args, so anything that can read the process table (ps -ef, /proc/<pid>/cmdline, monitoring agents, crash reporters) sees the token in cleartext on every git call. On a shared CI runner this is a real leak vector. Git ≥ 2.31 lets you pass the same config via env, which is not visible in ps:

GIT_CONFIG_COUNT=2
GIT_CONFIG_KEY_0=url.https://USER:TOKEN@github.com/.insteadOf   GIT_CONFIG_VALUE_0=git@github.com:
GIT_CONFIG_KEY_1=url.https://USER:TOKEN@github.com/.insteadOf   GIT_CONFIG_VALUE_1=ssh://git@github.com/

Same behavior, same clean-.git/config property, but the token stays out of argv. Worth confirming the CI git is ≥ 2.31 (almost certainly yes).

🟡 Medium — masking covers the command line but not git's captured output

maskedTraceLineForArguments: masks the command we print, but git's stdout/stderr is streamed verbatim to the trace and accumulated into errorData/outputData returned to callers. With insteadOf the rewritten URL contains the token; modern git redacts the password in most unable to access messages, but GIT_CURL_VERBOSE/GIT_TRACE and some transport error paths do not. So the more likely place a token surfaces on a failed clone is the unmasked path. Cheapest fix: run maskedTraceLine:forToken: over the accumulated *ppStdErrOutput before tracing/returning.

🟢 Minor (no change required)

  • HTTPS subrepo URLs aren't rewritten — only git@github.com: and ssh://git@github.com/. A future/nested https://github.com/... entry in .s7substate would silently bypass the PAT. You've verified all current URLs are SSH; maybe note the assumption in a comment.
  • dispatch_once env caching is correct here — each s7 / nested-hook invocation is a fresh process, so no staleness across recursive subrepos. 👍
  • Test gap: clone is covered end-to-end, but no assertion that the rewrite applies to fetch/push (the ongoing-op path on an already-cloned subrepo). Low value given the single chokepoint, but a one-liner would lock in the "every network op" claim.

Comment thread system7/git/Git.m
{
NSArray<NSString *> *const credentialArguments = [self gitHubTokenInsteadOfArguments];
NSArray<NSString *> *const finalArguments = credentialArguments.count > 0
? [credentialArguments arrayByAddingObjectsFromArray:arguments]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Token in the process argument list. These -c url.…:TOKEN@github.com/... args are visible via ps//proc/<pid>/cmdline to anything on the runner. Consider passing the same insteadOf config through GIT_CONFIG_COUNT + GIT_CONFIG_KEY_n/GIT_CONFIG_VALUE_n env vars (git ≥ 2.31) on task.environment instead — identical behavior, but keeps the token out of the process table. The trace-masking handles the s7-printed line, not ps.

Comment thread system7/git/Git.m
}

s7TraceGit(@"s7: git %@\n", [task.arguments componentsJoinedByString:@" "]);
s7TraceGit(@"s7: git %@\n", [self maskedTraceLineForArguments:task.arguments]);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 This masks the command line, but git's own stdout/stderr is streamed unmasked in the readability handler at L402 and accumulated into errorData/outputData returned to callers. With insteadOf, a failed clone / verbose transport output can echo the rewritten URL incl. the token. Consider also masking the accumulated *ppStdErrOutput via maskedTraceLine:forToken: so the masking isn't limited to the command line.

Comment thread system7/git/Git.m
[self urlEscapeUserinfo:token]];
return @[
@"-c", [NSString stringWithFormat:@"url.%@.insteadOf=git@github.com:", httpsBase],
@"-c", [NSString stringWithFormat:@"url.%@.insteadOf=ssh://git@github.com/", httpsBase],
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Only the two SSH URL shapes are rewritten — a https://github.com/... entry (e.g. in a future or nested .s7substate) would silently bypass the PAT. Fine given all current URLs are SSH; worth a one-line comment stating that assumption so a future https entry isn't a surprise.

Copy link
Copy Markdown
Contributor

@kvoskovskiy kvoskovskiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the changes!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants