fix(pathfinder): make find_nvidia_binary_utility deterministic, never search CWD#2196
fix(pathfinder): make find_nvidia_binary_utility deterministic, never search CWD#2196aryanputta wants to merge 3 commits into
Conversation
… search CWD find_nvidia_binary_utility assembled a bounded list of trusted directories (NVIDIA wheel bin/, CONDA_PREFIX, CUDA_HOME/CUDA_PATH) and then delegated to shutil.which(name, path=trusted_dirs). On Windows shutil.which prepends the process current working directory to the search even when an explicit path= is supplied, so a binary located in an arbitrary (possibly attacker-writable) CWD could be returned in preference to the trusted CUDA / Conda / wheel binary. That violates the pathfinder contract of a deterministic lookup over a documented, bounded set of trusted roots. Replace the shutil.which delegation with an explicit resolver that searches only the trusted directories, in order, returning the first executable match. The current working directory and ambient PATH are never consulted. POSIX execute-bit (X_OK) and Windows extension semantics are preserved, so behavior is unchanged except for removing the CWD/PATH leakage. Names resolved in the existing trusted dirs return exactly as before. Rewrites the search-path tests to assert the deterministic probe order and adds TestResolveInTrustedDirs covering CWD isolation, first-match-wins, empty/duplicate dir skipping, and POSIX non-executable rejection. Fixes NVIDIA#2119
|
Hi @aryanputta, this is great, but in isolation it'll probably break some users. This was pointed out by @kkraus14 here before:
I ran the context and my thoughts by GPT around the same time, which lead to the conclusion at the end of this comment:
I still think that's the right compromise:
The behavior difference is significant enough though that I'd want to bump the pathfinder minor version, which we can do; I don't have concerns about that. How do you feel about expanding this PR to fold in the CTK-root canary fallback? — I believe the code changes will still be quite modest. |
…utility After the deterministic search over the explicit trusted directories (NVIDIA wheel bin/, CONDA_PREFIX, CUDA_HOME/CUDA_PATH) misses, fall back to a CTK-root canary probe: resolve cudart through the OS dynamic loader, which honors LD_LIBRARY_PATH on Linux and the native DLL search on Windows, derive the CUDA Toolkit root from its absolute path, and search that root's bin layout. This addresses the concern raised on NVIDIA#2196: users who follow the CUDA Linux installation guide set LD_LIBRARY_PATH for libraries and PATH for executables. The bounded finder alone would stop finding the utility for them because PATH is intentionally never consulted. The canary fallback recovers that case through LD_LIBRARY_PATH instead of PATH. LD_LIBRARY_PATH is still an attack vector, but a significantly harder one to exploit than PATH, and the ambient PATH and process CWD remain unused. The canary runs only after the explicit trusted dirs miss, so the common wheel/conda/CUDA_HOME cases never spawn the resolver subprocess. The canary -> CTK-root resolution is factored into a shared resolve_ctk_root_via_canary helper reused by the dynamic-library CTK-root canary flow. Adds tests for the fallback (found, ordering, Windows bin layout, not consulted when found earlier, cached) and for resolve_ctk_root_via_canary. Adds 1.6.0 release notes for the minor version bump.
|
Thanks @rwgk, that compromise makes sense and I've folded it in (pushed What the new commit adds Search order is unchanged for the explicit trusted dirs, with the canary as a strict fallback:
Key points:
Version Added Tests Added coverage for the fallback in Local note: |
pre-commit.ci mypy flagged returning Any from resolve_ctk_root_via_canary and _resolve_ctk_root_via_canary (both declared -> str | None), because derive_ctk_root resolves to Any under the pathfinder mypy config. Bind the result to an annotated local before returning, matching the pattern used elsewhere in the package.
Summary
Fixes #2119.
find_nvidia_binary_utilityassembles a bounded list of trusted directories (NVIDIA wheelbin/,CONDA_PREFIX,CUDA_HOME/CUDA_PATH) and then delegated toshutil.which(name, path=trusted_dirs). On Windows, CPython'sshutil.whichprepends the process current working directory to the search even when an explicitpath=is supplied. As a result a binary located in an arbitrary (possibly attacker-writable) CWD could be returned in preference to the trusted CUDA / Conda / wheel binary, violating the pathfinder contract of a deterministic lookup over a documented, bounded set of trusted roots.Fix
Replace the
shutil.whichdelegation with an explicit resolver (_resolve_in_trusted_dirs) that searches only the trusted directories, in order, and returns the first executable match. The current working directory and ambientPATHare never consulted.X_OK) and Windows extension semantics are preserved via_is_executable_file, so behavior is unchanged apart from removing the CWD/PATH leakage.Tests
TestResolveInTrustedDirs: CWD isolation (the core regression), first-matching-dir-wins, empty/duplicate dir skipping, and POSIX non-executable rejection.Local checks:
ruff checkandruff format --checkpass on both files; the resolver logic and CWD-isolation behavior were verified on a real filesystem (this package does not import on macOS due to the Linux-onlydlinfobinding, so the mocker-based suite runs in CI on Linux/Windows).