Skip to content

windows support omnidrive#321

Open
3a1b2c3 wants to merge 2 commits into
NVIDIA:mainfrom
3a1b2c3:windows
Open

windows support omnidrive#321
3a1b2c3 wants to merge 2 commits into
NVIDIA:mainfrom
3a1b2c3:windows

Conversation

@3a1b2c3

@3a1b2c3 3a1b2c3 commented Jun 10, 2026

Copy link
Copy Markdown

No description provided.

@copy-pr-bot

copy-pr-bot Bot commented Jun 10, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@greptile-apps

greptile-apps Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds Windows support to the omnidrive integration across three modules: optional fcntl import with entry-point guards in wheel_profiles.py, OS-specific compiler/linker flag branches in _plugin.py, and a defensive getattr guard in rasterizer.py's cleanup().

  • wheel_profiles.py: fcntl is now imported inside a try/except ImportError; read_evdev_name and scan_evdev_devices return safe sentinels (None / ()) when fcntl is absent, and a new test validates both paths.
  • _plugin.py: Compiler options are split into POSIX and Windows branches (MSVC rejects -Wall/-Werror); Windows linker flags add cuda.lib plus an optional /LIBPATH: derived from CUDA_HOME/CUDA_PATH; the dead lib_dir variable is removed.
  • rasterizer.py: cleanup() uses getattr(self, "_temp_dir", None) so a failed __init__ (e.g. CUDA extension build error) no longer masks the real exception with an AttributeError from __del__.

Confidence Score: 5/5

Safe to merge — the changes are well-scoped Windows stubs and defensive guards that don't alter existing POSIX behavior.

All three changed modules have correct OS guards: the fcntl optional-import pattern is sound, the entry-point short-circuits prevent any unreachable ioctl path from being accidentally exercised, the compiler/linker flag split mirrors the existing POSIX structure, and the getattr cleanup fix is a standard defensive pattern. A new test covers the Windows fcntl-absent code paths. No existing logic is modified.

The Windows linker path in _plugin.py silently omits /LIBPATH: when neither CUDA_HOME nor CUDA_PATH is set — noted in a prior thread — but no files require blocking attention for this change.

Important Files Changed

Filename Overview
integrations/omnidreams/ludus-renderer/ludus_renderer/_ops/_plugin.py Restructures compiler/linker options into OS-specific branches for Windows support; removes the dead lib_dir variable; adds CUDA driver lib linkage via env-var-derived /LIBPATH.
integrations/omnidreams/omnidreams/interactive_drive/input/wheel_profiles.py Makes fcntl import optional via try/except; adds if fcntl is None short-circuit guards on read_evdev_name and scan_evdev_devices so the module loads and behaves safely on Windows.
integrations/omnidreams/omnidreams/interactive_drive/rasterizer.py Defensive getattr guard in cleanup() prevents AttributeError masking real __init__ failures when _temp_dir was never assigned.
integrations/omnidreams/tests/interactive_drive/test_wheel_profiles.py Adds test_evdev_helpers_without_fcntl to verify both public entry points return safe sentinels when fcntl is patched to None, covering the Windows stub behaviour.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[_get_plugin called] --> B{os.name == 'nt'?}
    B -- Yes --> C[find_cl_path / ensure cl.exe in PATH]
    C --> D[cc_opts = MSVC flags /wd4067 /wd4624]
    D --> E[cuda_opts = nvcc flags no -Xcompiler -Wall]
    E --> F[_cuda_home from CUDA_HOME or CUDA_PATH]
    F --> G{_cuda_home set?}
    G -- Yes --> H[ldflags = cuda.lib + /LIBPATH]
    G -- No --> I[ldflags = cuda.lib only]
    B -- No --> J[cc_opts = GCC/Clang -Wall -Werror -Wno-psabi]
    J --> K[cuda_opts with -Xcompiler -Wall,-Werror,-Wno-psabi]
    K --> L[ldflags = -lcuda]
    H --> M[cpp_extension.load]
    I --> M
    L --> M
Loading

Reviews (3): Last reviewed commit: "Address PR #321 review: Windows fcntl gu..." | Re-trigger Greptile

Comment on lines +99 to +105
elif os.name == "nt":
# Link the CUDA driver API (cu* functions). MSVC needs cuda.lib, which
# lives in $CUDA_HOME\lib\x64 (not auto-added like the runtime libs).
_cuda_home = os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH") or ""
ldflags = ["cuda.lib"]
if _cuda_home:
ldflags.append("/LIBPATH:" + os.path.join(_cuda_home, "lib", "x64"))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 cuda.lib added without a search path when env vars are unset

When both CUDA_HOME and CUDA_PATH are absent, ldflags is set to just ["cuda.lib"] with no /LIBPATH: directive. The comment notes that cuda.lib is "not auto-added like the runtime libs", so the MSVC linker will only find it if it happens to be in a default search path (e.g. via the Visual Studio CUDA integration). In a CI/container environment without VS integration this silently produces a link error. Emitting a warning or raising an error when _cuda_home is empty would make the failure actionable rather than cryptic.

@3a1b2c3 3a1b2c3 changed the title windows support windows support omnidrive Jun 10, 2026
@jmccaffrey-nv

Copy link
Copy Markdown
Collaborator

/ok to test fcfe4e4

@jmccaffrey-nv

Copy link
Copy Markdown
Collaborator

Thank you for your contribution !

@jmccaffrey-nv

Copy link
Copy Markdown
Collaborator

_ No description provided. _

Please add a concise MR description , including which Windows version and HW tested

@jmccaffrey-nv

Copy link
Copy Markdown
Collaborator

assisted with claude

Review — PR #321 "windows support omnidrive"

Verdict: LGTM minor suggestions. I checked out fcfe4e44 and tested it end-to-end natively on Windows 11 (RTX 5090 32 GB, driver 591.86, CUDA toolkit 13.1, MSVC 2022 Build Tools 14.44, Python 3.12, torch 2.11.0+cu130 from the pytorch-cu130 index already pinned for win32 in flashdreams/pyproject.toml). Summary of results:

  • uv sync --package flashdreams-omnidreams --extra interactive-drive resolves and installs cleanly on native Windows (no Linux-only wheels in the path; transformer-engine correctly excluded by marker, triton-windows picked up).
  • The Ludus renderer extension compiles and links first-try with this PR's flags — MSVC 19.44 + nvcc 13.1, all 11 build steps, loads as ludus_renderer_plugin.pyd. Without the PR, cl.exe rejects the GCC -Werror/-Xcompiler flags exactly as the new comment says, and the driver-API symbols don't link.
  • The full interactive-drive demo runs natively: --stream-mjpeg presenter, scene staging cache, Cosmos-Reason1-7B text encoder with --offload-text-encoder, DiT pipeline, browser driving. Benchmark below.

Per-file notes

ludus_renderer/_ops/_plugin.py — looks right; the greptile linker comment is covered by torch itself

Re: the bot's "cuda.lib without a search path" concern — in practice this is already double-covered:

  1. torch.utils.cpp_extension with with_cuda=True appends its own /LIBPATH:<cuda_home>\lib\x64 on Windows (cpp_extension.py:2497 in torch 2.11), where cuda_home is resolved via CUDA_HOMECUDA_PATHnvcc on PATH → the default install glob — and it raises a clear "CUDA_HOME not found" error if all of those fail, before any silent link error can happen.
  2. The CUDA Windows installer always sets CUDA_PATH, so the PR's explicit /LIBPATH fires in any normal install.

I verified this on my link line — the PR's /LIBPATH and torch's identical one appear side by side. So the explicit /LIBPATH here is harmless belt-and-suspeder and no warning is strictly needed; if you want one anyway, a logger.debug when _cuda_home is empty would do. Two small suggestions:

  • A one-line comment that paths containing spaces (C:\Program Files\...) survive because torch's ninja writer quotes ldflags (_nt_quote_args) — it looks fragile otherwise.
  • The NT branch drops warnings-as-errors entirely (POSIX keeps -Wall -Werror; NT has only the two /wd suppressions). Fine as a bootstrap, but consider a follow-up to add /W3 /WX (or a TODO) so the Windows build doesn't permanently drift to warnings-ignored. FWIW the build is warning-clean today except a benign LNK4098 (LIBCMT vs /MD).
  • Optional drive-by: lib_dir at the top of the NT block is computed but never used (pre-existing).

wheel_profiles.py — the greptile AttributeError finding is real-but-unreachable; the comment is what needs fixing

I traced all five fcntl.ioctl() call sites: every one is gated behind a successful open()/os.open() of a /dev/input/* evdev node, and scan_evdev_devices() enumerates nothing on Windows (/dev/input/by-id fails is_dir(), the event* glob is empty). The FFB backends additionally bail on _fd is None. So guarding all five sites is over-prescribed — but the bot is right that the new comment ("let the call sites no-op") is inaccurate: if one were reached it would raise AttributeError, which notably is not caught by the surrounding except OSError handlers.

Cheapest honest fix: early-return in the two public entry points (read_evdev_name()None, scan_evdev_devices()()) when fcntl is None, and reword the comment to say the paths are unreachable on Windows rather than no-ops. A tiny test would lock it in (none of this module is covered today):

def test_evdev_helpers_without_fcntl(monkeypatch, tmp_path):
    monkeypatch.setattr(wheel_profiles, "fcntl", None)
    assert wheel_profiles.scan_evdev_devices() == ()
    assert wheel_profiles.read_evdev_name(tmp_path / "event0") is None

rasterizer.py — good catch

The getattr guard is correct, and worth keeping even apart from Windows: pre-PR, a failed extension build raised in __init__ before _temp_dir was assigned, and __del__cleanup()'s AttributeError masked the real build error. (Equivalent alternative: assign self._temp_dir = None first thing in __init__; the chosen fix is fine and self-documenting.)

Native 5090 benchmark (interactive-drive --stream-mjpeg --offload-text-encoder)

Steady-state wall_present_fps from INTERACTIVE_DRIVE_PROFILE_INPUT_TO_PRESENT=1, same scene (clipgt-0d404ff7), same GPU. WSL2 numbers are from prior runs on the identical machine for comparison:

Config Native Windows (this PR) WSL2 (same rig)
Default manifest (1280×704, 2 steps) 7.3 FPS (8-frame chunk ≈ 1035 ms, model 1007 ms; peak 21.9 GiB) ~6.8 FPS
Perf manifest w/ native_dit_acceleration: disabled (1168×640, compile_net) 13.4 FPS (chunk ≈ 539 ms; peak 20.0 GiB)
Perf manifest, full native FP8/cuDNN DiT does not build on Windows (see below) ~15–20 FPS

Native Windows matches (slightly beats) WSL2 at identical config, fits comfortably in 32 GB with --offload-text-encoder, and the browser drive loop works end to end.

Remaining gap to Linux is entirely the omnidreams_singleview native extension, which is out of this PR's scope but worth tracking as follow-ups (happy to file issues):

  1. sync_thirdparty.py fails on Windows checkouts with core.autocrlf=true: the *.patch files get CRLF-translated and git apply reports "corrupt patch". A .gitattributes with *.patch -text (repo has none today) fixes it at the root; I worked around it with GIT_CONFIG_* env overrides.
  2. omnidreams/native/omnidreams_singleview.py cuDNN discovery is Linux-only (libcudnn.so.9 checks, no Windows cudnn lookup), so the build dies at cudnn.h, and the -lcublas/-l:libcudnn.so.9 ldflags are GCC-style.
  3. Sage3/Sparge are already explicitly reported unsupported on win32 (optimized_dit.py:860,886) — the cuDNN attention backend would be the natural first Windows target.

Other run notes: Triton autotune rejects the 128–144 KB smem kernel candidates on consumer Blackwell (benign "out of resource" warnings, same as Linux); triton-windows 3.5 handled all Inductor kernels without issue.

Request-changes-level items

None. The fcntl comment wording is the only thing I'd fix before merge (a one-liner), plus please add a PR description covering the tested toolchain and the Sage3/Sparge scope limit — the diff is small but it's a platform-support claim, and the next Windows user will want to know what was validated ("win" as the commit message undersells a working native port!).

Nice work — small diff that's the missing piece for native Windows; everything else (cu130 index pin for win32, triton-windows, TE exclusion) was already in main and holds up under a real run.

Guard the evdev entry points (read_evdev_name, scan_evdev_devices) when
fcntl is unavailable (Windows) so the unreachable ioctl paths are never
entered, and reword the comment to match. Add a regression test.

Tidy the Windows renderer-plugin branch: drop the unused lib_dir, add a
/W3 /WX TODO, and note that torch _nt_quote_args keeps the /LIBPATH
intact for space-containing CUDA paths.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@3a1b2c3

3a1b2c3 commented Jun 11, 2026

Copy link
Copy Markdown
Author

My Claude did:
Reworded the fcntl comment and added short-circuit guards in read_evdev_name/scan_evdev_devices (closes P1), plus a regression test. Also tidied the _plugin.py NT branch: dropped the unused lib_dir, added a /W3 /WX TODO, and a comment noting torch's _nt_quote_args keeps /LIBPATH intact for space-containing CUDA paths.

@jmccaffrey-nv

Copy link
Copy Markdown
Collaborator

thanks for revision ! Approved , testing next.

@jmccaffrey-nv jmccaffrey-nv enabled auto-merge June 11, 2026 01:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants