GPU hang / NV_ERR_INVALID_DEVICE / nvidia-modeset timeout on Ubuntu 24.04 HWE (Lenovo Legion Pro 5 RTX 4070 Laptop) #1092
Description
NVIDIA Open GPU Kernel Modules Version
570.211.01 (open kernel module, Ubuntu package nvidia-driver-570-open)
Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
- I confirm that this does not happen with the proprietary driver package.
Operating System and Version
Ubuntu 24.04 LTS (HWE)
Kernel Release
6.17.0-19-generic
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
- I am running on a stable kernel release.
Hardware: GPU
GPU 0: NVIDIA GeForce RTX 4070 Laptop GPU (UUID: GPU-c79a90b6-3bf5-cbd9-8b1f-6f0d233827b9)
Describe the bug
Problem
The system experienced GPU hangs and full desktop freezes that required a power cycle.
Observed errors:
- NV_ERR_INVALID_DEVICE
- nvidia-modeset: ERROR: GPU:0: Error while waiting for GPU progress
- nvAssert failures
These occurred during normal desktop usage (Chrome, VS Code, Slack).
To Reproduce
Steps to Reproduce
- Install NVIDIA open driver (580-open) on Ubuntu 24.04 HWE.
- Reboot into system (PRIME on-demand, X11 session).
- Connect external monitor (Dell U3421WE).
- Launch typical desktop workload:
  - Google Chrome (multiple tabs)
  - VS Code
  - Slack
- Use system normally for development workload.
Additional Trigger Condition
- Switch NVIDIA driver branches (here a downgrade, 580-open → 570-open) without fully removing the previous branch's packages.
- Reboot system.
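A leftover-package condition like the one described above can be checked before rebooting. The following is a minimal sketch, not part of the attached script: it assumes the text comes from `dpkg -l` and that the driver branch appears as a three-digit number in the package name (as in `nvidia-driver-570-open`). The function names `branches_from_dpkg` and `has_mixed_branches` are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: the branch is a three-digit number inside the package name,
# e.g. "nvidia-driver-570-open" or "libnvidia-gl-580".
BRANCH_RE = re.compile(r"^(?:lib)?nvidia-[a-z0-9-]*?-(\d{3})(?:-[a-z]+)*$")


def branches_from_dpkg(dpkg_output):
    """Group NVIDIA packages by driver branch.

    Returns {branch: [(status, package_name), ...]}, where status is the
    first dpkg column ("ii" installed, "rc" removed with config left, ...).
    """
    branches = defaultdict(list)
    for line in dpkg_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # header separators and blank lines
        status = fields[0]
        name = fields[1].split(":")[0]  # drop an ":amd64"-style qualifier
        match = BRANCH_RE.match(name)
        if match:
            branches[match.group(1)].append((status, name))
    return dict(branches)


def has_mixed_branches(dpkg_output):
    """True when packages from more than one driver branch are present."""
    return len(branches_from_dpkg(dpkg_output)) > 1
```

Feeding it the output of `dpkg -l` and printing the result shows which branches still have packages (including `rc` leftovers) on the system. More than one branch is a hint, not proof, that the state is inconsistent.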
Result
- GPU enters invalid state
- Desktop freeze occurs
- System requires power button reboot
Errors observed:
- NV_ERR_INVALID_DEVICE
- nvidia-modeset: ERROR: GPU:0: Error while waiting for GPU progress
- nvAssert failures
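To quantify how often these errors occur in a given boot, the kernel log can be scanned for the three signatures listed above. This is a rough sketch: the patterns are taken from the messages quoted in this report, and the exact wording of the `nvAssert` lines is an assumption (only the substring is matched). The name `count_signatures` is hypothetical.

```python
import re
from collections import Counter

# Patterns derived from the error strings quoted in this report.
SIGNATURES = {
    "invalid_device": re.compile(r"NV_ERR_INVALID_DEVICE"),
    "modeset_timeout": re.compile(
        r"nvidia-modeset: ERROR: GPU:\d+: Error while waiting for GPU progress"
    ),
    # Assumption: assertion failures contain the substring "nvAssert".
    "nv_assert": re.compile(r"nvAssert"),
}


def count_signatures(log_text):
    """Count log lines matching each known error signature."""
    counts = Counter()
    for line in log_text.splitlines():
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return dict(counts)
```

The input could be the previous boot's kernel log, for example the text produced by `journalctl -k -b -1`, read into a string and passed to `count_signatures`.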
Notes
Issue appears related to inconsistent driver state during upgrade/downgrade of open kernel modules combined with hybrid GPU (PRIME) and external display usage.
Bug Incidence
Sometimes
nvidia-bug-report.log.gz
fix-nvidia-laptop-stability.sh
More Info
Expected Behavior
System should remain stable during normal desktop usage (browser, VS Code, Slack, external monitor) without GPU hangs or requiring power cycle.
Driver upgrades/downgrades should not leave the system in a state where the GPU enters an invalid device state.
Observed Behavior
After driver transition (580-open → 570-open), system entered unstable state:
- GPU hang
- nvidia-modeset timeouts
- NV_ERR_INVALID_DEVICE
- full desktop freeze
Key Observation
Issue appears strongly related to inconsistent driver state during upgrade/downgrade of open kernel modules.
After fully cleaning and aligning to a single driver branch (570-open), system became stable.
Additional Insight
- Hybrid GPU (PRIME on-demand) + external monitor likely increases sensitivity to driver inconsistency
- Chromium-based applications (Chrome, plus the Electron apps VS Code and Slack) may increase GPU workload but are not the root cause
Suggestion
It may be useful to validate:
- driver upgrade/downgrade path integrity for open kernel modules
- detection of mixed or partially upgraded driver states
- recovery behavior when GPU enters invalid state (currently requires hard reboot)
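As an illustration of the second point, one possible check is to compare the version reported by the loaded kernel module with the version of the installed driver package. The sketch below assumes the module text comes from `/proc/driver/nvidia/version` and that both sources contain a dotted version starting with a three-digit branch (such as `570.211.01`). All function names are hypothetical, and this is not how the driver itself detects mismatches.

```python
import re
from pathlib import Path

# Assumption: driver versions look like "570.211.01" (three-digit branch first),
# which avoids matching kernel or compiler versions such as "6.17.0" or "13.3.0".
VERSION_RE = re.compile(r"\b(\d{3}\.\d+(?:\.\d+)?)\b")


def extract_driver_version(text):
    """Return the first driver-like version string in text, or None."""
    match = VERSION_RE.search(text)
    return match.group(1) if match else None


def read_loaded_module_text(path="/proc/driver/nvidia/version"):
    """Return the module's version text, or None if the module is not loaded."""
    try:
        return Path(path).read_text()
    except OSError:
        return None


def versions_agree(module_text, package_text):
    """True only if both texts yield a version and the versions are equal."""
    module_version = extract_driver_version(module_text or "")
    package_version = extract_driver_version(package_text or "")
    return module_version is not None and module_version == package_version
```

The package side could come from `dpkg-query -W -f='${Version}' nvidia-driver-570-open`. A `False` result after a branch switch would suggest the running module and the installed packages are out of step, for example because the system has not been rebooted or an old module is still being loaded.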