AMD/ROCm - Fix VAE_KL_MEM_RATIO overestimation for modern ROCm#12685
Conversation
📝 Walkthrough: This pull request changes the VAE memory ratio constant for AMD GPU devices in the VAE initialization path, reducing it from 2.73 to 1.3.
🚥 Pre-merge checks: 2 passed, 1 warning.
Actionable comments posted: 1
Inline comments:
In `comfy/sd.py`:
- Around lines 442-445: add a brief inline comment explaining the rationale for the VAE_KL_MEM_RATIO override when running on AMD (`model_management.is_amd()`). Document that 1.3 is a conservative ~30% safety margin for modern ROCm (e.g., ROCm 7.x) and reference the tracking issue or PR (for example "see issue `#2`"). Place this comment immediately adjacent to the `VAE_KL_MEM_RATIO = 1.3` assignment so future maintainers understand why it differs from the original 2.73/1.0 values.
I remember the 2.73 value came from last October (v0.3.65), about three weeks after ROCm 6.4.4 with PyTorch for Windows came out. It shipped in the same release as cuDNN/MIOpen being disabled for AMD, though according to comfyanonymous the two were unrelated. I tried a 1.3 value with SDXL but, surprisingly, didn't see a difference in peak VRAM usage during VAE decode. It didn't break anything either, though.
VAE_KL_MEM_RATIO is set to 2.73 for AMD/ROCm in comfy/sd.py. This value was introduced for older ROCm versions where memory overhead was significantly higher. On modern ROCm (7.x), this massively overestimates VRAM requirements for VAE operations, causing ComfyUI to unnecessarily offload models from VRAM before VAE encoding/decoding.
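For context, a minimal sketch of how a multiplier like this feeds a VRAM estimate; the function name, parameters, and workload size below are hypothetical, and ComfyUI's actual estimation code differs:

```python
def estimate_vae_memory(latent_elements: int, bytes_per_element: int,
                        mem_ratio: float) -> int:
    """Rough VRAM estimate for one VAE pass (illustrative only).

    The base activation footprint is multiplied by mem_ratio as a safety
    margin; an oversized ratio inflates the estimate, so the model manager
    frees other models from VRAM more eagerly than it needs to.
    """
    base_bytes = latent_elements * bytes_per_element
    return int(base_bytes * mem_ratio)

# Same workload, two ratios: 2.73 requests roughly 2.1x the VRAM that 1.3 does.
workload = 64 * 1024 * 1024  # latent elements (arbitrary example size)
old_estimate = estimate_vae_memory(workload, 4, 2.73)
new_estimate = estimate_vae_memory(workload, 4, 1.3)
```

On a card where the real VAE pass fits comfortably, the inflated estimate alone is enough to trigger offloading, which matches the behavior described above.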
Impact: On GPUs with limited VRAM (8-16GB), this overestimation may cause frequent unnecessary model offloading, significantly impacting performance. On larger GPUs (32GB) the impact is less noticeable but still causes suboptimal memory management.
Tested on: AMD Radeon AI PRO R9700 (32GB VRAM, gfx1201), ROCm 7.2, Windows and Linux
Fix: A value of 1.0 worked correctly with no OOM errors. 1.3 is suggested as a conservative value to maintain a safety margin for older hardware or ROCm versions.
Change: comfy/sd.py: VAE_KL_MEM_RATIO = 2.73 → 1.3 for AMD
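The change itself might look like the following sketch. The `is_amd()` check is taken from the review comment above; the stub class, the non-AMD default value, and the comment wording are assumptions, not the actual ComfyUI source:

```python
class _ModelManagementStub:
    """Stand-in for comfy.model_management (hypothetical stub)."""
    @staticmethod
    def is_amd() -> bool:
        return True  # pretend we are on an AMD/ROCm device


model_management = _ModelManagementStub()

VAE_KL_MEM_RATIO = 1.0  # illustrative non-AMD default
if model_management.is_amd():
    # Conservative ~30% safety margin for modern ROCm (7.x); the old 2.73
    # dated from ROCm 6.4.4-era overhead and caused unnecessary model
    # offloading before VAE encode/decode (see this PR, #12685).
    VAE_KL_MEM_RATIO = 1.3
```

Keeping the rationale comment directly on the assignment addresses the review feedback as well as the code change.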
Related: WAN 2.2 i2v: Second run is 4 to 5 times slower on AMD GPU (ROCm) #12672