-
Notifications
You must be signed in to change notification settings - Fork 76
feat: add wan2.2_t2v model and quantization config #454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
19 commits
Select commit
Hold shift + click to select a range
70d3676
feat: add wan2.2_t2v model and quantization config
Charles2530 84f89f9
feat: wan2.2-t2v quantization configs and model updates
Charles2530 715104f
Wan2.2: MoE calibration split, blockwise input, OOM fixes and config
Charles2530 02b4133
update wan2.2
Charles2530 6e2dddb
feat: add HF state_dict print tool
Charles2530 57df671
debug by claude
Charles2530 44bb8a8
feat: skip AWQ transform and fake-quant for first blocks of both tran…
Charles2530 1a998a0
wan2.2: use official Wan2.2 backend for A14B import and native save
Charles2530 df9a09b
chore: update skip-first quant task notes and run script
Charles2530 f261203
fix(wan2.2): enforce native save structure and align default dual gui…
Charles2530 007360e
fix(wan): preserve catcher kwargs forwarding during calibration
Charles2530 5aee498
Delete CLAUDE.md
Charles2530 5a51ded
docs/wan2.2 + refactor(wan2.2): move native save helpers and update q…
Charles2530 e0fc7d4
chore: tidy wan2.2 docs and quant exports
Charles2530 0cdfa67
refactor(wan2.2): move Wan2.2 save logic; update wan_t2v configs
Charles2530 3349931
chore: update wan i2v/t2v configs and eval defaults
Charles2530 fb0e364
chore(wan2.2): add awq_w_a placeholders
Charles2530 366478b
Delete tools/print_state_dict_hf.py
Charles2530 e3e1243
Merge branch 'main' into feat/wan2.2-t2v
Charles2530 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -22,5 +22,8 @@ save* | |
| .log | ||
| *.pid | ||
| *.ipynb* | ||
| model/ | ||
| output_* | ||
| datasets/ | ||
| .venv/ | ||
| *.sh | ||
| *.sh | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,54 @@ | ||
| base: | ||
| seed: &seed 42 | ||
| model: | ||
| type: Wan2T2V | ||
| path: /path/to/wan_t2v | ||
| torch_dtype: auto | ||
| use_cpu_to_save_cuda_mem_for_catcher: True | ||
| calib: | ||
| name: t2v | ||
| download: False | ||
| path: ./assets/wan_t2v/calib/ | ||
| sample_steps: 20 | ||
| bs: 1 | ||
| target_height: 480 | ||
| target_width: 832 | ||
| num_frames: 81 | ||
| guidance_scale: 4.0 | ||
| guidance_scale_2: 3.0 | ||
| seed: *seed | ||
| eval: | ||
| eval_pos: [] | ||
| type: video_gen | ||
| name: t2v | ||
| download: False | ||
| path: ./assets/wan_t2v/calib/ | ||
| bs: 1 | ||
| target_height: 480 | ||
| target_width: 832 | ||
| num_frames: 81 | ||
| guidance_scale: 4.0 | ||
| guidance_scale_2: 3.0 | ||
| output_video_path: ./output_videos_awq/ | ||
| quant: | ||
| video_gen: | ||
| method: Awq | ||
| weight: | ||
| quant_type: int-quant | ||
| bit: 4 | ||
| symmetric: True | ||
| granularity: per_channel | ||
| group_size: -1 | ||
| act: | ||
| quant_type: int-quant | ||
| bit: 4 | ||
| symmetric: True | ||
| granularity: per_token | ||
| special: | ||
| trans: True | ||
| trans_version: v2 | ||
| weight_clip: True | ||
| clip_sym: True | ||
| save: | ||
| save_lightx2v: True | ||
| save_path: /path/to/x2v/ |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -46,4 +46,4 @@ quant: | |
| clip_sym: True | ||
| save: | ||
| save_lightx2v: True | ||
| save_path: /path/to/x2v/ | ||
| save_path: /path/to/x2v/ | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -46,4 +46,4 @@ quant: | |
| clip_sym: True | ||
| save: | ||
| save_lightx2v: True | ||
| save_path: /path/to/x2v/ | ||
| save_path: /path/to/x2v/ | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -29,4 +29,4 @@ quant: | |
| granularity: per_token | ||
| save: | ||
| save_lightx2v: True | ||
| save_path: /path/to/x2v/ | ||
| save_path: /path/to/x2v/ | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -42,4 +42,4 @@ quant: | |
| alpha: 0.7 | ||
| save: | ||
| save_lightx2v: True | ||
| save_path: /path/to/x2v/ | ||
| save_path: /path/to/x2v/ | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,288 @@ | ||
| # Wan2.1 视频生成模型量化指南 | ||
|
|
||
| ## 概述 | ||
|
|
||
| llmc 框架现已全面支持 Wan2.1 系列视频生成模型的量化,并提供真正量化的 INT8/FP8 权重导出,与 lightx2v 推理框架兼容。 | ||
|
|
||
| ## 支持的模型类型 | ||
|
|
||
| - **WanI2V**: Image-to-Video (图像到视频) | ||
| - **WanT2V**: Text-to-Video (文本到视频) | ||
|
|
||
| ## 支持的量化方法 | ||
|
|
||
| ### FP8 量化 (推荐) | ||
|
|
||
| **配置文件**: `configs/quantization/video_gen/wan_i2v/smoothquant_w_a_fp8.yaml` | ||
|
|
||
| **特点**: | ||
| - 使用 E4M3 FP8 格式 (8-bit 浮点数,4位指数,3位尾数) | ||
| - SmoothQuant 算法,平衡权重和激活的量化难度 | ||
| - 适合 GPU 推理,性能损失小 | ||
|
|
||
| **量化配置**: | ||
| ```yaml | ||
| quant: | ||
| video_gen: | ||
| method: SmoothQuant | ||
| weight: | ||
| quant_type: float-quant | ||
| bit: e4m3 # FP8 E4M3 格式 | ||
| symmetric: True | ||
| granularity: per_channel | ||
| use_qtorch: True | ||
| act: | ||
| quant_type: float-quant | ||
| bit: e4m3 # FP8 E4M3 格式 | ||
| symmetric: True | ||
| granularity: per_token | ||
| use_qtorch: True | ||
| special: | ||
| alpha: 0.75 # SmoothQuant 平衡参数 | ||
| ``` | ||
|
|
||
| ### INT8 量化 | ||
|
|
||
| #### 1. RTN (Round-to-Nearest) | ||
| **配置文件**: `configs/quantization/video_gen/wan_i2v/rtn_w_a.yaml` | ||
|
|
||
| **特点**: | ||
| - 最简单的量化方法 | ||
| - 直接四舍五入到最近的量化级别 | ||
| - 速度快,精度略低 | ||
|
|
||
| #### 2. AWQ (Activation-aware Weight Quantization) | ||
| **配置文件**: `configs/quantization/video_gen/wan_i2v/awq_w_a.yaml` | ||
|
|
||
| **特点**: | ||
| - 基于激活分布优化权重量化 | ||
| - 保护重要通道,减少精度损失 | ||
| - 需要校准数据 | ||
|
|
||
| #### 3. SmoothQuant | ||
| **配置文件**: `configs/quantization/video_gen/wan_i2v/smoothquant_w_a.yaml` | ||
|
|
||
| **特点**: | ||
| - 平衡权重和激活的量化难度 | ||
| - 数学上等价于平滑激活异常值 | ||
| - 通常提供最佳精度 | ||
|
|
||
| ### LoRA 模型量化 | ||
|
|
||
| 支持对 LoRA 适配器模型的量化: | ||
| - `smoothquant_w_a_int8_lora.yaml` | ||
| - `rtn_w_a_lora.yaml` | ||
|
|
||
| ## 运行步骤 | ||
|
|
||
| ### 1. 准备环境 | ||
|
|
||
| ```bash | ||
| # 设置 llmc 路径 | ||
| export llmc=/path/to/llmc | ||
| export PYTHONPATH=$llmc:$PYTHONPATH | ||
|
|
||
| # 设置 GPU | ||
| export CUDA_VISIBLE_DEVICES=0 | ||
| ``` | ||
|
|
||
| ### 2. 准备校准数据 | ||
|
|
||
| 为 I2V 模型准备校准数据: | ||
| ``` | ||
| assets/wan_i2v/calib/ | ||
| ├── image_1.jpg | ||
| ├── image_2.jpg | ||
| └── ... | ||
| ``` | ||
|
|
||
| 为 T2V 模型准备校准数据: | ||
| ``` | ||
| assets/wan_t2v/calib/ | ||
| ├── prompt_1.txt | ||
| ├── prompt_2.txt | ||
| └── ... | ||
| ``` | ||
|
|
||
| ### 3. 修改配置文件 | ||
|
|
||
| 编辑对应的 YAML 配置文件,设置: | ||
| - `model.path`: Wan2.1 模型路径 | ||
| - `calib.path`: 校准数据路径 | ||
| - `save.save_path`: 量化模型保存路径 | ||
|
|
||
| **示例 (FP8 量化)**: | ||
| ```yaml | ||
| base: | ||
| seed: 42 | ||
| model: | ||
| type: WanI2V | ||
| path: /path/to/wan2.1-i2v-model # 修改为你的模型路径 | ||
| torch_dtype: auto | ||
| calib: | ||
| name: i2v | ||
| download: False | ||
| path: /path/to/calibration/data # 修改为校准数据路径 | ||
| sample_steps: 40 | ||
| bs: 1 | ||
| target_height: 480 | ||
| target_width: 832 | ||
| num_frames: 81 | ||
| guidance_scale: 5.0 | ||
| save: | ||
| save_lightx2v: True | ||
| save_path: /path/to/save/quantized/model # 修改为保存路径 | ||
| ``` | ||
|
|
||
| ### 4. 运行量化 | ||
|
|
||
| #### 使用脚本运行 (推荐) | ||
|
|
||
| ```bash | ||
| # 运行 FP8 量化 (I2V) | ||
| ./run_llmc.sh wan_i2v_fp8 | ||
|
|
||
| # 运行 INT8 RTN 量化 (I2V) | ||
| ./run_llmc.sh wan_i2v_int8_rtn | ||
|
|
||
| # 运行 INT8 AWQ 量化 (I2V) | ||
| ./run_llmc.sh wan_i2v_int8_awq | ||
|
|
||
| # 运行 INT8 SmoothQuant 量化 (I2V) | ||
| ./run_llmc.sh wan_i2v_int8_smoothquant | ||
|
|
||
| # 运行 T2V 模型量化 | ||
| ./run_llmc.sh wan_t2v_int8_rtn | ||
| ./run_llmc.sh wan_t2v_int8_awq | ||
| ./run_llmc.sh wan_t2v_int8_smoothquant | ||
| ``` | ||
|
|
||
| #### 直接运行命令 | ||
|
|
||
| ```bash | ||
| torchrun \ | ||
| --nnodes 1 \ | ||
| --nproc_per_node 1 \ | ||
| --rdzv_id $RANDOM \ | ||
| --rdzv_backend c10d \ | ||
| --rdzv_endpoint 127.0.0.1:29500 \ | ||
| ${llmc}/llmc/__main__.py \ | ||
| --config configs/quantization/video_gen/wan_i2v/smoothquant_w_a_fp8.yaml \ | ||
| --task_id my_quant_task | ||
| ``` | ||
|
|
||
| ### 5. 监控进度 | ||
|
|
||
| ```bash | ||
| # 查看日志 | ||
| tail -f wan_i2v_fp8.log | ||
|
|
||
| # 查看进程 | ||
| ps aux | grep __main__.py | ||
| ``` | ||
|
|
||
| ### 6. 停止任务 | ||
|
|
||
| ```bash | ||
| # 使用保存的 PID 文件 | ||
| xargs kill -9 < wan_i2v_fp8.pid | ||
| ``` | ||
|
|
||
| ## 配置参数说明 | ||
|
|
||
| ### 模型配置 | ||
| - `type`: 模型类型 (`WanI2V` 或 `WanT2V`) | ||
| - `path`: 模型权重路径 | ||
| - `torch_dtype`: 数据类型 (`auto`, `bfloat16`, `float32`) | ||
|
|
||
| ### 校准配置 | ||
| - `sample_steps`: 采样步数 (通常 20-40) | ||
| - `bs`: 批大小 (通常 1,视频生成显存占用大) | ||
| - `target_height`: 目标视频高度 (默认 480) | ||
| - `target_width`: 目标视频宽度 (默认 832) | ||
| - `num_frames`: 视频帧数 (默认 81) | ||
| - `guidance_scale`: CFG 引导强度 (默认 5.0) | ||
|
|
||
| ### 量化配置 | ||
| - `method`: 量化方法 (`RTN`, `Awq`, `SmoothQuant`) | ||
| - `weight.bit`: 权重位宽 (8, e4m3) | ||
| - `act.bit`: 激活位宽 (8, e4m3) | ||
| - `granularity`: 量化粒度 (`per_channel`, `per_token`) | ||
| - `special.alpha`: SmoothQuant 平衡参数 (0.5-1.0) | ||
|
|
||
| ## 在 lightx2v 中使用量化模型 | ||
|
|
||
| ### 1. 配置 lightx2v | ||
|
|
||
| 编辑 `lightx2v/configs/quantization/wan_i2v.json`: | ||
| ```json | ||
| { | ||
| "infer_steps": 40, | ||
| "target_video_length": 81, | ||
| "target_height": 480, | ||
| "target_width": 832, | ||
| "dit_quantized_ckpt": "/path/to/quantized/model", | ||
| "dit_quantized": true, | ||
| "dit_quant_scheme": "int8-vllm" | ||
| } | ||
| ``` | ||
|
|
||
| 对于 FP8 模型,设置 `"dit_quant_scheme": "fp8"`。 | ||
|
|
||
| ### 2. 运行推理 | ||
|
|
||
| ```bash | ||
| python -m lightx2v.infer \ | ||
| --model_cls wan2.1 \ | ||
| --task i2v \ | ||
| --model_path /path/to/original/model \ | ||
| --config_json configs/quantization/wan_i2v.json \ | ||
| --prompt "Your prompt here" \ | ||
| --image_path /path/to/input/image.jpg \ | ||
| --save_result_path output.mp4 | ||
| ``` | ||
|
|
||
| ## 性能建议 | ||
|
|
||
| 1. **FP8 vs INT8**: | ||
| - FP8: 精度更高,适合对质量要求高的场景 | ||
| - INT8: 压缩率更高,适合对速度要求高的场景 | ||
|
|
||
| 2. **量化方法选择**: | ||
| - 快速原型: RTN | ||
| - 平衡精度和速度: SmoothQuant | ||
| - 最高精度: AWQ | ||
|
|
||
| 3. **校准数据**: | ||
| - 使用 10-50 个样本 | ||
| - 覆盖典型使用场景 | ||
| - I2V: 使用多样化图像 | ||
| - T2V: 使用多样化文本描述 | ||
|
|
||
| 4. **资源需求**: | ||
| - GPU: 建议 24GB+ 显存 | ||
| - 校准时间: 30分钟 - 2小时 (取决于数据量) | ||
| - 存储空间: 量化后模型约原模型 25-50% 大小 | ||
|
|
||
| ## 故障排除 | ||
|
|
||
| ### 显存不足 | ||
| - 减小 `bs` 到 1 | ||
| - 减小 `num_frames` | ||
| - 减小 `target_height` 和 `target_width` | ||
|
|
||
| ### 量化精度损失过大 | ||
| - 尝试 SmoothQuant 方法 | ||
| - 增加校准数据数量 | ||
| - 调整 `alpha` 参数 (0.5-1.0) | ||
|
|
||
| ### lightx2v 兼容性问题 | ||
| - 确保使用 `save_lightx2v: True` | ||
| - 检查 `dit_quant_scheme` 设置 | ||
| - 确认量化模型路径正确 | ||
|
|
||
| ## 参考 | ||
|
|
||
| - lightx2v 文档: [lightx2v 项目地址] | ||
| - llmc 框架: [llmc 项目地址] | ||
| - Wan2.1 模型: [模型地址] |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.