Fix NPU Qwen3.5 grpo bugs#9589
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces compatibility layers and weight synchronization enhancements for vLLM-Ascend, specifically supporting Qwen3.5 MoE models and handling list-style RoPE validation keys from older configurations. It also patches PEFT to keep regex target modules intact during Transformers-v5 MoE conversion. The review feedback highlights two important improvements: first, safely retrieving the RoPE validation conversion method using getattr to prevent AttributeError on older transformers versions; second, ensuring that PEFT config conversion falls back to the original converter when regex compilation fails or when targeting 'all-linear' modules, while also pre-compiling the regex outside of nested loops to optimize performance.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| origin_convert = PretrainedConfig.convert_rope_params_to_dict | ||
|
|
||
| def convert_rope_params_to_dict(self, ignore_keys_at_rope_validation=None, **kwargs): | ||
| if isinstance(ignore_keys_at_rope_validation, list): | ||
| ignore_keys_at_rope_validation = set(ignore_keys_at_rope_validation) | ||
| return origin_convert(self, ignore_keys_at_rope_validation=ignore_keys_at_rope_validation, **kwargs) | ||
|
|
||
| PretrainedConfig.convert_rope_params_to_dict = convert_rope_params_to_dict | ||
| try: | ||
| yield | ||
| finally: | ||
| PretrainedConfig.convert_rope_params_to_dict = origin_convert |
There was a problem hiding this comment.
The convert_rope_params_to_dict method was introduced in newer versions of transformers. If an older version of transformers is installed, accessing PretrainedConfig.convert_rope_params_to_dict directly will raise an AttributeError on import or execution. To prevent this, use getattr to safely retrieve the method and yield early if it is not present.
| origin_convert = PretrainedConfig.convert_rope_params_to_dict | |
| def convert_rope_params_to_dict(self, ignore_keys_at_rope_validation=None, **kwargs): | |
| if isinstance(ignore_keys_at_rope_validation, list): | |
| ignore_keys_at_rope_validation = set(ignore_keys_at_rope_validation) | |
| return origin_convert(self, ignore_keys_at_rope_validation=ignore_keys_at_rope_validation, **kwargs) | |
| PretrainedConfig.convert_rope_params_to_dict = convert_rope_params_to_dict | |
| try: | |
| yield | |
| finally: | |
| PretrainedConfig.convert_rope_params_to_dict = origin_convert | |
| origin_convert = getattr(PretrainedConfig, 'convert_rope_params_to_dict', None) | |
| if origin_convert is None: | |
| yield | |
| return | |
| def convert_rope_params_to_dict(self, ignore_keys_at_rope_validation=None, **kwargs): | |
| if isinstance(ignore_keys_at_rope_validation, list): | |
| ignore_keys_at_rope_validation = set(ignore_keys_at_rope_validation) | |
| return origin_convert(self, ignore_keys_at_rope_validation=ignore_keys_at_rope_validation, **kwargs) | |
| PretrainedConfig.convert_rope_params_to_dict = convert_rope_params_to_dict | |
| try: | |
| yield | |
| finally: | |
| PretrainedConfig.convert_rope_params_to_dict = origin_convert |
| try: | ||
| re.compile(target_modules) | ||
| except re.error: | ||
| return | ||
|
|
||
| if target_modules == 'all-linear': | ||
| return | ||
|
|
||
| target_parameters = _names_to_set(getattr(peft_config, 'target_parameters', None)) | ||
| original_target_parameters = target_parameters.copy() | ||
| old_names_by_new_name = {} | ||
| for old_name, new_name in target_module_mapping.items(): | ||
| old_names_by_new_name.setdefault(new_name, set()).add(old_name) | ||
|
|
||
| matched_fused_targets = {new_name: set() for new_name in fused_targets} | ||
| for param_name, _ in model.named_parameters(): | ||
| for new_name, old_names in old_names_by_new_name.items(): | ||
| if not (param_name == new_name or param_name.endswith(f'.{new_name}')): | ||
| continue | ||
| prefix = param_name[:-len(new_name)] | ||
| matched_old_names = { | ||
| old_name | ||
| for old_name in old_names if re.fullmatch(target_modules, f'{prefix}{old_name}') | ||
| } |
There was a problem hiding this comment.
There are two issues in this block:
- Correctness/Safety: If
re.compilefails or iftarget_modules == 'all-linear', returningNonedirectly will completely bypassorigin_convert, which prevents PEFT from performing its default config conversion. We should delegate these cases back toorigin_convert. - Performance:
re.fullmatch(target_modules, ...)is called inside nested loops over all model parameters and target modules. This compiles the regex stringtarget_modulesrepeatedly, which is highly inefficient. Compiling the regex once outside the loops usingre.compileand reusing the compiled pattern avoids this overhead.
try:
pattern = re.compile(target_modules)
except re.error:
return origin_convert(peft_config, model, conversions)
if target_modules == 'all-linear':
return origin_convert(peft_config, model, conversions)
target_parameters = _names_to_set(getattr(peft_config, 'target_parameters', None))
original_target_parameters = target_parameters.copy()
old_names_by_new_name = {}
for old_name, new_name in target_module_mapping.items():
old_names_by_new_name.setdefault(new_name, set()).add(old_name)
matched_fused_targets = {new_name: set() for new_name in fused_targets}
for param_name, _ in model.named_parameters():
for new_name, old_names in old_names_by_new_name.items():
if not (param_name == new_name or param_name.endswith(f'.{new_name}')):
continue
prefix = param_name[:-len(new_name)]
matched_old_names = {
old_name
for old_name in old_names if pattern.fullmatch(f'{prefix}{old_name}')
}There was a problem hiding this comment.
Pull request overview
This PR improves NPU (vLLM-Ascend) compatibility for running GRPO on Qwen3.5 / Qwen3.5-MoE across both colocate and server rollout modes, and documents a known PEFT + Transformers 5 MoE fused-expert LoRA limitation.
Changes:
- Add NPU-aware broadcast and post-load processing hooks so server-mode weight sync correctly rebuilds vLLM-Ascend MoE kernel layouts after reload.
- Extend vLLM-Ascend MoE weight-loader patching to support selecting “processed” vs “preprocessed” expert layouts for different sync paths (Megatron vs FSDP2 reload-style).
- Add a temporary Transformers 5 RoPE compatibility monkey-patch for vLLM 0.18.x Qwen3.5 configs and document PEFT fused-expert LoRA limitations.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| swift/rlhf_trainers/vllm_client.py | Centralizes broadcast behavior to be NPU-stream/device-context aware during server sync. |
| swift/rlhf_trainers/utils.py | Adds Qwen3.5 MoE registry entry and threads MoE layout selection into Ascend expert loader patching. |
| swift/rlhf_trainers/rollout_mixin.py | Configures vLLM-Ascend MoE sync layout for colocate mode and refines fused-expert name/weight handling. |
| swift/pipelines/infer/rollout.py | Adds NPU-aware broadcast helper and ensures full-reload paths trigger Ascend MoE post-load processing. |
| swift/model/npu_patch/vllm_ascend.py | Re-exports the Ascend MoE preprocessed-layout selector to support sync logic. |
| swift/model/npu_patch/vllm_ascend_moe.py | Implements layout-mode tracking and enhanced Ascend MoE expert weight-loader behavior for sync/reload. |
| swift/infer_engine/vllm_engine.py | Adds a scoped patch to accept list-based RoPE validation ignore keys for vLLM 0.18.x + Transformers 5. |
| docs/source/BestPractices/NPU-support.md | Documents PEFT Transformers 5 MoE fused-expert LoRA limitations (ZH). |
| docs/source_en/BestPractices/NPU-support.md | Documents PEFT Transformers 5 MoE fused-expert LoRA limitations (EN). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| weight = torch.empty(shape, dtype=dtype, device=self.communicator.device) | ||
|
|
||
| # Use NCCL to broadcast the updated weights from the client (src) to all workers. | ||
| self.communicator.broadcast( | ||
| weight, src=self.client_rank, stream=getattr(get_torch_device(), 'current_stream', lambda: None)()) | ||
| _broadcast_tensor(self.communicator, weight, src=self.client_rank) | ||
| synchronize() | ||
| self.communicator.group.barrier() | ||
|
|
|
cc @hjh0119 |
| def should_keep_fused_moe_expert_for_vllm_ascend(model) -> bool: | ||
| """Return whether fused expert names should be kept for vLLM-Ascend sync.""" | ||
| return False |
There was a problem hiding this comment.
What is the purpose of this method?
There was a problem hiding this comment.
Thanks for pointing this out. This helper was a leftover from an earlier branch where we tried both "keep fused expert names" and "expand fused expert names" paths. After the final fix, vLLM-Ascend always needs the expanded checkpoint-style names for this sync path, so the helper had degenerated into a constant False, which was unnecessary and confusing.
I removed the helper and now call the fused MoE expansion path directly. This keeps the intent explicit and avoids carrying a dead switch in the code.
| def _broadcast_tensor(communicator, tensor: torch.Tensor, src: int) -> None: | ||
| if is_torch_npu_available(): | ||
| device_module = get_torch_device() | ||
| with device_module.device(communicator.device): | ||
| communicator.broadcast(tensor, src=src, stream=device_module.current_stream()) | ||
| else: | ||
| communicator.broadcast(tensor, src=src, stream=getattr(get_torch_device(), 'current_stream', lambda: None)()) |
There was a problem hiding this comment.
Consider reusing the methods from swift/pipelines/infer/rollout.py or extracting them into utils
There was a problem hiding this comment.
Done. I extracted the shared tensor broadcast logic into broadcast_tensor_for_vllm_weight_sync() in swift/rlhf_trainers/utils.py, and both swift/rlhf_trainers/vllm_client.py and swift/pipelines/infer/rollout.py now reuse it.
| """ | ||
| from transformers import PretrainedConfig | ||
|
|
||
| origin_convert = PretrainedConfig.convert_rope_params_to_dict |
There was a problem hiding this comment.
Agree with Gemini. Is this attribute safe to get for lower versions of Transformers
There was a problem hiding this comment.
Fixed. The patch now uses getattr(PretrainedConfig, "convert_rope_params_to_dict", None) and skips the monkey patch if the method does not exist.
|
Sorry for the late review. I've left a few minor comments |
|
Thanks! Is this pr ready to merge? |
Yes, I think it’s ready. Please feel free to merge it if there are no further concerns. |
PR type
PR information
This PR enables Qwen3.5 / Qwen3.5-MoE GRPO on NPU, covering both colocate and server vLLM rollout paths, and documents the known limitation of MoE fused expert LoRA under PEFT + Transformers 5.
The changes are grouped into four areas:
This PR only contains framework code changes and does not include local smoke scripts.
1. Qwen3.5 RoPE config compatibility with vLLM 0.18 + Transformers 5
Files involved:
swift/infer_engine/vllm_engine.pyProblem
Qwen3.5 depends on the config / RoPE parameter handling logic in Transformers 5. In the Qwen3.5 config path of vLLM 0.18.x,
ignore_keys_at_rope_validationmay still be passed in as alist; however, the Transformers 5 side treats it as asetand performs set operations on it, which leads to a type error.Fix
When creating the vLLM engine, this PR temporarily patches
PretrainedConfig.convert_rope_params_to_dict()so that it only convertsignore_keys_at_rope_validationfromlisttosetwhen needed.Upstream status
2. MoE weight synchronization layout compatibility for vLLM-Ascend
Files involved:
swift/model/npu_patch/vllm_ascend.pyswift/model/npu_patch/vllm_ascend_moe.pyswift/rlhf_trainers/rollout_mixin.pyswift/rlhf_trainers/utils.pyProblem
During GRPO training, training-side weights need to be synchronized to the vLLM rollout side. For Qwen MoE on vLLM-Ascend, MoE expert weights have backend-specific layouts:
gate_up_proj/down_projtensors;w13_weight/w2_weightcan leave the runtime weight layout in the wrong direction.One observed failure was the first rollout hitting
npu_grouped_matmulwith a hidden-size mismatch, where the activation K dimension washidden_sizebut the syncedw13_weightdimension still matched the TP-sharded expert intermediate dimension.Fix
qwen3_moeandqwen3_5_moeas the Qwen MoE sync family.gate_proj/up_proj/down_projnames before calling the vLLM loader.w13_weight:[local_experts, hidden, 2 * intermediate_per_tp]w2_weight:[local_experts, intermediate_per_tp, hidden]process_weights_after_loading()only for this FSDP2 Qwen MoE colocate runtime-sync path, because calling post-load again would transpose the just-synced runtime weights back to the wrong direction.Upstream status
model.load_weightssilently corrupts MoE forward on second call vllm-project/vllm#42821w13_weight/w2_weight:3. Weight synchronization compatibility for vLLM server mode
Files involved:
swift/pipelines/infer/rollout.pyswift/rlhf_trainers/vllm_client.pyProblem
In server mode, the training process synchronizes weights to an independent vLLM rollout server through a client. For the Qwen3.5-MoE + vLLM-Ascend path, weight synchronization needs to ensure that:
Fix
_broadcast_tensor().communicator.devicedevice context and use the current NPU stream only under NPU runtime./process_weights_after_loading/call on the server worker side to rebuild the vLLM-Ascend MoE layout after full reload.4. Known limitation: PEFT 0.19 + Transformers 5 fused MoE LoRA
This PR does not attempt to fix the PEFT fused expert LoRA path in SWIFT. Instead, it documents the limitation in the NPU support documentation.
Background
Transformers 5 changes the structure of some MoE experts. Previously, each expert typically exposed independent
nn.Linearmodules such asgate_proj,up_proj, anddown_proj; in the new structure, these expert weights may instead be fused into a singlenn.Parameter, for example:Regular LoRA uses PEFT
target_modulesto matchnn.Modules, while fused MoE experts require PEFTtarget_parametersto match parameter names directly.Current assessment
target_parametersis an upstream mechanism provided by PEFT, but there are still unresolved edge cases in combinations involvinglora_dropout, ZeRO-3 / FSDP, and multi-adapter setups.swift/tuners/peft.py.docs/source/BestPractices/NPU-support.mddocs/source_en/BestPractices/NPU-support.mdRecommendation
model_type, so as to avoid triggering the PEFT Transformers 5 MoE conversion path.lora_dropout=0and the target backend has been separately validated.Related upstream context
target_parametersto support expert weights represented asnn.Parameter, such as in MoE:https://github.com/huggingface/peft/releases/tag/v0.17.0
target_parametersissues, but still retained limitations such as multi-adapter support:FIX Multiple issues with target_parameters huggingface/peft#2710
target_parameters/ adapters:DPOTrainer ref adapter crashes with PEFT
target_parameters(MoE models on Transformers 5.x) huggingface/trl#5222ParamWrapperunder ZeRO-3 / FSDP:ParamWrapper initialization fails when DeepSpeed ZeRO-3 is enabled huggingface/peft#2916
FSDP + torch.nn.Parameter (MoE layer) lora fine-tuning doesn't work huggingface/peft#3080
Experiment results
Verified:
Not covered:
target_parameters + lora_dropout > 0is still intentionally unsupported; the current approach is to document the limitation rather than maintain a temporary workaround in SWIFT.