
Add AutoEP #7938

Draft
tohtana wants to merge 11 commits into deepspeedai:master from tohtana:tohtana/add_autoep

Conversation

@tohtana (Collaborator) commented Mar 31, 2026

This PR adds AutoEP (Automatic Expert Parallelism) to DeepSpeed training for HuggingFace MoE models.

AutoEP detects MoE blocks during deepspeed.initialize(), builds the required EP/EDP process groups, and replaces supported MoE blocks with an EP-enabled execution path, so expert parallelism can be enabled with DeepSpeed config only and without model code changes.
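As a sketch of what config-only enablement could look like: the `"autoep"` section, its field names, and the preset value below are assumptions for illustration, not the final API; only `autoep_size` is mentioned in this PR's description.

```python
# Hypothetical DeepSpeed config sketch for enabling AutoEP without model
# code changes. The "autoep" section and its field names are assumptions;
# only "autoep_size" appears in this PR's description.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {"stage": 2},  # ZeRO stages 0-2 are in scope for this PR
    "autoep": {
        "enabled": True,
        "autoep_size": 8,      # ranks in each expert-parallel group
        "preset": "mixtral",   # one of the built-in model-family presets
    },
}

# model would be an unmodified HuggingFace MoE model:
# import deepspeed
# engine, optimizer, _, _ = deepspeed.initialize(model=model, config=ds_config)
```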

Current scope in this PR is the base AutoEP feature:

  • ZeRO stages 0, 1, and 2 support
  • checkpoint save/load support
  • universal checkpoint conversion support

ZeRO-3 support is intentionally left as follow-up work (#7928 should be merged before that follow-up can proceed).

Supported presets in this PR:

  • Mixtral
  • Qwen3-MoE
  • DeepSeek-V2
  • DeepSeek-V3
  • LLaMA-4

For end-to-end benchmarking and testing, an AutoEP example is available in DeepSpeedExamples.

Attribution

This implementation substantially builds on TorchTitan's MoE / expert-parallel implementation, and we want to explicitly acknowledge that prior work.

The TorchTitan-derived pieces in this PR are primarily:

  • deepspeed/moe/ep_router.py: adapted from TorchTitan's TokenChoiceTopKRouter
  • deepspeed/moe/ep_experts.py: adapted from TorchTitan's GroupedExperts and grouped-GEMM expert execution path
  • deepspeed/moe/ep_kernels.py: adapted from TorchTitan's TokenReorderer, generate_permute_indices, Triton fill-indices kernel, and token-group alignment / padding helpers
  • deepspeed/module_inject/auto_ep_layer.py: adapts the same router -> reorder -> dispatch -> local expert compute -> combine structure used in TorchTitan's MoE / EP flow

Relevant TorchTitan sources:

The DeepSpeed-specific work in this PR is the AutoEP integration layer around those building blocks:

  • HuggingFace MoE detection and structural validation
  • model-family presets and custom-config path
  • weight repacking from HF expert layouts into grouped expert tensors
  • DeepSpeed runtime group setup and module replacement
  • DeepSpeed checkpoint save/load and universal checkpoint support
  • DeepSpeed docs and tests
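The weight-repacking step can be illustrated with a small sketch: per-expert HF projection matrices are stacked into a single grouped tensor so one grouped GEMM can cover all local experts. The shapes and the `w1` naming here are assumptions for illustration, not DeepSpeed's actual layout.

```python
# Illustrative sketch of the repacking idea in deepspeed/moe/ep_repack.py:
# stack per-expert HF weight matrices into one grouped tensor for batched
# expert compute. Shapes and names (w1 etc.) are assumptions.
import torch

num_experts, d_model, d_ff = 4, 8, 16

# HF layout: a list of per-expert projection weights, e.g. experts[i].w1.weight
hf_expert_w1 = [torch.randn(d_ff, d_model) for _ in range(num_experts)]

# Grouped layout: one [num_experts, d_ff, d_model] tensor for grouped GEMM
grouped_w1 = torch.stack(hf_expert_w1, dim=0)
assert grouped_w1.shape == (num_experts, d_ff, d_model)
```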

Design

The implementation is split into a few layers:

  • deepspeed/module_inject/auto_ep_config.py

    • user config parsing
    • built-in model presets
    • validation for EP topology and per-model constraints
  • deepspeed/module_inject/auto_ep.py

    • scans the model for MoE blocks
    • validates the detected structure
    • builds a MoELayerSpec for each supported MoE layer
    • replaces the original HF block with AutoEPMoELayer
  • deepspeed/module_inject/auto_ep_layer.py

    • the drop-in execution wrapper for a detected MoE block
    • implements router execution, token reorder, EP dispatch/combine, local expert compute, and shared-expert merge
  • deepspeed/moe/ep_router.py, deepspeed/moe/ep_experts.py, deepspeed/moe/ep_kernels.py

    • reusable MoE runtime pieces for routing, grouped expert compute, token permutation, and aligned grouped-GEMM execution
  • deepspeed/moe/ep_repack.py

    • converts HF expert weights into the grouped expert layout expected by the runtime
  • deepspeed/runtime/engine.py and checkpoint conversion code

    • wires AutoEP into deepspeed.initialize()
    • handles checkpoint save/load metadata and universal checkpoint integration

At runtime, the execution path is:

  1. detect and replace supported HF MoE blocks during initialization
  2. route tokens with the EP router
  3. reorder tokens by expert assignment
  4. perform all-to-all dispatch across the EP group when autoep_size > 1
  5. run local grouped expert compute
  6. all-to-all combine and restore the original token order
  7. merge shared experts if the model has them
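Steps 2, 3, and 6 above can be sketched in a single process (omitting the all-to-all in steps 4 and 6) with a stable argsort and its inverse; top-1 routing and the tensor names are simplifications for illustration.

```python
# Single-process sketch of route -> reorder -> restore (steps 2, 3, and 6).
# Top-1 routing is used for simplicity; the real path routes top-k and runs
# all-to-all dispatch/combine between the reorder and restore steps.
import torch

num_tokens, num_experts = 6, 4
logits = torch.randn(num_tokens, num_experts)
expert_ids = logits.argmax(dim=-1)               # step 2: pick an expert per token

perm = torch.argsort(expert_ids, stable=True)    # step 3: group tokens by expert
token_ids = torch.arange(num_tokens)
reordered = token_ids[perm]                      # tokens now contiguous per expert

inverse = torch.empty_like(perm)
inverse[perm] = torch.arange(num_tokens)         # step 6: invert the permutation
restored = reordered[inverse]
assert torch.equal(restored, token_ids)
```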

Adding new model support

There are two supported ways to extend AutoEP to a new MoE model family.

  1. Add a preset in PRESET_MODELS.
     This is the preferred path for a model family we want to support out of the box. A preset defines:
     • MoE layer pattern
     • router child name
     • experts child name
     • expert weight names / layout
     • num_experts and top_k config attributes
     • routing defaults
     • optional shared-expert structure
  2. Use the custom config path.
     For models that are not yet built into DeepSpeed, AutoEP can be driven from config with:
     • moe_layer_pattern
     • router_pattern
     • expert_pattern
     • expert_w1, expert_w2, expert_w3
     • num_experts_attr
     • top_k_attr
     • optional shared-expert fields

Once detection can produce a valid MoELayerSpec, the replacement, execution, and checkpoint paths are shared.
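Using the fields listed above, a custom-config entry could look like the following sketch. The pattern strings and attribute values are illustrative, borrowed from common HF MoE layouts, not a tested configuration.

```python
# Hypothetical custom AutoEP config for a model family without a preset.
# Field names come from the PR description; all values are illustrative.
custom_autoep_config = {
    "moe_layer_pattern": r"model\.layers\.\d+\.mlp",  # which modules are MoE blocks
    "router_pattern": "gate",                         # router child name
    "expert_pattern": "experts",                      # experts container child name
    "expert_w1": "gate_proj.weight",
    "expert_w2": "down_proj.weight",
    "expert_w3": "up_proj.weight",
    "num_experts_attr": "num_local_experts",
    "top_k_attr": "num_experts_per_tok",
}
```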

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
@tohtana (Collaborator, Author) commented Mar 31, 2026

This feature is still experimental. The next steps are:

We welcome help testing and validating this on large-scale models.

@jiosephlee commented Mar 31, 2026

@tohtana I wish I could be of help, but I haven't written code at this level. Could you clarify what you mean by a preset for gpt-oss? If there is other good-first-issue work I could take on, I would gladly look into it.

@tohtana (Collaborator, Author) commented Mar 31, 2026

Hi @jiosephlee,
Thank you for offering to help! It would be great if you could try gpt-oss once it is supported by this AutoEP work.

tohtana added 4 commits March 31, 2026 18:33
Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>