Changes from all commits (7185 commits)
d45fa3e
CUDA graph support for prefix caching on hybrid models (#3922)
lmcafee-nvidia Mar 25, 2026
c586f6d
Add ability to perform local gradient accumulation in FP32 for a subs…
deepakn94 Mar 25, 2026
09cce75
Miscellaneous MXFP8 inference fixes (#4017)
santhnm2 Mar 26, 2026
a01a6c5
Use `torch.int64` for grad_num_zero accumulation (#4015)
WanZzzzzz Mar 26, 2026
548028b
Make text generation server hostname configurable (#3935)
santhnm2 Mar 26, 2026
0842ca2
Add --muon-coefficient-type argument for Muon optimizer (#3927)
mchrzanowski Mar 26, 2026
606afda
Pass gracefully if token_id not found in message (#3862)
i-riyad Mar 26, 2026
0528a40
Improve load balancing behavior for prefix cache-aware routing (#3930)
santhnm2 Mar 26, 2026
58e0b85
Refactor setup.py to use get_pybind_include (#3658)
sakgoyal Mar 27, 2026
3758b54
build: Bump TE to 2.14 (#4025)
ko3n1g Mar 27, 2026
d863b7b
chore(beep boop 🤖): Bump (main) (2026-03-30)
github-actions[bot] Mar 30, 2026
a61ce5f
fix traceback when interrupting run (#3439)
dimapihtar Mar 30, 2026
4dcd7d6
chore: update goldenvalues (#4059)
ko3n1g Mar 30, 2026
fc61ce5
Fix TemporalAsyncCaller pin_memory lifetime in async checkpointing (#…
lvdunlin Mar 30, 2026
4bde3a4
chore: Move to Py3.12 (#3826)
ko3n1g Mar 30, 2026
8256553
Update copy-pr-bot.yaml [skip ci]
github-actions[bot] Mar 31, 2026
704c7ee
Adding NVRx as a dependency and keeping the current code base optiona…
dimapihtar Mar 31, 2026
e1db321
chore: Bump versions
ko3n1g Mar 31, 2026
8dd65cd
build: Set `ENV NVTE_BUILD_NUM_PHILOX_ROUNDS=3` (#4074)
ko3n1g Mar 31, 2026
a3a7a0c
fix checkpointing conversion (#4058)
dimapihtar Mar 31, 2026
f09f5c9
chore: Bump versions
ko3n1g Mar 31, 2026
dc113cf
fix(ci): replace actions/setup-python with apt-get to avoid 429 rate …
ko3n1g Mar 31, 2026
8f3bee5
ci: Fix package name for code-freeze workflow (#4077)
ko3n1g Mar 31, 2026
1533beb
chore: bump `_code_freeze` workflow to `v0.86.0` (#4078)
ko3n1g Mar 31, 2026
ef2c8a0
Fix checkpoint inspector (#4079)
janEbert Mar 31, 2026
fd1888b
Update docs to conform to NVIDIA style guides (#4068)
megnvidia Mar 31, 2026
2b85d0a
Miscellaneous inference fixes (#4030)
santhnm2 Mar 31, 2026
15f14fc
fix fine_grained_callables with fused rmsnorm residual (#4026)
CarlosGomes98 Mar 31, 2026
97e36aa
[Main][feat] Support overlapping A2A Combine backprop with wgrad GEMM…
Wohox Mar 31, 2026
7086b61
chore: rotate oncall schedule
github-actions[bot] Apr 1, 2026
3499efe
Modify mfsdp default data-parallel-sharding-strategy (#3691)
wplf Apr 1, 2026
1284d25
Fix fsdp_dtensor conversion for pretrained-only checkpoints (#3912)
DAISY-gh Apr 1, 2026
f9a61e3
Guard NVshmem issues (#4093)
wdykas Apr 1, 2026
fe5291f
m-fsdp: wire use_precision_aware_optimizer from ddp_config to ParamAn…
rapatel Apr 1, 2026
606ac26
Megatron-FSDP: Add MXFP8 transpose helper buffer for Hybrid FSDP (#3918)
shjwudp Apr 1, 2026
8d7a3f8
feat(fsdp): use TE general_gemm for mixed-precision wgrad in FSDP pat…
Victarry Apr 1, 2026
748ac49
Megatron-FSDP: Make _pre_forward_param_unshard and _register_post_bac…
shjwudp Apr 1, 2026
3dc2251
Megatron-FSDP: Fix insufficient double buffers during gradient reduce…
shjwudp Apr 1, 2026
41f3b6f
Fix M-FSDP MXFP8 related BUGs (#3991)
shjwudp Apr 1, 2026
a52ceeb
FIX: Use decoupled gradients for precision-aware M-FSDP grad norm (#3…
XueSongTap Apr 1, 2026
150e37a
[Megatron-FSDP] Fix compatibility with frozen parameters and add unit…
shjwudp Apr 2, 2026
cb3bb41
Align chat completions endpoint with vLLM (#4063)
santhnm2 Apr 2, 2026
159e347
[M-FSDP] Refactor uneven dtensor to full tensor and add UT (#3190)
shjwudp Apr 2, 2026
5a7f520
Add agent instruction files (#4102)
Phlip79 Apr 2, 2026
17a67b9
Bump eopt version (#4100)
skyw Apr 2, 2026
8b8ceb5
Update copy-pr-bot.yaml [skip ci]
github-actions[bot] Apr 3, 2026
5b512b4
Refactor emerging optimizer integration (#4113)
skyw Apr 2, 2026
dcc6d62
Fix over provisioning of Mamba state memory when max_requests is set …
santhnm2 Apr 3, 2026
2697b82
base strategy simplification (#4001)
dimapihtar Apr 3, 2026
69f3b34
add support for DCP and FSDP async save (#4027)
dimapihtar Apr 3, 2026
c9797ad
Add more emerging optimizers (#3907) (#4119)
skyw Apr 3, 2026
76e4daa
Fix FSDP checkpoint conversion and loading for Qwen3.5-VL (#3936)
DAISY-gh Apr 3, 2026
a025a69
docs: update mcore optimizer docstrings to google style (#2799)
Akshat8510 Apr 3, 2026
07db9f7
Update oncall schedule (#4117)
Phlip79 Apr 3, 2026
a72c027
Set tensor-parallel attributes irrespective of perform_initialization…
ilml Apr 3, 2026
c0e3134
docs: add developer-guide skill with CI/CD and failure navigation gui…
ko3n1g Apr 3, 2026
8758d16
chore: Move skills (#4136)
ko3n1g Apr 3, 2026
fd76254
ci: Let Claude react to comment (#4135)
ko3n1g Apr 3, 2026
d865bba
Nemotron3 Super GB200 release config (#4118)
maanug-nv Apr 3, 2026
3d87bfc
Enable CUDA graph for ADAM optimizer (#3429)
vasunvidia Apr 3, 2026
499266a
Claude review should recommend testing (#4137)
Phlip79 Apr 3, 2026
10e7b74
cleanup: remove unused `scatter_gather_tensors_in_pipeline` argument …
Phlip79 Apr 4, 2026
0b8306b
fix: Remove fail-fast (-x) and guard distributed teardown against dea…
ko3n1g Apr 5, 2026
1d43284
chore(beep boop 🤖): Bump (main) (2026-04-06)
github-actions[bot] Apr 6, 2026
7d536c0
Claude: add respond-to-issue skill (#4141)
Phlip79 Apr 6, 2026
6652d57
Fix muon getter backward compatability (#4157)
skyw Apr 6, 2026
97cd326
Audit of user guide (#4098)
megnvidia Apr 6, 2026
0b5e3ae
Fix `RerunStateMachine` crash (`TypeError: 'NoneType' object is not s…
yezhengmao1 Apr 6, 2026
fa5103c
Preserve type of decorated methods/classes (#4062)
nschank Apr 6, 2026
eba2eaf
update muon test case to use new interface (#4163)
skyw Apr 7, 2026
8cbc45b
[M-FSDP] Fix Tensor Parallel mode detection (#3191)
shjwudp Apr 7, 2026
8e8aff6
fix: remove weights_only=False for multimodal example (#4104)
faradawn Apr 7, 2026
52150ab
Cudagraphs: Fix sequence packing segfault more generally (#4162)
mathemakitten Apr 7, 2026
70a7f69
Update copy-pr-bot.yaml [skip ci]
github-actions[bot] Apr 8, 2026
c146305
Make MTP work with materialize_only_last_token_logits (#4166)
santhnm2 Apr 7, 2026
8cf6b35
Add unit test for Mamba EP inference (eager fallback with mixed CUDA …
santhnm2 Apr 8, 2026
2368a2e
chore: rotate oncall schedule
github-actions[bot] Apr 8, 2026
cc025f8
update docs in respect to async changes (#4177)
dimapihtar Apr 8, 2026
1341d8c
update checkpointing docs in respect to async changes (#4208)
dimapihtar Apr 8, 2026
e40feed
chore: improve build-and-test skill with trigger rules and dependency…
ko3n1g Apr 8, 2026
51bcf14
Fix layerwise optimizer with `expt_dp_size=1` and contention with ele…
skyw Apr 8, 2026
50bafa0
ci: add --cluster-a100/h100/gb200 args to trigger_internal_ci.py (#4195)
ko3n1g Apr 8, 2026
41595d0
ci: Update golden values for nightly tests (#4215)
chtruong814 Apr 8, 2026
a614df4
Update copy-pr-bot.yaml [skip ci]
github-actions[bot] Apr 9, 2026
1b425ab
rename async_allgather to overlap_param_gather (#4217)
skyw Apr 9, 2026
8f72c54
Fix Slack sync for users with GitHub email privacy enabled (#4220)
Phlip79 Apr 9, 2026
980211a
Miscellaneous MTP inference fixes (#4191)
santhnm2 Apr 9, 2026
be09bb6
Move inference guards out of arguments.py (#4210)
mathemakitten Apr 9, 2026
92d5c1f
Fix: enable fine-grained activation offloading for Mamba model. (#4173)
fanshiqing Apr 9, 2026
b5354a8
bump NVRx (#4178)
dimapihtar Apr 9, 2026
09312c8
Update tokenizer args for Nemotron3 release config (#4239)
maanug-nv Apr 9, 2026
2f5c62c
build: add dynamic git-versioning and drop rc0 pre-release tag (#4212)
ko3n1g Apr 9, 2026
567d4d4
Fix unnecessary permute padding for non-quantized MoE dispatch (#4038)
xiaoxi-wangfj Apr 10, 2026
51b0950
Fix split state dict main (#3676)
kunlunl Apr 10, 2026
22e0bb5
Enable FP8 DPA for MXFP8 recipe (#4066)
vasunvidia Apr 10, 2026
3d4de97
Add /split-pr Claude Code command for splitting PRs by CODEOWNERS (#4…
Phlip79 Apr 10, 2026
2ebfbb2
Enable AG/RS overlap with explicit process group passing (#3249)
jeffnvidia Apr 10, 2026
d30c3ae
Enable cpu_offloading with Full iteration CUDA graph (#3969)
vasunvidia Apr 10, 2026
e8e79a4
Fix TransformerConfig validation for mixed dense/MoE upcycling (#3647)
rkteddy Apr 10, 2026
0602523
Remove cross-rank synchronization during checkpoint load & deprecate …
asolergi-nv Apr 10, 2026
ab43d43
Fix incorrectly set decoupled_grad and DistOpt mechanics for MFSDP. (…
cspades Apr 10, 2026
45a49eb
Refit Miscelaneous (#3973)
wdykas Apr 10, 2026
1daa19f
Add conditions_embeddings argument to TransformerBlock, TransformerLa…
huvunvidia Apr 10, 2026
705d8ed
Update copy-pr-bot.yaml [skip ci]
github-actions[bot] Apr 11, 2026
59fc894
Fix build_sequences_per_dataset output path arg usage (#4144)
DhineshPonnarasan Apr 11, 2026
7cbc68c
ci: Flush pending CUDA work before the barrier in destroy_model_paral…
chtruong814 Apr 11, 2026
e3d1204
Update oncall schedule (#4257)
Phlip79 Apr 12, 2026
41fcaa4
docs(moe): Update MoE README (#3664)
sbhavani Apr 13, 2026
cc4cb01
Revert "Add conditions_embeddings argument to TransformerBlock, Trans…
ko3n1g Apr 13, 2026
6da6267
reduce the number of shared expert streams (#3752)
yangbofun Apr 13, 2026
7f8f37e
remove legacy Bert code (#4204)
dimapihtar Apr 13, 2026
20ba03f
[Main] Feat(moe): Gated delta net context parallel (CP) (#2642)
yuzhongw-nvidia Apr 13, 2026
81b8b5a
remove t5 legacy code (#4203)
dimapihtar Apr 13, 2026
dda6901
fix: handle list-typed process groups in ProcessGroupCollection.__rep…
cluster2600 Apr 13, 2026
10f5fbd
Fix Context Parallelism documentation link (#4149)
liangxs Apr 13, 2026
5dcda19
[MLA] fix: Pad V when Q/V head dims differ for THD (#3003)
HollowMan6 Apr 13, 2026
d85365b
Allow the evaluation batch size to differ from the training batch siz…
michal2409 Apr 13, 2026
25129bf
fix(megatron-fsdp): build expt_device_mesh only for MoE models (#3831)
xuwchen Apr 13, 2026
a1595b8
Add @NVIDIA/transformer review group to megatron/core/transformer/ (#…
Phlip79 Apr 13, 2026
5f80f0a
Reset AG_pipeline bucket status after validation step. (#3155)
vasunvidia Apr 13, 2026
e32e323
Update copy-pr-bot.yaml [skip ci]
github-actions[bot] Apr 14, 2026
df929f5
Enhance and fix NVTX for training (#3642)
yaox12 Apr 14, 2026
e1db4a0
NVFP4 native weights for DDP (#4005)
WanZzzzzz Apr 14, 2026
1ca250c
Remove unnecessary arguments for layerwise distributed optimizer (#4272)
FDecaYed Apr 14, 2026
28e13c4
reuse grad buffer for layer-wise param allgather (#3751)
FDecaYed Apr 14, 2026
e9e513a
feat(ci): add strict review mode to Claude review workflow (#4197)
Victarry Apr 14, 2026
cd03c3e
Fix stale approvals (#4280)
Phlip79 Apr 14, 2026
eb80b74
[MoE] Add a new score function to the router (#3673)
yaox12 Apr 14, 2026
ebfa138
[MoE] Improvement of shared expert overlap, support shared expert ove…
Victarry Apr 14, 2026
123645b
build: bump DeepEP to 34152ae (#4228)
ko3n1g Apr 14, 2026
3d7a701
ci: mark test_fused_indexer_loss_gradient_tp_consistency as flaky_in_…
ko3n1g Apr 14, 2026
4cef23c
Fix typo in PR4133. (#4277)
cspades Apr 14, 2026
bcec618
ci: add retry loop to apt-get update to handle transient mirror sync …
ko3n1g Apr 14, 2026
4e85d74
fix: enforce correct pass thresholds for deterministic and approximat…
ko3n1g Apr 14, 2026
d245c44
remove legacy biencoder and realm models (#4205)
dimapihtar Apr 14, 2026
6636eb0
ci: add configurable launcher support for functional tests (ft_launch…
ko3n1g Apr 14, 2026
d530a04
chore: document --target main for local Docker builds (#4307)
ko3n1g Apr 14, 2026
97aca2f
Extract args init to launch scripts (#4225)
maanug-nv Apr 14, 2026
c2d1a8f
[Main] Fix TE version check for retain_pinned_cpu_buffers in cpu offl…
BestJuly Apr 15, 2026
b342602
chore: rotate oncall schedule
github-actions[bot] Apr 15, 2026
1d344ae
Fix documented shape (#3486)
janEbert Apr 15, 2026
4a79536
ci: add sync-skills workflow, rename CLAUDE.md → AGENTS.md, move .cla…
ko3n1g Apr 15, 2026
57fc3ae
chore(beep boop 🤖): symlink skills/ → .claude/skills, .agents/skills …
github-actions[bot] Apr 16, 2026
23265d2
Get `device` correctly when module returns a dict instead of individu…
shifangx Apr 16, 2026
e69cf43
remove vision legacy code (#4202)
dimapihtar Apr 16, 2026
f098fe8
feat: long convergence resiliency for release tests (#4335)
ko3n1g Apr 16, 2026
8681ebb
ci(action): improve GitHub Actions output UX (#4337)
ko3n1g Apr 16, 2026
ceac269
build: bump TransformerEngine to release_v2.14 (#4331)
ko3n1g Apr 16, 2026
efbe7a1
feat: add create-issue skill (#4338)
ko3n1g Apr 16, 2026
97f9ab6
Set megatron-fsdp to 0.5.0
ko3n1g Apr 16, 2026
01eb7e8
M4 leftover for TE cuda graph (#3137)
shifangx Apr 16, 2026
260cba7
fix: wait for async P2P send before deallocating output tensor (#4047)
ZhiyuLi-Nvidia Apr 16, 2026
2aeaf56
ci(gb200): add 1-node mr-github functional test variants (#4334)
ko3n1g Apr 17, 2026
ded22f4
Fix potential coredump issue that occurs when saving a checkpoint (#1…
ezioliao Apr 17, 2026
30bc230
docs: bump versions1.json to 0.17.0 (latest) (#4360)
ko3n1g Apr 17, 2026
a00e944
Port DeepSeek Sparse Attention to `MambaModel` (#3553)
janEbert Apr 17, 2026
23663a8
Add tables and histogram for RL staleness (#4097)
tdene Apr 17, 2026
4ece77d
[docs] ci: use parent-relative json_url for version picker (#4367)
ko3n1g Apr 17, 2026
ed5de26
Fix bug with non-partial rollouts (#3964)
tdene Apr 17, 2026
e15ec3c
Add QK layernorm support for dot-product attention in MambaModel (#4067)
Phlip79 Apr 17, 2026
75a2878
Docs: improve docstrings and comments in example training loop (#4041)
DhineshPonnarasan Apr 17, 2026
86b7218
Update copy-pr-bot.yaml [skip ci]
github-actions[bot] Apr 18, 2026
9978968
feat(ckpt): add --async-ckpt-use-cpu-shm argument (#4355)
sbak5 Apr 18, 2026
664baa8
cp: Fix UT timeout (#4310) (#4373)
chtruong814 Apr 18, 2026
e4d3a4c
Fix RL reward due to stop token (#4096)
tdene Apr 18, 2026
76ac7c2
FA4 Inference (#4186)
wdykas Apr 18, 2026
3315c86
Make param_index_map always use unpacked (full numel) offsets (#4328)
deepakn94 Apr 18, 2026
8be1e79
Add activation logging and tokens per expert logging (#3842)
Mellonta Apr 18, 2026
98a51eb
Fix RL to once again work with --skip-train (#4249)
tdene Apr 18, 2026
afae25b
Fix Megatron initialization with extra_args_provider (#4327)
santhnm2 Apr 18, 2026
15e07a2
Rename MambaModel/MambaStack to HybridModel/HybridStack (#4099)
Phlip79 Apr 19, 2026
3046182
chore(beep boop 🤖): Bump (main) (2026-04-20)
github-actions[bot] Apr 20, 2026
9c210f7
fix(ci): wrap uv install in retry block (#4387)
ko3n1g Apr 20, 2026
7928a84
Call save_checkpoint_and_time() when saving checkpoint and compute el…
awsankur Apr 20, 2026
ef1888b
refactor(tests): move NCCL env vars from docker launcher to shell tra…
ko3n1g Apr 20, 2026
b562151
Remove packed_attention_mask unused parameter (#3859)
tdene Apr 20, 2026
859b66a
Second batch of audit edits (#4115)
megnvidia Apr 20, 2026
c9e03d0
Replace rampup batch size scheduler with custom step batch size sched…
mkhona-nvidia Apr 20, 2026
dc87858
Update copy-pr-bot.yaml [skip ci]
github-actions[bot] Apr 21, 2026
a52112d
revert: replace rampup batch size scheduler with custom step batch si…
ko3n1g Apr 21, 2026
0b9bc20
Update copy-pr-bot.yaml [skip ci]
github-actions[bot] Apr 21, 2026
532ad92
Replace rampup batch size scheduler with custom step batch size sched…
deepakn94 Apr 21, 2026
e5ec9ab
Megatron-FSDP: log mcore detection only after imports succeed (#4400)
wujingyue Apr 21, 2026
a550e0e
ci(gb200): re-enable tunable_overlap 1-node mr-github test (#4405)
ko3n1g Apr 21, 2026
77e2dd4
Fix local docs building (#4416)
Phlip79 Apr 21, 2026
e778967
RL: Onload optimizer after logprobs computation (#4235)
tdene Apr 22, 2026
bbc6b4d
chore: rotate oncall schedule
github-actions[bot] Apr 22, 2026
7597a0d
Add RL token throughput and packing metrics (#3877)
tdene Apr 22, 2026
9834c99
ci: remove publish:merge_into_dev job (#4421)
ko3n1g Apr 22, 2026
a6bfe1a
docs: add data loading best practices for large-scale training (#4236)
sbhavani Apr 22, 2026
384e618
Fix: Auto enable manual registration and enhance the docummentation (…
youngeunkwon0405 Apr 22, 2026
9a3c927
Fix nvtx_decorator to check _nvtx_enabled at call time (#4184)
minitu Apr 22, 2026
60f71e1
fix merges_file typo in megatron_hf_tokenizer (#4392)
chelseajohn Apr 22, 2026
c9dfe34
Enable NullTokenizer for pretraining to reduce I/O access (#4057)
asolergi-nv Apr 22, 2026
7073492
docs: Add SECURITY.md (#4431)
chtruong814 Apr 22, 2026
40627d0
Mamba inference opt (#4414)
wdykas Apr 22, 2026
55b8111
DDP refactoring: Extract parameter layout computation into optimizer …
deepakn94 Apr 22, 2026
90e09b6
Update PR template with explicit request for issue (#4409)
Phlip79 Apr 22, 2026
ab2b33d
Misc inference fixes (#4397)
sidsingh-nvidia Apr 23, 2026
60408d5
Rename Mamba to Hybrid outside megatron/core (#4159)
Phlip79 Apr 23, 2026
a52014c
Include mtp layers in token per expert logging (#4412)
Mellonta Apr 23, 2026
32275b2
fix: NVRx async compatibility and defer resiliency import (#4420)
sbak5 Apr 23, 2026
9bb35a8
ci: add base_sha to codecov/codecov-action upload step (#4445)
ko3n1g Apr 23, 2026
3034d86
Update copy-pr-bot.yaml [skip ci]
github-actions[bot] Apr 24, 2026
f78ed05
fix(checkpoint_inspector): allow empty --param-to-param-group-map-jso…
DAISY-gh Apr 24, 2026
4d6cdd5
Add the YARN support for hybrid_model (#4244)
guihong-nv Apr 24, 2026
41ffa83
[training migration] Add container class for config dataclasses (#4227)
maanug-nv Apr 24, 2026
a1165fa
Inference: Fix broken functional tests on gitlab (#4454)
sidsingh-nvidia Apr 24, 2026
d4cacef
SafeUnpickler class for safe pickle usage (#4319)
dimapihtar Apr 24, 2026
109feda
get rid of weights_only=False (#4434)
dimapihtar Apr 24, 2026
64870c1
Inference | Per-block MoE routing storage for prefix caching (#4301)
lmcafee-nvidia Apr 24, 2026
017e684
Add troubleshooting tip for 'access forbidden' (#4449)
balasaajay Apr 24, 2026
3d7bcd3
Fix checkpoint loading with rerun state machine (#4448)
YangFei1990 Apr 24, 2026
9b02206
Add misc CUDA graph sugar to CudaGraphManager (#4425)
tdene Apr 24, 2026
35f76df
Inference: Add the embedding and output layer in the full_iteration_i…
sidsingh-nvidia Apr 24, 2026
481efd0
Important bugfixes in local CG implementation that were leading to lo…
jiemingz Apr 24, 2026
e9abb6c
fix: Replace polynomial rolling hash with SHA-256 for prefix caching …
lmcafee-nvidia Apr 24, 2026
377af02
feat(ckpt): expose validate_access_integrity knob on dist-ckpt load (…
asolergi-nv Apr 24, 2026
241a5ca
Fix multivalidation (#3388)
RPrenger Apr 25, 2026
f2dcd42
Add missing knob for reduce_scatter_with_fp32_accumulation (#4410)
WanZzzzzz Apr 25, 2026
03f4111
Enable CUDA graphs for MTP inference (#4260)
santhnm2 Apr 26, 2026
1879dc2
chore(beep boop 🤖): Bump (main) (2026-04-27)
github-actions[bot] Apr 27, 2026
970c254
checkpoint integrity verification (#4305)
dimapihtar Apr 27, 2026
ebd70d3
Fix cache gating (#4455)
wdykas Apr 27, 2026
0447347
[Main] Fix FusedAdam.use_decoupled_grad mis-set for Megatron-FSDP. (#…
cspades Apr 27, 2026
8c5cf05
add permute fusion into hybrid ep (#4089)
Autumn1998 Apr 28, 2026
42e396e
Add ColocatedBridgeCommunicator for heterogeneous TP/DP MIMO training…
yashaswikarnati Apr 28, 2026
6fd6652
Fix incorrect bias display in extra_repr of Column/RowParallelLinear …
HelloWorldBeginner Apr 28, 2026
c8a4bfd
Fix assertion logic in combined_1f1b_schedule_for_interleaved_pipelin…
joapolarbear Apr 28, 2026
374fa85
ci: Fix event name reference in CI workflow condition for merge group…
balasaajay Apr 28, 2026
9c15290
Add manual sync workflow from main to dev (#4165)
Phlip79 Apr 28, 2026
9816140
fix: handle list-format quant_cfg from ModelOpt PR #1094 (#4187)
ChenhanYu Apr 28, 2026
9e98259
ci: also add Run MBridge tests label in nightly sync workflow (#4499)
Phlip79 Apr 28, 2026
533dc75
Update copy-pr-bot.yaml [skip ci]
github-actions[bot] Apr 29, 2026
1c4e537
[training migration] Add serialization features to config container (…
maanug-nv Apr 29, 2026
f4a49cf
Fix conflict with inference graphs (#4504)
tdene Apr 29, 2026
251c6e9
chore: rotate oncall schedule
github-actions[bot] Apr 29, 2026
c5201a0
Add tools/prepare_cache.py for offline GPT dataset cache preparation …
asolergi-nv Apr 29, 2026
cb3d5d9
[build] fix: move mamba-ssm and causal-conv1d to optional [ssm] extra…
ko3n1g Apr 29, 2026
4e208a8
mamba: avoid redundant HBM reloads in causal_conv1d_update shift loop…
wdykas Apr 29, 2026
3f59bbb
Standardize misc graph interface (#4485)
tdene Apr 29, 2026
29864b2
Fix inference graph override in RL flow (#4323)
tdene Apr 29, 2026
b23aa3f
Unify and refactor Megatron-FSDP documentation. (#4418)
cspades Apr 29, 2026
51ea07e
Revert "ci: add base_sha to codecov/codecov-action upload step (#4445…
chtruong814 Apr 29, 2026
cfee04e
Skills for running unit tests and working with slurm (#4502)
yashaswikarnati Apr 29, 2026
0d98cb8
Reorganize order of operations in inference context and text generati…
tdene Apr 29, 2026
0c52c39
ci: Update CI workflow conditions to include merge group handling (#4…
balasaajay Apr 30, 2026
6ba794b
ci: add base_sha to codecov/codecov-action upload step (#4540)
chtruong814 Apr 30, 2026
580d53a
Fix release tests: remove --global-batch-size conflicting with --step…
deepakn94 Apr 30, 2026
77afc60
docs: use @file-path notation for file references in skills (#4542)
ko3n1g Apr 30, 2026
1 change: 1 addition & 0 deletions .agents/skills
1 change: 1 addition & 0 deletions .claude/skills
39 changes: 39 additions & 0 deletions .coderabbit.yaml
@@ -0,0 +1,39 @@
# yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
language: "en-US"

# Only comment on Critical/Major bugs. No Minor, Trivial, or style comments.
tone_instructions: "Only comment on Critical or Major bugs. Never comment on Minor issues, style, refactoring, or suggestions. When in doubt, stay silent."

reviews:
  # Use chill profile - filters out nitpicks automatically
  profile: "chill"

  # Disable all summary features
  high_level_summary: false
  high_level_summary_in_walkthrough: false

  # Disable walkthrough comment entirely
  collapse_walkthrough: true
  changed_files_summary: false
  sequence_diagrams: false

  # Disable status/effort estimates
  review_status: false
  commit_status: false
  estimate_code_review_effort: false

  # Disable auto-suggestions for labels/reviewers
  suggested_labels: false
  suggested_reviewers: false

  # Disable related issues/PRs lookup
  assess_linked_issues: false
  related_issues: false
  related_prs: false

  # Auto-review disabled - only review when explicitly requested via @coderabbitai review
  auto_review:
    enabled: false

chat:
  auto_reply: true
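The effect of this config is easiest to see by checking which boolean features end up switched off. A minimal stdlib-only sketch (no YAML library; it naively scans `key: true/false` lines, ignoring nesting) against a hand-copied subset of the file above:

```python
# Naive sketch: extract boolean flags from a .coderabbit.yaml-style text
# and confirm the noisy features are off. CONFIG is a hand-picked subset
# of the config added in this PR, not the full file.
CONFIG = """\
reviews:
  profile: "chill"
  high_level_summary: false
  review_status: false
  auto_review:
    enabled: false
chat:
  auto_reply: true
"""

def flat_bools(text):
    """Collect 'key: true/false' pairs, ignoring nesting depth."""
    flags = {}
    for line in text.splitlines():
        line = line.strip()
        if ": " in line:
            key, _, val = line.partition(": ")
            if val in ("true", "false"):
                flags[key] = (val == "true")
    return flags

flags = flat_bools(CONFIG)
assert flags["enabled"] is False     # reviews.auto_review.enabled
assert flags["auto_reply"] is True   # chat.auto_reply
```

Note the one feature left on is `chat.auto_reply`, so the bot still answers direct `@coderabbitai` mentions even though unsolicited reviews are disabled.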
1 change: 1 addition & 0 deletions .cursorrules
@@ -0,0 +1 @@
See CLAUDE.md for all repository guidelines.
4 changes: 4 additions & 0 deletions .flake8
@@ -0,0 +1,4 @@
[flake8]
max-line-length = 100
extend-ignore = E203,E501,F401,E402,E714
per-file-ignores = __init__.py:F401
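The `.flake8` file uses INI syntax, so it can be inspected with the standard-library `configparser`. A small sketch (the inline string mirrors the fragment above):

```python
import configparser

# Read the .flake8 fragment with configparser (flake8's config format is
# configparser-compatible) and inspect the ignore settings.
FLAKE8 = """\
[flake8]
max-line-length = 100
extend-ignore = E203,E501,F401,E402,E714
per-file-ignores = __init__.py:F401
"""

cp = configparser.ConfigParser()
cp.read_string(FLAKE8)

ignored = cp["flake8"]["extend-ignore"].split(",")
assert "E501" in ignored  # long lines are not flagged by flake8 itself
assert cp["flake8"].getint("max-line-length") == 100
```

Ignoring E501 while setting `max-line-length = 100` suggests line length is enforced by a formatter rather than by flake8.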
67 changes: 67 additions & 0 deletions .github/CODEOWNERS
@@ -0,0 +1,67 @@
megatron/core/ @NVIDIA/core-adlr @NVIDIA/core-nemo

megatron/core/models/gpt/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/gpt

megatron/core/models/multimodal/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/multi-modal

megatron/core/models/mamba/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/hybrid-mamba
megatron/core/ssm/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/hybrid-mamba

megatron/core/models/hybrid/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/hybrid-model

megatron/core/datasets/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/datasets

megatron/core/tokenizers/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/tokenizers

megatron/core/distributed/fsdp/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/megatron-fsdp

megatron/core/transformer/fsdp_dtensor_checkpoint.py @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/megatron-fsdp

megatron/core/dist_checkpointing/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/dist-checkpointing

megatron/core/optimizer/distrib_optimizer/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/dist-optimizer

megatron/core/inference/modelopt_support @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/quantization-and-inference

megatron/core/datasets/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/datasets

megatron/core/pipeline_parallel/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/pipeline-parallelism

megatron/core/transformer/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/transformer

megatron/core/transformer/moe/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/mixture-of-experts-adlr @NVIDIA/mixture-of-experts-devtech

megatron/core/inference/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/inference

megatron/core/parallel_state.py @NVIDIA/core-adlr @NVIDIA/core-nemo

megatron/core/post_training/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/post-training

megatron/post_training/ @NVIDIA/post-training

megatron/core/transformer/cuda_graphs.py @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/cuda-graphs

megatron/training/ @NVIDIA/training-adlr @NVIDIA/training-nemo
megatron/training/arguments.py

.gitlab/ @NVIDIA/ci
.github/ @NVIDIA/ci
.github/oncall_schedule.json @NVIDIA/mcore-oncall-rotation
.gitlab-ci.yml @NVIDIA/ci
docker/ @NVIDIA/ci
tests/functional_tests/python_test_utils/ @NVIDIA/ci
tests/functional_tests/shell_test_utils/ @NVIDIA/ci
tests/test_utils/recipes/ @NVIDIA/ci
tests/unit_tests/run_ci_test.sh @NVIDIA/ci

# API Backwards Compatibility Check
scripts/check_api_backwards_compatibility.py @NVIDIA/ci
scripts/README_API_COMPAT.md @NVIDIA/ci
.github/workflows/check_api_backwards_compatibility_workflow.yml @NVIDIA/ci
docs/api-backwards-compatibility-check.md @NVIDIA/ci
tests/unit_tests/test_api_backwards_compat_setup.py @NVIDIA/ci

megatron/rl/ @NVIDIA/reinforcement-learning
examples/rl/ @NVIDIA/reinforcement-learning
test/unit_tests/test_rl_utils.py @NVIDIA/reinforcement-learning
train_rl.py @NVIDIA/reinforcement-learning
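CODEOWNERS resolution is last-match-wins: for a changed path, the final matching pattern determines the owners, and a bare entry with no teams (like `megatron/training/arguments.py` above) clears ownership for that path. An illustrative sketch — not GitHub's exact matcher; directory patterns here match by simple prefix, and `RULES` is a hand-picked subset of the file above:

```python
# Simplified CODEOWNERS resolver: iterate rules in file order and keep
# the owners of the *last* matching pattern.
RULES = [
    ("megatron/core/", ["@NVIDIA/core-adlr", "@NVIDIA/core-nemo"]),
    ("megatron/core/transformer/moe/", ["@NVIDIA/core-adlr", "@NVIDIA/core-nemo",
                                        "@NVIDIA/mixture-of-experts-adlr",
                                        "@NVIDIA/mixture-of-experts-devtech"]),
    ("megatron/training/", ["@NVIDIA/training-adlr", "@NVIDIA/training-nemo"]),
    ("megatron/training/arguments.py", []),  # bare entry: no owners requested
]

def owners(path):
    matched = []
    for pattern, teams in RULES:
        # simplified semantics: trailing-slash patterns match by prefix,
        # other patterns match the path exactly
        if path == pattern or (pattern.endswith("/") and path.startswith(pattern)):
            matched = teams  # last match wins
    return matched

assert "@NVIDIA/mixture-of-experts-adlr" in owners("megatron/core/transformer/moe/router.py")
assert owners("megatron/training/arguments.py") == []
```

This is why the MoE line can add extra reviewers on top of the core teams, and why the owner-less `arguments.py` line exempts that one file from the `megatron/training/` rule above it.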
29 changes: 29 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,29 @@
---
name: Bug report
about: Create a report to help us improve the repository or project
title: ""
labels: bug
assignees: ''

---

**Describe the bug**

A clear and concise description of what the bug is. Tag the [@mcore-oncall](https://github.com/orgs/NVIDIA/teams/mcore-oncall)
to get oncall's attention to this issue.

**Steps/Code to reproduce bug**

Please list *minimal* steps or code snippet for us to be able to reproduce the bug.

A helpful guide on how to craft a minimal bug report: http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports.


**Expected behavior**

A clear and concise description of what you expected to happen.


**Additional context**

Add any other context about the problem here.
2 changes: 2 additions & 0 deletions .github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1,2 @@
blank_issues_enabled: false

23 changes: 23 additions & 0 deletions .github/ISSUE_TEMPLATE/feature_request.md
@@ -0,0 +1,23 @@
---
name: Feature request
about: Suggest an idea for this project
title: ""
labels: enhancement
assignees: ''

---

**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Tag the [@mcore-oncall](https://github.com/orgs/NVIDIA/teams/mcore-oncall)
to get oncall's attention to this issue.

**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.

**Additional context**
Add any other context or screenshots about the feature request here.
13 changes: 13 additions & 0 deletions .github/ISSUE_TEMPLATE/question.md
@@ -0,0 +1,13 @@
---
name: QUESTION
about: Ask a question about Megatron-LM that is not a bug, regression or enhancement request
title: "[QUESTION]"
labels: ''
assignees: ''

---

**Your question**
Ask a clear and concise question about Megatron-LM. Tag the [@mcore-oncall](https://github.com/orgs/NVIDIA/teams/mcore-oncall)
to get oncall's attention to this issue.
40 changes: 40 additions & 0 deletions .github/ISSUE_TEMPLATE/regression.md
@@ -0,0 +1,40 @@
---
name: REGRESSION
about: Report a regression in speed or accuracy due to a Megatron-LM update
title: "[REGRESSION]"
labels: ''
assignees: ''

---

**Describe the regression**
A clear and concise description of what the regression is. Tag the [@mcore-oncall](https://github.com/orgs/NVIDIA/teams/mcore-oncall)
to get oncall's attention to this issue.

**To Reproduce**
Steps to reproduce the behavior. The easier it is to reproduce, the faster it will get maintainer attention.

**Previous performance**
What speed or accuracy did you previously see?

**New performance**
What speed or accuracy do you see after the update?

**Stack trace/logs**
If applicable, add the stack trace or logs related to the regression.

**Environment (please complete the following information):**
- Previous Megatron-LM commit ID
- New Megatron-LM commit ID
- Previous PyTorch version
- New PyTorch version
- Previous CUDA version
- New CUDA version
- Previous NCCL version
- New NCCL version

**Proposed fix**
If you have a proposal for how to fix the issue, state it here or link to a PR.

**Additional context**
Add any other context about the problem here.