Fix DeepCompile+Z3 on PyTorch v2.9/2.10#7951

Open
tohtana wants to merge 5 commits into deepspeedai:master from tohtana:tohtana/fix-deepcompile-z3-pt29

@tohtana (Collaborator) commented Apr 3, 2026

DeepCompile+Z3 didn't work with PyTorch v2.9/2.10 because:

  • PyTorch v2.9+ started enforcing stricter TorchDynamo tensor-match guards on parameters. During DeepCompile tracing, some ZeRO-3 parameters were temporarily all-gathered, so Dynamo recorded their full sizes, such as 4096.
  • By the time guard evaluation ran, DeepSpeed had already released those params back to the normal ZeRO-3 partitioned representation, where param.data is empty(0). That produced guard failures like expected 4096, actual 0.
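
The mismatch described above can be illustrated with a toy model in plain Python. This is only a sketch: `ToyParam`, `record_guard`, `gather`, and `release` are hypothetical names invented for illustration, not DeepSpeed or TorchDynamo APIs, and the real guard machinery is considerably more involved.

```python
from dataclasses import dataclass


@dataclass
class ToyParam:
    """Hypothetical stand-in for a ZeRO-3 parameter (not a DeepSpeed class)."""
    full_shape: tuple
    shape: tuple = (0,)  # released (partitioned) state: data is empty

    def gather(self):
        # Simulates the temporary all-gather that happens during tracing.
        self.shape = self.full_shape

    def release(self):
        # Simulates returning to the partitioned representation.
        self.shape = (0,)


def record_guard(param):
    """Snapshot the shape seen at trace time, roughly as a tensor-match guard would."""
    expected = param.shape

    def check(p):
        return p.shape == expected

    return check


param = ToyParam(full_shape=(4096,))
param.gather()
guard = record_guard(param)  # records the transient gathered shape (4096,)
param.release()              # parameter is partitioned again before the guard runs
guard_ok = guard(param)      # compares (0,) against (4096,), so the guard fails
```

In this model the guard fails purely because it captured a transient state, which mirrors the "expected 4096, actual 0" failure reported above.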

This PR resolves the issue by:

  • Keeping full-shape dummy tensors for symbolic tracing
  • Overriding guard size/stride metadata for ZeRO-3 params so that it reflects the stable released representation instead of the transient gathered sizes
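
The two-part approach can be sketched with a similar toy model. Everything here is an assumption made for illustration: `ToyParam`, `shape_for_tracing`, `make_guard`, and `RELEASED_SHAPE` are hypothetical names, strides are omitted for brevity, and the actual change in this PR works on real tensors and Dynamo guard metadata.

```python
from dataclasses import dataclass

# Assumption: a released ZeRO-3 parameter exposes empty data of size 0.
RELEASED_SHAPE = (0,)


@dataclass
class ToyParam:
    """Hypothetical stand-in for a ZeRO-3 parameter (not a DeepSpeed class)."""
    full_shape: tuple
    gathered: bool = False

    @property
    def shape(self):
        return self.full_shape if self.gathered else RELEASED_SHAPE


def shape_for_tracing(param):
    """Shape of the dummy tensor handed to symbolic tracing: always the full shape."""
    return param.full_shape


def make_guard(param):
    """Record the stable released shape, ignoring whether param is gathered right now."""
    expected = RELEASED_SHAPE
    return lambda p: p.shape == expected


param = ToyParam(full_shape=(4096,), gathered=True)  # gathered while tracing
traced_shape = shape_for_tracing(param)              # tracing sees the full shape
guard = make_guard(param)                            # guard records the released shape
param.gathered = False                               # released before guard evaluation
guard_ok = guard(param)                              # guard passes on the released param
```

The point of the split is that tracing and guarding want different views of the same parameter: tracing needs the logical shape, while the guard needs whatever state the parameter is reliably in when guards are evaluated.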

This PR also fixes the following bugs:

  • For v2.7 and v2.8, the compiled backward graph could hoist end_backward ahead of the real reduce_grad calls.
  • The selective unsharding pass could overcount the persistence memory budget.

Note: DeepCompile is still incompatible with v2.11. It will be addressed by another PR.

tohtana added 4 commits April 2, 2026 13:47
Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
@tohtana tohtana requested review from loadams and tjruwase as code owners April 3, 2026 01:10
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 56479c3151

(cherry picked from commit c7f1b5cd3f84bf5cdc37a48515eaff5f06580fb4)
Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>