[Main] Nightly build mega-issue #625

@casteryh

Updated by @JenniferWang on Dec 22, 2025

Context

The typical release cycle does not meet our need to co-develop with the dependency stack (monarch, torchstore, torchtitan): we need the latest fixes and features, and we cannot afford to wait for or coordinate with all downstream releases, e.g. #644. On the flip side, forge's upstream may want to pull the latest code through the forge-nightly build, which is effectively the same as installing forge from the latest source.

The Plan

A "true" nightly build means using the nightly build of all the dependent libraries: torch-nightly, torchtitan-nightly, monarch-nightly, torchstore-nightly, vllm-nightly...

To start with, however, our current pyproject pins torch==2.9 rather than using torch>=2.9, and that exact pin is incompatible with torch-nightly. But why is it pinned? torchforge, a pure Python library, does not itself depend directly on torch. The torch 2.9 pin exists for the custom vLLM build workflow: we have to use a fixed vLLM build because there is no guarantee that the API of the vLLM nightly build (built against torch nightly) is compatible with the forge main branch. The upgrade is non-trivial.

To complicate matters further, a dependency's nightly can itself be broken: e.g. the titan nightly branch has diverged from its main branch, see https://github.com/pytorch/torchtitan/tree/nightly.

Therefore, we cannot offer a "true" forge-nightly build that also uses the nightly builds of all major dependencies.

However, so that people can still move fast, instead of pinning stable versions we'll pin specific nightly versions of monarch, torchtitan, and torchstore that we have manually tested.
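To make "pinning a specific nightly" concrete, here is a minimal sketch of how such pins could be rendered as exact requirement strings. Everything specific in it is an assumption for illustration: the version strings are placeholders rather than tested nightlies, the package names may differ from the actual distribution names, the `<release>.devYYYYMMDD` version shape is assumed, and `render_pins` is a hypothetical helper, not part of forge.

```python
import re

# Hypothetical pins: package names follow the plan above; the version strings
# are placeholders, not nightlies that anyone has actually tested.
NIGHTLY_PINS = {
    "monarch": "0.1.0.dev20251215",
    "torchstore": "0.1.0.dev20251215",
    "torchtitan": "0.1.0.dev20251215",
}

# Assumed nightly version shape: <release>.devYYYYMMDD
_NIGHTLY_RE = re.compile(r"^\d+(\.\d+)*\.dev\d{8}$")


def render_pins(pins):
    """Return exact-pin requirement strings, rejecting non-nightly versions."""
    lines = []
    for name in sorted(pins):
        version = pins[name]
        if not _NIGHTLY_RE.match(version):
            raise ValueError(f"{name}: {version!r} does not look like a nightly")
        lines.append(f"{name}=={version}")
    return lines


if __name__ == "__main__":
    print("\n".join(render_pins(NIGHTLY_PINS)))
```

An exact `==` pin on a dated dev version means an install resolves to the same tested build every time, which is the property the plan relies on.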

We will set up an integration test that runs continuously against the latest nightly builds of the dependencies (not the pinned versions) to catch backward-incompatible changes. This also serves as the test plan whenever we need to update a pinned nightly version to pick up new downstream changes.

When we do a stable release, we'll switch the pinned nightly versions to stable versions wherever possible. This is probably low ROI for now, given how rapidly things are moving.
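One piece of such a continuous job could be a drift report that shows which pins lag behind the latest nightlies, so a failing run can be traced to a specific dependency. The sketch below is only an illustration under stated assumptions: `find_drift` is a hypothetical helper, the version strings are made up, and how the latest versions are discovered (for example by querying a package index) is left out.

```python
def find_drift(pinned, latest):
    """Map each package whose latest nightly differs from its pin
    to a (pinned_version, latest_version) tuple."""
    return {
        name: (pinned[name], latest[name])
        for name in sorted(pinned)
        if name in latest and latest[name] != pinned[name]
    }


if __name__ == "__main__":
    # Placeholder data; a real job would look the latest versions up.
    pinned = {
        "monarch": "0.1.0.dev20251201",
        "torchstore": "0.1.0.dev20251201",
    }
    latest = {
        "monarch": "0.1.0.dev20251220",
        "torchstore": "0.1.0.dev20251201",
    }
    for name, (old, new) in find_drift(pinned, latest).items():
        print(f"{name}: pinned {old}, latest {new}")
```

If the integration tests pass against the latest nightlies, the drifted entries are the candidates for the next pin update; if they fail, the same list narrows down where the incompatible change came from.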

To-dos
