Commit 41be8da

Merge pull request #3000 from AI-Hypercomputer:chengnuojin-pr-3000
PiperOrigin-RevId: 863399027
2 parents 3bc185f + 5840964 commit 41be8da

18 files changed

Lines changed: 1409 additions & 1493 deletions

.github/workflows/run_jupyter_notebooks.yml

Lines changed: 1 addition & 0 deletions
@@ -87,6 +87,7 @@ jobs:
8787
- name: Run Post-Training Notebooks
8888
shell: bash
8989
env:
90+
PYTHONPATH: "${{ github.workspace }}/src"
9091
HF_TOKEN: ${{ secrets.HF_TOKEN }}
9192
run: |
9293
MAXTEXT_REPO_ROOT=$(pwd)

docs/guides/run_python_notebook.md

Lines changed: 39 additions & 26 deletions
@@ -19,6 +19,7 @@ Before starting, make sure you have:
1919
- ✅ Basic familiarity with Jupyter, Python, and Git
2020

2121
**For Method 2 (Visual Studio Code) and Method 3 (Local Jupyter Lab) only:**
22+
2223
- ✅ A Google Cloud Platform (GCP) account with billing enabled
2324
- ✅ TPU quota available in your region (check under IAM & Admin → Quotas)
2425
- ✅ `tpu.nodes.create` permission to create a TPU VM
@@ -36,16 +37,18 @@ Currently, this method only supports the **`sft_qwen3_demo.ipynb`** notebook, wh
3637

3738
Before proceeding, please verify that the specific notebook you are running works reliably on the free-tier TPU resources. If you encounter frequent disconnections or resource limitations, you may need to:
3839

39-
* Upgrade to a Colab Pro or Pro+ subscription for more stable and powerful TPU access.
40+
- Upgrade to a Colab Pro or Pro+ subscription for more stable and powerful TPU access.
4041

41-
* Move to local Jupyter Lab setup method with access to a powerful TPU machine.
42+
- Move to local Jupyter Lab setup method with access to a powerful TPU machine.
4243

4344
### Step 1: Choose an Example
45+
4446
1.a. Visit the [MaxText examples directory](https://github.com/AI-Hypercomputer/maxtext/tree/main/src/MaxText/examples) on Github.
4547

4648
1.b. Find the notebook you want to run (e.g., `sft_qwen3_demo.ipynb`) and copy its URL.
4749

4850
### Step 2: Import into Colab
51+
4952
2.a. Go to [Google Colab](https://colab.research.google.com/) and sign in.
5053

5154
2.b. Select **File** -> **Open Notebook**.
@@ -63,9 +66,11 @@ Before proceeding, please verify that the specific notebook you are running work
6366
3.c. Click **Save**
6467

6568
### Step 4: Run the Notebook
69+
6670
Follow the instructions within the notebook cells to install dependencies and run the training/inference.
6771

6872
## Method 2: Visual Studio Code with TPU (Recommended)
73+
6974
Running Jupyter notebooks in Visual Studio Code (VS Code) provides a powerful, interactive environment that combines the flexibility of notebooks with the robust features of a code editor. Follow these steps to get your environment up and running.
7075

7176
### Step 1: Set Up TPU VM
@@ -75,9 +80,10 @@ In Google Cloud Console, create a standalone TPU VM:
7580
1.a. **Compute Engine** → **TPUs** → **Create TPU**
7681

7782
1.b. Example config:
78-
- **Name:** `maxtext-tpu-node`
79-
- **TPU type:** Choose your desired TPU type
80-
- **Runtime Version:** `tpu-ubuntu2204-base` (or other compatible runtime)
83+
84+
- **Name:** `maxtext-tpu-node`
85+
- **TPU type:** Choose your desired TPU type
86+
- **Runtime Version:** `tpu-ubuntu2204-base` (or other compatible runtime)
8187

8288
### Step 2: SSH to TPU-VM via VS Code
8389

@@ -86,11 +92,12 @@ In Google Cloud Console, create a standalone TPU VM:
8692
2.b. Follow [Connect to a remote host](https://code.visualstudio.com/docs/remote/ssh#_connect-to-a-remote-host) guide to connect to your TPU-VM via VS Code.
8793

8894
### Step 3. Install Necessary Extensions on VS Code
95+
8996
To enable notebook support, you must install two official extensions from the VS Code Marketplace:
9097

91-
* Python Extension: Provides support for the Python language.
98+
- Python Extension: Provides support for the Python language.
9299

93-
* Jupyter Extension: Enables you to create, edit, and run `.ipynb` files directly inside VS Code.
100+
- Jupyter Extension: Enables you to create, edit, and run `.ipynb` files directly inside VS Code.
94101

95102
To install, click the `Extensions` icon on the left sidebar (or press `Ctrl+Shift+X` or `Cmd+Shift+X`), search for `Jupyter` and `Python`, and click `Install`.
96103

@@ -99,6 +106,7 @@ To install, click the `Extensions` icon on the left sidebar (or press `Ctrl+Shif
99106
To execute post-training notebooks on your TPU-VM, follow the official [MaxText installation guides](https://maxtext.readthedocs.io/en/latest/tutorials/posttraining/rl.html#create-virtual-environment-and-install-maxtext-dependencies) to install MaxText and its dependencies inside a dedicated virtual environment.
100107

101108
### Step 5: Install the necessary library for Jupyter
109+
102110
Jupyter requires a kernel to execute code. This kernel is tied to a specific Python environment. Open your terminal inside VS Code and run:
103111

104112
```bash
@@ -110,9 +118,9 @@ uv pip install ipykernel
110118
Before you can run the notebook, you must tell VS Code which Python environment to use.
111119

112120
1. Look at the top-right corner of the notebook editor.
113-
2. Click `Select Kernel`.
114-
3. Choose Python Environments and select the virtual environment you created in Step 4.
115-
4. Open [available post-training notebooks in MaxText](#available-examples) inside VS Code and run the jupyter notebook cells.
121+
1. Click `Select Kernel`.
122+
1. Choose Python Environments and select the virtual environment you created in Step 4.
123+
1. Open [available post-training notebooks in MaxText](#available-examples) inside VS Code and run the jupyter notebook cells.
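
A quick way to confirm that the selected kernel really runs inside the virtual environment from Step 4 is to execute a small check in the first notebook cell. The following is a minimal sketch, not part of the MaxText guide; the helper name `in_virtualenv` is hypothetical and it assumes a standard virtual environment layout.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a virtual environment, sys.prefix points at the environment,
    # while sys.base_prefix points at the interpreter it was created from.
    return sys.prefix != sys.base_prefix


# Show which interpreter the kernel is using and whether it is isolated.
print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```

If the printed interpreter path is not inside the environment you created, re-select the kernel before running the remaining cells.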
116124

117125
## Method 3: Local Jupyter Lab with TPU (Recommended)
118126

@@ -125,12 +133,15 @@ In Google Cloud Console, create a standalone TPU VM:
125133
1.a. **Compute Engine** → **TPUs** → **Create TPU**
126134

127135
1.b. Example config:
128-
- **Name:** `maxtext-tpu-node`
129-
- **TPU type:** Choose your desired TPU type
130-
- **Runtime Version:** `tpu-ubuntu2204-base` (or other compatible runtime)
136+
137+
- **Name:** `maxtext-tpu-node`
138+
- **TPU type:** Choose your desired TPU type
139+
- **Runtime Version:** `tpu-ubuntu2204-base` (or other compatible runtime)
131140

132141
### Step 2: Connect with Port Forwarding
142+
133143
Run the following command on your local machine:
144+
134145
> **Note**: The `--` separator before the `-L` flag is required. This tunnels the remote port 8888 to your local machine securely.
135146
136147
```bash
@@ -170,13 +181,15 @@ jupyter lab --ip=0.0.0.0 --port=8888 --no-browser --allow-root
170181
```
171182

172183
### Step 7: Access the Notebook
184+
173185
7.a. Look at the terminal output for a URL that looks like: `http://127.0.0.1:8888/lab?token=...`.
174186

175187
7.b. Copy that URL.
176188

177189
7.c. Paste it into your **local computer's browser**.
178-
* **Important:** If you changed the port in Step 2 (e.g., to `9999`), you must manually replace `8888` in the URL with `9999`.
179-
* *Example:* `http://127.0.0.1:9999/lab?token=...`
190+
191+
- **Important:** If you changed the port in Step 2 (e.g., to `9999`), you must manually replace `8888` in the URL with `9999`.
192+
- *Example:* `http://127.0.0.1:9999/lab?token=...`
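
The port substitution described above can also be done programmatically. This is a minimal sketch using only the Python standard library; the helper name `with_port` and the token value `abc` are placeholders for illustration, not values from the guide.

```python
from urllib.parse import urlsplit, urlunsplit


def with_port(url: str, port: int) -> str:
    """Return the same URL with its port replaced by the given one."""
    parts = urlsplit(url)
    # Rebuild the network location from the host and the new port.
    netloc = f"{parts.hostname}:{port}"
    return urlunsplit(parts._replace(netloc=netloc))


# The token "abc" is a placeholder; use the URL printed by Jupyter Lab.
print(with_port("http://127.0.0.1:8888/lab?token=abc", 9999))
# prints http://127.0.0.1:9999/lab?token=abc
```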
180193

181194
7.d. Once the interface opens in your browser, Click on the current kernel name (e.g., `Python 3 (ipykernel)`).
182195

@@ -197,13 +210,13 @@ jupyter lab --ip=0.0.0.0 --port=8888 --no-browser --allow-root
197210

198211
## Common Pitfalls & Debugging
199212

200-
| Issue | Solution |
201-
|-------|----------|
202-
| ❌ TPU runtime mismatch | Check TPU runtime version matches VM image |
203-
| ❌ Colab disconnects | Save checkpoints to GCS or Drive regularly |
204-
| ❌ "RESOURCE_EXHAUSTED" errors | Use smaller batch size or v5e-8 instead of v5e-1 |
205-
| ❌ Firewall blocked | Ensure port 8888 open, or always use SSH tunneling |
206-
| ❌ Path confusion | In Colab use `/content/maxtext`; in TPU VM use `~/maxtext` |
213+
| Issue | Solution |
214+
| ------------------------------ | ---------------------------------------------------------- |
215+
| ❌ TPU runtime mismatch | Check TPU runtime version matches VM image |
216+
| ❌ Colab disconnects | Save checkpoints to GCS or Drive regularly |
217+
| ❌ "RESOURCE_EXHAUSTED" errors | Use smaller batch size or v5e-8 instead of v5e-1 |
218+
| ❌ Firewall blocked | Ensure port 8888 open, or always use SSH tunneling |
219+
| ❌ Path confusion | In Colab use `/content/maxtext`; in TPU VM use `~/maxtext` |
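
For the "Firewall blocked" row, one way to check from your local machine whether the forwarded port is reachable is a plain TCP connection attempt. This is a minimal sketch using only the Python standard library; the helper name `port_is_open` is hypothetical, and it assumes the default port `8888` on `127.0.0.1`.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Assumes the SSH tunnel forwards Jupyter Lab to local port 8888.
print("Port 8888 reachable:", port_is_open("127.0.0.1", 8888))
```

A result of `False` only shows that nothing accepted the connection; it does not distinguish a closed tunnel from a Jupyter server that is not running.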
207220

208221
## Support and Resources
209222

@@ -217,9 +230,9 @@ jupyter lab --ip=0.0.0.0 --port=8888 --no-browser --allow-root
217230
If you encounter issues or have improvements for this guide, please:
218231

219232
1. Open an issue on the MaxText repository
220-
2. Submit a pull request with your improvements
221-
3. Share your experience in the discussions
233+
1. Submit a pull request with your improvements
234+
1. Share your experience in the discussions
222235

223-
---
236+
______________________________________________________________________
224237

225-
**Happy Training! 🚀**
238+
**Happy Training! 🚀**

pedagogical_examples/__init__.py

Lines changed: 0 additions & 13 deletions
This file was deleted.
