
Conversation

@SamuelMarks (Collaborator) commented Dec 11, 2025

Description

Contributions by both @SamuelMarks and @A9isha to address the following:

  1. Type the RL API using the more explicit, typed pydantic syntax.
  2. Fix the broken rl_llama3_demo.ipynb, where vllm.yml was not found on Colab because of how MAXTEXT_PKG_DIR is resolved.
  3. Mitigate the issue where logs generated by vllm-tpu flood the notebook's output so badly that the evaluation accuracy numbers are not visible. We can remove these mitigation steps once b/473703277 is resolved.

Tests

Ran rl_llama3_demo.ipynb for both grpo and gspo-token on a v5p-8

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@codecov (bot) commented Jan 5, 2026

Codecov Report

❌ Patch coverage is 0% with 37 lines in your changes missing coverage. Please review.

| Files with missing lines      | Patch % | Lines         |
| ----------------------------- | ------- | ------------- |
| src/MaxText/rl/train_rl.py    | 0.00%   | 21 Missing ⚠️ |
| src/MaxText/max_logging.py    | 0.00%   | 9 Missing ⚠️  |
| src/MaxText/rl/evaluate_rl.py | 0.00%   | 4 Missing ⚠️  |
| src/MaxText/rl/utils_rl.py    | 0.00%   | 3 Missing ⚠️  |


@SamuelMarks (Collaborator, Author):

@A9isha let me know if you need a hand

@A9isha force-pushed the rl-api branch 2 times, most recently from 9be9fb4 to bb2e146 on January 6, 2026 at 08:13
@A9isha (Collaborator) commented Jan 6, 2026

> @A9isha let me know if you need a hand

@SamuelMarks thanks a lot for the offer to help!

Do you think it is possible to make the following more Pythonic?

```python
config_argv = [
    "",  # argv[0] placeholder
    config_file,
    f"model_name={MODEL_NAME}",
    f"tokenizer_path={HF_REPO_ID}",
    f"run_name={RUN_NAME}",
    f"chat_template_path={CHAT_TEMPLATE_PATH}",
    f"load_parameters_path={MODEL_CHECKPOINT_PATH}",
    f"base_output_directory={OUTPUT_DIRECTORY}",
    f"hf_access_token={HF_TOKEN}",
    f"learning_rate={LEARNING_RATE}",
    f"steps={STEPS}",
    f"grpo.num_generations={NUM_GENERATIONS}",
    f"grpo.grpo_beta={GRPO_BETA}",
    f"grpo.grpo_epsilon={GRPO_EPSILON}",
    f"chips_per_vm={CHIPS_PER_VM}",
    f"grpo.loss_algo={LOSS_ALGO}",
    "use_pathways=False",
]
```

to become something like

```python
config = mt_types.MaxTextConfig.get_configs(
    run_name="foo",
    learning_rate=3e-6,
    ...,
    GRPO(grpo_beta=0.2, num_iteration=4),
    ...,
)
```

@SamuelMarks (Collaborator, Author):

@A9isha Sure. In #2775 I split the initialize function in two, so now you can use initialize_pydantic. To go further, you can also change the input, e.g., to:

```python
# NOTE: "types" here refers to the MaxText pydantic types module,
# not the Python stdlib "types" module.
from types import (
    MaxTextConfig,
    RunInfo,
    Tokenizer,
    Checkpointing,
    HfDataset,
    Optimizer,
    TrainingLoop,
    GRPO,
    RLHardware,
)

# 1. Construct semantic groupings using specific subclasses
run_config = RunInfo(
    base_config=config_file,
    model_name=MODEL_NAME,
    run_name=RUN_NAME,
    base_output_directory=OUTPUT_DIRECTORY
)

tokenizer_config = Tokenizer(
    tokenizer_path=HF_REPO_ID,
    chat_template_path=CHAT_TEMPLATE_PATH
)

checkpoint_config = Checkpointing(
    load_parameters_path=MODEL_CHECKPOINT_PATH
)

dataset_config = HfDataset(
    hf_access_token=HF_TOKEN
)

optimizer_config = Optimizer(
    learning_rate=LEARNING_RATE
)

training_config = TrainingLoop(
    steps=STEPS
)

# Note: The 'grpo.' prefix from argv is removed because these are direct fields of the GRPO class
grpo_config = GRPO(
    num_generations=NUM_GENERATIONS,
    grpo_beta=GRPO_BETA,
    grpo_epsilon=GRPO_EPSILON,
    loss_algo=LOSS_ALGO
)

hardware_config = RLHardware(
    chips_per_vm=CHIPS_PER_VM,
    use_pathways=False
)

# 2. Merge into the final MaxTextConfig
# We assume standard defaults for any mixins not explicitly initialized above.
final_config_dict = {}

# Merge order only matters if keys collide; these groupings are disjoint,
# so a sequential update captures all the overrides defined above.
sub_configs = [
    run_config,
    tokenizer_config,
    checkpoint_config,
    dataset_config,
    optimizer_config,
    training_config,
    grpo_config,
    hardware_config
]

for cfg in sub_configs:
    # exclude_unset=True keeps only the values explicitly provided above,
    # letting MaxTextConfig fall back to its own defaults for everything
    # else (exclude_defaults=True would also drop values explicitly set
    # to their defaults, which is usually not what you want here).
    final_config_dict.update(cfg.model_dump(exclude_unset=True))

max_text_config = MaxTextConfig(**final_config_dict)
```
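For context, here is a minimal sketch of how one of these pydantic groupings might be declared, patterned on the `class GRPO(BaseModel)` snippet visible later in this review; the defaults below are placeholders for illustration, not the values shipped in this PR:

```python
from pydantic import BaseModel, Field

class GRPO(BaseModel):
    """Sketch of the GRPO grouping used above; the real class in this PR
    likely carries more fields and different defaults."""
    num_generations: int = Field(4, description="Completions sampled per prompt.")
    grpo_beta: float = Field(0.04, description="KL penalty coefficient.")
    grpo_epsilon: float = Field(0.2, description="Clipping epsilon for the policy ratio.")
    loss_algo: str = Field("grpo", description="Either 'grpo' or 'gspo-token'.")
```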

@A9isha force-pushed the rl-api branch 7 times, most recently from af26177 to 3a8a55a on January 7, 2026 at 10:48
@A9isha (Collaborator) commented Jan 7, 2026

@SamuelMarks thank you for thinking about this. Do you think you could create a separate PR to change the config_argv in rl_llama3_demo.ipynb following your idea, such that config_argv is no longer a list but instead looks something like:

```python
config = mt_types.MaxTextConfig.get_configs(
    run_name="foo",
    learning_rate=3e-6,
    ...,
    GRPO(grpo_beta=0.2, num_iteration=4),
    ...,
)
```

@SamuelMarks (Collaborator, Author):

@A9isha sure, here you go #2915

PS: I've got some research deadlines, so if any follow-up reviews are required I should have some time over the weekend.

" raise RuntimeError(\"OUTPUT_DIRECTORY is not set\")\n",
" \n",
"os.environ[\"HF_TOKEN\"] = HF_TOKEN\n",
"if \"MAXTEXT_PKG_DIR\" not in os.environ:\n",
Collaborator:

Why do we need to set this env var?

Collaborator:

Great question! Basically, the path to vllm.yml is built using MAXTEXT_PKG_DIR, and if it is not set, the code falls back to the current file's path. On Colab that resolves to .../MaxText/examples, which causes the problem.
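To illustrate the failure mode (a sketch only; the actual path construction in the repo may differ, and the "configs/" subpath below is an assumption):

```python
import os

# If MAXTEXT_PKG_DIR is unset, fall back to the directory of the current
# file -- on Colab this resolves under .../MaxText/examples, so the
# relative lookup for vllm.yml misses. Setting the env var up front,
# as the notebook now does, avoids the bad fallback.
pkg_dir = os.environ.get("MAXTEXT_PKG_DIR", os.path.dirname(os.path.abspath(__file__)))
vllm_config_path = os.path.join(pkg_dir, "configs", "vllm.yml")  # "configs/" is assumed
```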

@NicoGrande for visibility

```python
vllm_hf_config_path: str = Field("", description="Path to HuggingFace model config for MaxText model.")


class GRPO(BaseModel):
```
Collaborator:

Should we call this RL instead?

```diff
 # Start training

-max_logging.log("Starting RL training...")
+max_logging.warning("Starting RL training...")
```
Collaborator:

It should be info, right?

Collaborator:

My goal is to make these logs from train_rl.py show up while filtering most of the logs from tpu-inference in the notebook's output cell (until this PR, the train_rl.py logs were not displayed at all in notebooks, which is bad). The problem is that a lot of info/warning logs get printed by tpu-inference, and even with the NoisyLogFilter not all of them are filtered out. Additionally, for some reason, the max_logging.log calls from train_rl.py also get hidden in the notebook. This complication arises from the various import orders of absl/logging (including the indirection through max_logging.py), from max_logging using absl while other dependencies use stdlib logging, and from the sheer volume of logs from tpu-inference. Once b/473703277 is resolved we can revert this.
So we could either use print or use max_logging.warning (to bump up the priority) so it gets shown for sure. WDYT?
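For readers following along, a filter like the NoisyLogFilter mentioned above might look roughly like this (a minimal stdlib-logging sketch; the real filter in this PR and the exact logger-name prefixes are assumptions):

```python
import logging

class NoisyLogFilter(logging.Filter):
    """Drop records emitted by chatty third-party loggers."""

    NOISY_PREFIXES = ("vllm", "tpu_inference")  # assumed logger-name prefixes

    def filter(self, record: logging.LogRecord) -> bool:
        # Keep the record only if it does not originate from a noisy logger.
        return not record.name.startswith(self.NOISY_PREFIXES)

# Attach to every root handler so filtered records never reach the notebook cell.
for handler in logging.getLogger().handlers:
    handler.addFilter(NoisyLogFilter())
```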
