added sweep info and better logging by McClain-Thiel · Pull Request #5 · ucl-cssb/PlasmidRL

McClain-Thiel · 2025-11-07T14:08:32Z

Note

Adds an in-training evaluation pipeline (vLLM + plasmidkit), switches W&B project, enables S3 checkpointing, tunes GRPO/reward params, and introduces a sweep config.

Training/GRPO (src/runners/grpo.py, src/runners/grpo_sweep.py):
- Integrate evaluation via Evaluator and EvalCallback using the trainer's vLLM model; log results and artifacts to W&B.
- Adopt production hyperparameters from Config (learning rate, batch size, generations, temperature, top_p, beta, epsilon).
- Configure S3 checkpointing under /s3/${checkpoints_path}/..., add write test (test_checkpoint_directory_write), limit saved checkpoints, and save/log final artifacts.
- Increase eval frequency (eval_steps=50) and enhance W&B metadata (tags, config logging).
Evaluation Framework:
- Add src/eval/eval.py: SequenceAnalyzer (plasmidkit-based annotation merge/extract) and Evaluator (prompt loading, rollout generation, analysis).
- Add src/eval/eval_config.py for evaluation settings (prompts, sampling, overlap threshold, logging).
- Add generic training utilities in src/utils/training_utils.py (EvalRunner, EvalCallback, checkpoint write test, W&B logging/artifacts).
Config (src/config.py):
- Add checkpoints_path and production GRPO hyperparameters for reuse.
Rewards (src/rewards/bioinformatics/reward_config.py):
- Adjust defaults: violation_penalty_factor → 1.0, ori_weight → 1.5.
Orchestration (docker-compose.yaml):
- Switch W&B entity/project for GRPO training and sweep to ucl-cssb/PlasmidRL.
Sweeps:
- Add sweeps/configs/sweep_config_training_with_eval.yaml defining a Bayes sweep with length-based rewards and integrated evaluation.
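
The checkpoint write test mentioned under Training/GRPO can be illustrated roughly as follows. This is a minimal sketch and not the PR's `test_checkpoint_directory_write`: the helper name `can_write_checkpoint_dir` and the probe-file naming are assumptions made for illustration. The idea is to fail fast before training starts if the mounted checkpoint directory is not writable.

```python
import os
import uuid


def can_write_checkpoint_dir(path: str) -> bool:
    """Return True if a small probe file can be created and removed in `path`.

    Hypothetical helper: the name and behaviour are illustrative assumptions,
    not the function added in this PR.
    """
    try:
        # Create the directory if it is missing (e.g. a fresh run folder).
        os.makedirs(path, exist_ok=True)
        # Use a unique probe name so concurrent runs do not collide.
        probe = os.path.join(path, f".write_test_{uuid.uuid4().hex}")
        with open(probe, "w") as fh:
            fh.write("ok")
        # Clean up so the probe never ends up among the checkpoints.
        os.remove(probe)
        return True
    except OSError:
        # Covers permission errors, read-only mounts, and invalid paths.
        return False
```

Calling such a check once before constructing the trainer would surface a misconfigured mount immediately rather than at the first checkpoint save.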

Written by Cursor Bugbot for commit 5be00c8. This will update automatically on new commits.

cursor

This PR is being reviewed by Cursor Bugbot


cursor · 2025-11-07T14:10:39Z

+    values: [3000, 7000]
+
+  reward_ideal_max_length:
+    values: [12000, 20000]


Bug: Invalid length reward parameter combos in Bayesian sweep

The length reward parameters are configured for independent sampling by the Bayesian sweep, which creates invalid combinations where reward_ideal_min_length can be less than reward_min_length or reward_ideal_max_length can exceed reward_max_length. The comments indicate two intended combinations, but the configuration allows all permutations of the four parameters.
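
To make the problem concrete, the sketch below enumerates every combination that independent sampling could produce and filters out those that violate the ordering `reward_min_length <= reward_ideal_min_length <= reward_ideal_max_length <= reward_max_length`. Only the four parameter names come from the comment above; the candidate values are placeholders and are not taken from the sweep config.

```python
from itertools import product

# Placeholder candidate values, assumed for illustration only.
CANDIDATES = {
    "reward_min_length": [1000, 5000],
    "reward_ideal_min_length": [3000, 7000],
    "reward_ideal_max_length": [12000, 20000],
    "reward_max_length": [15000, 30000],
}


def is_valid(cfg: dict) -> bool:
    """Check that the ideal window sits inside the allowed length window."""
    return (
        cfg["reward_min_length"]
        <= cfg["reward_ideal_min_length"]
        <= cfg["reward_ideal_max_length"]
        <= cfg["reward_max_length"]
    )


def enumerate_combos(candidates: dict) -> list:
    """Return every combination that independent sampling could pick."""
    keys = list(candidates)
    return [dict(zip(keys, values)) for values in product(*candidates.values())]


combos = enumerate_combos(CANDIDATES)
valid = [c for c in combos if is_valid(c)]
invalid = [c for c in combos if not is_valid(c)]
```

With these placeholder values, some of the combinations fail the ordering check. One possible remedy, offered here as an assumption rather than the fix adopted in the repository, is to sweep over a single named preset and expand it into the four values inside the training script, so that only the intended combinations can be sampled.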

cursor · 2025-11-07T14:10:39Z

+                        trainer = self._trainer_ref
+                    else:
+                        print("[EvalCallback] Warning: Cannot access trainer, skipping evaluation")
+                        return


Bug: Trainer reference fallback mishandles missing kwargs

The fallback to _trainer_ref only happens when model is present in kwargs. If neither trainer nor model is in kwargs, the code continues with trainer=None instead of checking _trainer_ref, so run_with_trainer is called with None even though set_trainer has already set _trainer_ref.
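
A minimal sketch of one way to address this is to resolve the trainer in a single place and fall back to `_trainer_ref` whenever `trainer` is absent from kwargs, regardless of whether `model` is present. The class `EvalCallbackSketch` and the methods `_resolve_trainer` and `maybe_run_eval` are hypothetical names for illustration; only `_trainer_ref`, `set_trainer`, and the warning message come from the diff above.

```python
class EvalCallbackSketch:
    """Illustrative stand-in for the callback; not the PR's EvalCallback."""

    def __init__(self):
        self._trainer_ref = None

    def set_trainer(self, trainer):
        # Store a reference so evaluation can run even without kwargs.
        self._trainer_ref = trainer

    def _resolve_trainer(self, **kwargs):
        # Prefer an explicitly passed trainer, otherwise use the stored one.
        trainer = kwargs.get("trainer")
        if trainer is None:
            trainer = self._trainer_ref
        return trainer

    def maybe_run_eval(self, **kwargs):
        trainer = self._resolve_trainer(**kwargs)
        if trainer is None:
            print("[EvalCallback] Warning: Cannot access trainer, skipping evaluation")
            return None
        # The real callback would call run_with_trainer(trainer, ...) here;
        # this sketch returns the resolved trainer so the logic can be checked.
        return trainer
```

With this shape, the "skip evaluation" branch is reached only when no trainer is available from either source.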

added sweep info and better logging

5be00c8

cursor Bot reviewed Nov 7, 2025


McClain-Thiel merged commit 441e498 into rewards-written Nov 7, 2025
1 of 2 checks passed

McClain-Thiel deleted the eval-callback branch April 20, 2026 14:37
