added sweep info and better logging #5
This PR is being reviewed by Cursor Bugbot
  values: [3000, 7000]

reward_ideal_max_length:
  values: [12000, 20000]
Bug: Invalid length reward parameter combos in Bayesian sweep
The length reward parameters are configured for independent sampling by the Bayesian sweep, which creates invalid combinations where reward_ideal_min_length can be less than reward_min_length or reward_ideal_max_length can exceed reward_max_length. The comments indicate two intended combinations, but the configuration allows all permutations of the four parameters.
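One way to see the problem, and to guard against it, is to enumerate the combinations that independent sampling can produce and keep only those that respect the ordering the report implies (min ≤ ideal min ≤ ideal max ≤ max). The sketch below is illustrative only: the helper `is_valid_length_config` is hypothetical, and the numeric values are placeholders chosen for the example, not the values in the sweep file.

```python
from itertools import product

# Ordering assumed from the bug report: each bound must not exceed the next.
ORDERED_KEYS = (
    "reward_min_length",
    "reward_ideal_min_length",
    "reward_ideal_max_length",
    "reward_max_length",
)


def is_valid_length_config(cfg: dict) -> bool:
    """Return True if the four length bounds are in non-decreasing order."""
    values = [cfg[key] for key in ORDERED_KEYS]
    return all(lower <= upper for lower, upper in zip(values, values[1:]))


# Placeholder values for illustration; NOT the values from the real sweep config.
grid = {
    "reward_min_length": [100, 500],
    "reward_ideal_min_length": [300, 700],
    "reward_ideal_max_length": [1200, 2000],
    "reward_max_length": [1000, 2500],
}

# Independent sampling can reach every element of the Cartesian product.
all_combos = [
    dict(zip(ORDERED_KEYS, values))
    for values in product(*(grid[key] for key in ORDERED_KEYS))
]

# Only a subset of those combinations describes a consistent length reward.
valid_combos = [cfg for cfg in all_combos if is_valid_length_config(cfg)]

print(f"{len(valid_combos)} of {len(all_combos)} combinations are valid")
```

A training entry point could call a check like this at startup and fail fast on an invalid draw. Alternatively, the sweep could expose a single categorical parameter that names one of the intended combinations, with the script expanding it into the four bounds.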
    trainer = self._trainer_ref
else:
    print("[EvalCallback] Warning: Cannot access trainer, skipping evaluation")
    return
Bug: Trainer reference fallback mishandles missing kwargs
The fallback to _trainer_ref only happens when model exists in kwargs. If neither trainer nor model is in kwargs, the code continues with trainer=None instead of checking _trainer_ref, so run_with_trainer is called with None even though set_trainer has already set _trainer_ref.
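A possible shape for the fix is to resolve the trainer in one place and fall back to the stored reference whenever kwargs does not supply one, regardless of whether model is present. The class below is a minimal stand-in written for this illustration, not the PR's EvalCallback: the names `EvalCallbackSketch`, `_resolve_trainer`, and `maybe_evaluate` are hypothetical, and the real callback's hook methods and base class are not reproduced.

```python
class EvalCallbackSketch:
    """Stand-in for the callback; only trainer resolution is modelled."""

    def __init__(self):
        self._trainer_ref = None

    def set_trainer(self, trainer):
        # Store a reference for use when hooks are not passed a trainer.
        self._trainer_ref = trainer

    def _resolve_trainer(self, kwargs):
        # Prefer an explicitly passed trainer; otherwise always try the
        # stored reference, whether or not "model" is in kwargs.
        trainer = kwargs.get("trainer")
        if trainer is None:
            trainer = self._trainer_ref
        return trainer

    def maybe_evaluate(self, **kwargs):
        trainer = self._resolve_trainer(kwargs)
        if trainer is None:
            print("[EvalCallback] Warning: Cannot access trainer, skipping evaluation")
            return None
        # The real code would run the evaluation here; the sketch just
        # returns the resolved trainer so the behaviour can be inspected.
        return trainer
```

With this structure the warning path is reached only when no trainer is available from either source, so the evaluation step is never entered with None.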
Note
Adds an in-training evaluation pipeline (vLLM + plasmidkit), switches W&B project, enables S3 checkpointing, tunes GRPO/reward params, and introduces a sweep config.
- (src/runners/grpo.py, src/runners/grpo_sweep.py):
- Evaluator and EvalCallback using the trainer's vLLM model; log results and artifacts to W&B.
- Config (learning rate, batch size, generations, temperature, top_p, beta, epsilon).
- /s3/${checkpoints_path}/..., add write test (test_checkpoint_directory_write), limit saved checkpoints, and save/log final artifacts.
- (eval_steps=50) and enhance W&B metadata (tags, config logging).
- src/eval/eval.py: SequenceAnalyzer (plasmidkit-based annotation merge/extract) and Evaluator (prompt loading, rollout generation, analysis).
- src/eval/eval_config.py for evaluation settings (prompts, sampling, overlap threshold, logging).
- src/utils/training_utils.py (EvalRunner, EvalCallback, checkpoint write test, W&B logging/artifacts).
- (src/config.py): checkpoints_path and production GRPO hyperparameters for reuse.
- (src/rewards/bioinformatics/reward_config.py): violation_penalty_factor → 1.0, ori_weight → 1.5.
- (docker-compose.yaml): ucl-cssb/PlasmidRL.
- sweeps/configs/sweep_config_training_with_eval.yaml defining a Bayes sweep with length-based rewards and integrated evaluation.

Written by Cursor Bugbot for commit 5be00c8.