added sweep info and better logging #5

Merged: McClain-Thiel merged 1 commit into rewards-written from eval-callback on Nov 7, 2025

@McClain-Thiel McClain-Thiel commented Nov 7, 2025

Note

Adds an in-training evaluation pipeline (vLLM + plasmidkit), switches W&B project, enables S3 checkpointing, tunes GRPO/reward params, and introduces a sweep config.

  • Training/GRPO (src/runners/grpo.py, src/runners/grpo_sweep.py):
    • Integrate evaluation via Evaluator and EvalCallback using the trainer's vLLM model; log results and artifacts to W&B.
    • Adopt production hyperparameters from Config (learning rate, batch size, generations, temperature, top_p, beta, epsilon).
    • Configure S3 checkpointing under /s3/${checkpoints_path}/..., add write test (test_checkpoint_directory_write), limit saved checkpoints, and save/log final artifacts.
    • Increase eval frequency (eval_steps=50) and enhance W&B metadata (tags, config logging).
  • Evaluation Framework:
    • Add src/eval/eval.py: SequenceAnalyzer (plasmidkit-based annotation merge/extract) and Evaluator (prompt loading, rollout generation, analysis).
    • Add src/eval/eval_config.py for evaluation settings (prompts, sampling, overlap threshold, logging).
    • Add generic training utilities in src/utils/training_utils.py (EvalRunner, EvalCallback, checkpoint write test, W&B logging/artifacts).
  • Config (src/config.py):
    • Add checkpoints_path and production GRPO hyperparameters for reuse.
  • Rewards (src/rewards/bioinformatics/reward_config.py):
    • Adjust defaults: violation_penalty_factor = 1.0, ori_weight = 1.5.
  • Orchestration (docker-compose.yaml):
    • Switch W&B entity/project for GRPO training and sweep to ucl-cssb/PlasmidRL.
  • Sweeps:
    • Add sweeps/configs/sweep_config_training_with_eval.yaml defining a Bayes sweep with length-based rewards and integrated evaluation.
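The checkpoint write test mentioned above can be pictured with a minimal sketch. The PR names the helper test_checkpoint_directory_write but does not show its body, so the function name `check_directory_writable`, its signature, and the probe-file approach below are assumptions for illustration, not the repository's implementation.

```python
import os
import uuid


def check_directory_writable(directory: str) -> bool:
    """Return True if a small probe file can be written, read back, and removed.

    Hypothetical helper: the real test_checkpoint_directory_write may differ.
    """
    probe = os.path.join(directory, f".write_probe_{uuid.uuid4().hex}")
    try:
        # Create the checkpoint directory if it does not exist yet.
        os.makedirs(directory, exist_ok=True)
        with open(probe, "w", encoding="utf-8") as fh:
            fh.write("ok")
        with open(probe, "r", encoding="utf-8") as fh:
            readable = fh.read() == "ok"
        # Clean up so the probe never shows up among real checkpoints.
        os.remove(probe)
        return readable
    except OSError:
        # Covers permission errors, missing mounts, and read-only filesystems.
        return False
```

Running a check like this before training starts surfaces a missing or read-only mount immediately, rather than at the first checkpoint save.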

Written by Cursor Bugbot for commit 5be00c8.

@cursor cursor Bot left a comment

This PR is being reviewed by Cursor Bugbot

      values: [3000, 7000]
    reward_ideal_max_length:
      values: [12000, 20000]

Bug: Invalid length reward parameter combos in Bayesian sweep

The length reward parameters are configured for independent sampling by the Bayesian sweep, which creates invalid combinations where reward_ideal_min_length can be less than reward_min_length or reward_ideal_max_length can exceed reward_max_length. The comments indicate two intended combinations, but the configuration allows all permutations of the four parameters.
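One way to picture the constraint is a small validity check over the four parameters. This is a sketch only: the parameter names come from the comment above, the values for reward_ideal_min_length and reward_ideal_max_length are assumed to match the quoted snippet, and the values for reward_min_length and reward_max_length are hypothetical because the PR excerpt does not show them.

```python
from itertools import product
from typing import Dict, List


def is_valid_length_config(cfg: Dict[str, int]) -> bool:
    """Check min <= ideal_min <= ideal_max <= max for one sampled config."""
    return (
        cfg["reward_min_length"]
        <= cfg["reward_ideal_min_length"]
        <= cfg["reward_ideal_max_length"]
        <= cfg["reward_max_length"]
    )


def valid_combinations(grid: Dict[str, List[int]]) -> List[Dict[str, int]]:
    """Enumerate every combination of the grid and keep only the valid ones."""
    keys = list(grid)
    combos = (dict(zip(keys, vals)) for vals in product(*(grid[k] for k in keys)))
    return [c for c in combos if is_valid_length_config(c)]


# Outer bounds below are hypothetical; only the ideal_* values appear in the PR.
example_grid = {
    "reward_min_length": [1000, 5000],
    "reward_ideal_min_length": [3000, 7000],
    "reward_ideal_max_length": [12000, 20000],
    "reward_max_length": [15000, 30000],
}

valid = valid_combinations(example_grid)
```

With these assumed values, only some of the 16 independent combinations satisfy the ordering. One possible remedy is to sweep a single categorical parameter that selects a named preset and expand it into the four values inside the training script, so the sweep can never sample an inconsistent set.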

            trainer = self._trainer_ref
        else:
            print("[EvalCallback] Warning: Cannot access trainer, skipping evaluation")
            return

Bug: Trainer reference fallback mishandles missing kwargs

The fallback to _trainer_ref only happens when model exists in kwargs. If neither trainer nor model is in kwargs, the code continues with trainer=None instead of checking _trainer_ref, so run_with_trainer is called with None even though set_trainer has already populated _trainer_ref.
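A minimal sketch of a fallback that does not depend on model being present follows. The helper name `resolve_trainer` and its signature are hypothetical; only _trainer_ref, set_trainer, and run_with_trainer are named in the PR, and the actual callback code is not shown in full.

```python
from typing import Any, Dict, Optional


def resolve_trainer(kwargs: Dict[str, Any], trainer_ref: Optional[Any]) -> Optional[Any]:
    """Prefer an explicit trainer in kwargs, otherwise use the stored reference.

    Hypothetical helper: the fallback is unconditional, so it applies whether
    or not "model" happens to be in kwargs.
    """
    trainer = kwargs.get("trainer")
    if trainer is None:
        trainer = trainer_ref
    return trainer
```

A callback using this would pass self._trainer_ref as trainer_ref, then log the warning and return early only when the result is still None, so run_with_trainer is never reached without a trainer.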

@McClain-Thiel McClain-Thiel merged commit 441e498 into rewards-written Nov 7, 2025
1 of 2 checks passed
@McClain-Thiel McClain-Thiel deleted the eval-callback branch April 20, 2026 14:37