W&B Logging support for Finetuning #815
Conversation
Code Review
This pull request introduces two major features: multi-GPU training support using DistributedDataParallel (DDP) and experiment tracking with Weights & Biases. The DDP implementation is well-structured, handling process group initialization, data sampling, and metric synchronization correctly. The introduction of a logging protocol with a W&B implementation is a great addition for experiment tracking. I've identified a critical issue regarding DDP support for multiple models, which will cause a crash. I've also made a couple of medium-severity suggestions to improve code clarity and documentation.
@psinger-prior what's the state of this PR?
Issue
Closes #810
Motivation and Context
Tracking training runs is important, so this PR implements a logging class that can be extended to support additional loggers. The first supported backend is W&B.
This PR builds on top of #812 and should only be merged afterwards.
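The extensible logging class described above could be sketched as a protocol with swappable backends. This is a hypothetical illustration, not the PR's actual code: the names `TrainLogger` and `InMemoryLogger` are invented here, and only the referenced `wandb.init` / `wandb.log` / `wandb.finish` calls come from the public W&B API.

```python
from typing import Protocol


class TrainLogger(Protocol):
    """Minimal logging interface; concrete backends (e.g. W&B) implement it.

    Hypothetical sketch -- the PR's real class and method names may differ.
    """

    def log_metrics(self, metrics: dict[str, float], step: int) -> None: ...

    def finish(self) -> None: ...


class InMemoryLogger:
    """Stand-in backend that records metrics locally (useful for tests)."""

    def __init__(self) -> None:
        self.history: list[tuple[int, dict[str, float]]] = []

    def log_metrics(self, metrics: dict[str, float], step: int) -> None:
        # Copy the dict so later mutation by the caller cannot alter history.
        self.history.append((step, dict(metrics)))

    def finish(self) -> None:
        pass


# A W&B-backed implementation would follow the same shape, delegating to
# wandb.init() / wandb.log() / wandb.finish() from the wandb package.

logger: TrainLogger = InMemoryLogger()
logger.log_metrics({"loss": 0.93}, step=1)
logger.log_metrics({"loss": 0.71}, step=2)
logger.finish()
```

Because training code only depends on the protocol, adding a new logger means implementing these two methods; no call sites change.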
Public API Changes
How Has This Been Tested?
Checklist
changelog/README.md), or "no changelog needed" label requested.