Title of the talk
Multi-GPU ML training using PyTorch DDP
Description
As the scale and complexity of deep learning models continue to grow, efficient training strategies have become crucial for accelerating innovation and pushing the boundaries of AI research and deployment. Multi-GPU training has emerged as a game-changer, enabling faster model convergence and the ability to handle larger datasets and models. Among the various approaches available, PyTorch’s Distributed Data Parallel (DDP) stands out as a powerful and efficient solution designed for scalability and performance.
Table of contents
Topics of interest include, but are not limited to:
- Introduction to PyTorch DDP (Distributed Data Parallel)
- Best practices for setting up and using PyTorch DDP for multi-GPU training
- Practical demo: training a simple neural network on the MNIST dataset using PyTorch DDP
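The demo outlined above can be sketched as a minimal DDP training script. This is a hedged illustration, not the talk's actual demo code: to keep it runnable without GPUs, it assumes the CPU "gloo" backend and a small synthetic dataset in place of the "nccl" backend and the real MNIST download; a multi-GPU run would instead launch one process per GPU, e.g. with `torchrun`.

```python
# Minimal PyTorch DDP sketch. Assumptions (not from the talk itself):
# "gloo" backend on CPU and random tensors standing in for nccl + MNIST.
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def worker(rank: int, world_size: int) -> float:
    # Each process joins the process group; on GPUs you would use
    # backend="nccl" and call torch.cuda.set_device(rank) first.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    # Tiny stand-in for MNIST: 256 random 28x28 "images", 10 classes.
    images = torch.randn(256, 1, 28, 28)
    labels = torch.randint(0, 10, (256,))
    dataset = TensorDataset(images, labels)

    # DistributedSampler shards the data so each rank sees a distinct slice.
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    model = torch.nn.Sequential(
        torch.nn.Flatten(),
        torch.nn.Linear(28 * 28, 10),
    )
    # DDP wraps the model; gradients are all-reduced across ranks in backward().
    ddp_model = DDP(model)
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()

    loss = torch.tensor(0.0)
    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for batch, target in loader:
            optimizer.zero_grad()
            loss = loss_fn(ddp_model(batch), target)
            loss.backward()  # triggers cross-rank gradient synchronization
            optimizer.step()

    dist.destroy_process_group()
    return loss.item()


if __name__ == "__main__":
    # Single-process smoke run; a real multi-GPU job launches one process
    # per GPU, e.g. `torchrun --nproc_per_node=4 train.py`.
    worker(rank=0, world_size=1)
```

With more than one process, the only change per rank is its `rank` value and (on GPUs) the device it binds to; DDP and `DistributedSampler` handle the gradient averaging and data sharding.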
Duration (including Q&A)
25 mins
Prerequisites
None
Speaker bio
- Amita Sharma (Technical Project Manager, Red Hat OpenShift AI, Kubeflow Training Team)
- Abhijeet Dhumal (Engineer, Red Hat OpenShift AI, Kubeflow Training Team) @abhijeet-dhumal
The talk/workshop speaker agrees to