About the Ray Train category

For questions about distributed model training. Use Ray Train to scale a model's training process across CPUs, GPUs, or multiple nodes.

Ray Train is a scalable machine learning library for distributed training and fine-tuning.

Ray Train allows you to scale model training code from a single machine to a cluster of machines in the cloud, and abstracts away the complexities of distributed computing. Whether you have large models or large datasets, Ray Train is the simplest solution for distributed training.
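As a concrete illustration of that workflow, here is a minimal sketch of moving a PyTorch training loop onto Ray Train with `TorchTrainer`. The toy model, dataset, and hyperparameters are placeholders, and the two-worker CPU setup is just an example; adapt the `ScalingConfig` to your cluster (see the Ray Train docs for current API details).

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

import ray.train.torch  # Torch-specific utilities (prepare_model, prepare_data_loader)
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer


def train_loop_per_worker(config):
    # Toy regression data; in practice you would load a real dataset here.
    X = torch.randn(1024, 8)
    y = X.sum(dim=1, keepdim=True)
    loader = DataLoader(TensorDataset(X, y), batch_size=config["batch_size"])

    model = nn.Linear(8, 1)
    # prepare_model wraps the model in DistributedDataParallel and moves it to
    # the worker's device; prepare_data_loader adds a DistributedSampler so
    # each worker sees its own shard of the data.
    model = ray.train.torch.prepare_model(model)
    loader = ray.train.torch.prepare_data_loader(loader)

    optimizer = torch.optim.SGD(model.parameters(), lr=config["lr"])
    loss_fn = nn.MSELoss()

    for epoch in range(config["epochs"]):
        for batch_X, batch_y in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(batch_X), batch_y)
            loss.backward()
            optimizer.step()
        # Report metrics back to the driver after each epoch.
        ray.train.report({"epoch": epoch, "loss": loss.item()})


trainer = TorchTrainer(
    train_loop_per_worker,
    train_loop_config={"batch_size": 64, "lr": 1e-2, "epochs": 3},
    # Scale out by raising num_workers; set use_gpu=True on a GPU cluster.
    scaling_config=ScalingConfig(num_workers=2, use_gpu=False),
)
result = trainer.fit()
print(result.metrics)
```

Note that the training loop itself is ordinary PyTorch; the `prepare_*` calls and the `ScalingConfig` are the only Ray-specific pieces, which is how the same code scales from a laptop to a cluster.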

Ray Train provides support for many frameworks:

| PyTorch Ecosystem | More Frameworks |
| --- | --- |
| PyTorch | TensorFlow |
| PyTorch Lightning | Keras |
| Hugging Face Transformers | Horovod |
| Hugging Face Accelerate | XGBoost |
| DeepSpeed | LightGBM |

Documentation: https://docs.ray.io/en/latest/train/train.html