Hi Ray community,
I’m exploring the integration of KubeRay (Ray on Kubernetes) with DeepSpeed for large-scale distributed model training, but I’ve noticed a significant gap: while KubeRay + vLLM workflows are well-documented and mature (e.g., for high-throughput inference with autoscaling and multi-GPU support), DeepSpeed integrations seem almost nonexistent. Is combining them a viable idea? Could anyone share experiences or advice on co-deploying Ray and DeepSpeed in Kubernetes? A rough sketch of what I have in mind is below.
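For concreteness, this is roughly the kind of job I’m hoping to submit to a KubeRay cluster (e.g., via a RayJob): Ray Train’s `TorchTrainer` driving a DeepSpeed-wrapped model inside the per-worker training loop. The model, data, and DeepSpeed config here are toy placeholders of my own, not anything from an official example, so please correct me if this isn’t the intended pattern.

```python
# Minimal sketch (my assumptions): Ray Train's TorchTrainer launches the workers,
# and each worker wraps a toy model with deepspeed.initialize(). The DeepSpeed
# config values below are placeholders, not a recommendation.
import torch
import torch.nn as nn
import deepspeed

from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer


def train_loop_per_worker(config):
    # Toy model and random data, just to exercise the Ray Train + DeepSpeed wiring.
    model = nn.Linear(128, 1)
    ds_config = {
        "train_micro_batch_size_per_gpu": 8,
        "zero_optimization": {"stage": 2},
        "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
    }
    # My understanding is that Ray Train sets up the torch.distributed process
    # group (RANK, WORLD_SIZE, etc.), and DeepSpeed reuses it here.
    model_engine, optimizer, _, _ = deepspeed.initialize(
        model=model, model_parameters=model.parameters(), config=ds_config
    )
    for _ in range(10):
        x = torch.randn(8, 128).to(model_engine.device)
        y = torch.randn(8, 1).to(model_engine.device)
        loss = nn.functional.mse_loss(model_engine(x), y)
        model_engine.backward(loss)
        model_engine.step()


trainer = TorchTrainer(
    train_loop_per_worker,
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True),
)
trainer.fit()
```

Assuming this pattern is sound, my remaining questions are mostly about the Kubernetes side: GPU scheduling, autoscaling, and whether anyone has run DeepSpeed this way on KubeRay in practice.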
Thanks