cdwiv
October 17, 2023, 8:45pm
1
How severe does this issue affect your experience of using Ray?
Medium: It contributes to significant difficulty to complete my task, but I can work around it.
Do AWS spot instances work out of the box with KubeRay and Ray Data, or is some additional configuration required to get that working properly?
chengsu
October 18, 2023, 10:57pm
2
Ray Data uses the fault tolerance mechanism from Ray Core (see Ray Data Internals — Ray 2.7.1), so it should mostly work.
cc @Kai-Hsun_Chen for KubeRay. Thanks.
cdwiv
October 19, 2023, 12:39am
3
Thanks for getting back to me @chengsu
Fault tolerance isn’t supported if the process that created the Dataset dies.
If we are running a Ray job, where does the process that created the Dataset live? Is it on the head node?
Do AWS spot instances work out of the box with kuberay and ray data?
KubeRay doesn’t perform any additional work for spot instance support. However, you could try placing the head Pod on an on-demand node and the worker Pods on spot instances.
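As a rough illustration of that placement, here is a minimal sketch that builds a RayCluster manifest with a `nodeSelector` on each Pod template. It assumes EKS managed node groups, which label nodes with `eks.amazonaws.com/capacityType` (`ON_DEMAND` or `SPOT`); other provisioners use different labels. The cluster name, group name, image tag, and replica counts are hypothetical, and `ray.io/v1` assumes a recent KubeRay release (older ones use `ray.io/v1alpha1`).

```python
import json

# Assumption: nodes come from EKS managed node groups, which set this label.
# Other provisioners (for example Karpenter) use a different label key.
CAPACITY_LABEL = "eks.amazonaws.com/capacityType"


def pod_template(capacity_type: str, container_name: str, image: str) -> dict:
    """Pod template pinned to nodes of the given capacity type."""
    return {
        "spec": {
            "nodeSelector": {CAPACITY_LABEL: capacity_type},
            "containers": [{"name": container_name, "image": image}],
        }
    }


def build_ray_cluster(
    name: str = "spot-demo",              # hypothetical cluster name
    image: str = "rayproject/ray:2.7.1",  # hypothetical image tag
    workers: int = 2,                     # hypothetical worker count
) -> dict:
    """RayCluster manifest: head on on-demand nodes, workers on spot nodes."""
    return {
        "apiVersion": "ray.io/v1",  # older KubeRay releases: ray.io/v1alpha1
        "kind": "RayCluster",
        "metadata": {"name": name},
        "spec": {
            "headGroupSpec": {
                "rayStartParams": {},
                # The head Pod stays on on-demand capacity so a spot
                # reclaim does not take down the whole cluster.
                "template": pod_template("ON_DEMAND", "ray-head", image),
            },
            "workerGroupSpecs": [
                {
                    "groupName": "spot-workers",  # hypothetical group name
                    "replicas": workers,
                    "minReplicas": 0,
                    "maxReplicas": workers,
                    "rayStartParams": {},
                    "template": pod_template("SPOT", "ray-worker", image),
                }
            ],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(build_ray_cluster(), indent=2))
```

The printed JSON could be piped to `kubectl apply -f -`. Resource requests, tolerations for tainted spot node groups, and autoscaling settings are left out and would need to be added for a real cluster.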