Ray is best suited for scenarios where you actually need to scale out.
I don’t think distributed training is necessarily faster than non-distributed training. If your dataset is small, you only have one machine, and you are training on CPU, plain non-distributed TensorFlow is more efficient, and you probably don’t need Ray at all.
With distributed training, you pay a network communication cost for the ring all-reduce in exchange for spreading the computation across multiple nodes. It only makes sense when the compute savings outweigh that communication overhead. Ray AIR makes distributed training easier, but you first have to assess whether you need distributed training at all.
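A rough way to see that tradeoff is to model one training step as compute divided across N workers plus the ring all-reduce cost. This is just a back-of-envelope sketch; the function names and all the numbers (model size, bandwidth, step times) are made-up assumptions, not benchmarks:

```python
def ring_allreduce_seconds(grad_bytes: float, bandwidth_bytes_per_s: float,
                           n_workers: int) -> float:
    # Ring all-reduce sends roughly 2*(N-1)/N of the gradient tensor
    # through each worker's network link per step.
    return 2 * (n_workers - 1) / n_workers * grad_bytes / bandwidth_bytes_per_s

def step_seconds(compute_s: float, grad_bytes: float,
                 bandwidth_bytes_per_s: float, n_workers: int) -> float:
    # Idealized: compute parallelizes perfectly; communication is pure overhead.
    if n_workers == 1:
        return compute_s
    return (compute_s / n_workers
            + ring_allreduce_seconds(grad_bytes, bandwidth_bytes_per_s, n_workers))

# Hypothetical 100M-parameter fp32 model (~400 MB of gradients) on 10 Gbps links:
GRAD_BYTES = 4e8
BANDWIDTH = 1.25e9  # 10 Gbps in bytes/s

# Heavy step (10 s of compute): splitting across 4 workers wins.
print(step_seconds(10.0, GRAD_BYTES, BANDWIDTH, 4))  # ~2.98 s vs 10 s alone

# Light step (0.1 s of compute): communication dominates and 4 workers lose.
print(step_seconds(0.1, GRAD_BYTES, BANDWIDTH, 4))   # ~0.51 s vs 0.1 s alone
```

The second case is exactly the small-dataset, single-machine situation above: the ring traffic costs more than the compute you saved, so staying non-distributed is faster.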