Hey, we're trying to run a simple ML model with Ray, which also uses Ray Tune. We run Ray on an EKS cluster via the ray-operator. When we run the model from a Jupyter notebook, it fails with this error:
Request can’t be sent because the Ray client has already been disconnected due to an error. Last exception: Failed to reconnect within the reconnection grace period (30s)
Any insights on this? We also tried changing our cluster config to use larger EC2 instances with more CPUs, but it still throws the same error. Is there a way we can increase the number of threads?