Setting up docker as a virtual Ray cluster

How severe does this issue affect your experience of using Ray?

  • High: It blocks me to complete my task.

Hi, I’m trying to set up a virtual Ray cluster to learn and demonstrate various features of Ray. I could do this in stand-alone mode or set up actual virtual machines, but using docker is so much more convenient (if it worked).

Here is what I’m doing:
I have a very simple docker image. I just installed a couple of utilities to make debugging easier:

FROM continuumio/miniconda3
RUN apt update && apt install -y iputils-ping iproute2
RUN pip install "ray[all]"

I create my own network, though please note that I have also tried this without creating it:

docker network create simulated-cluster

I start the head node via this:

docker run \
    -dit \
    --network simulated-cluster \
    -p 6379:6379 -p 8265:8265 -p 10001:10001 -p 10002:10002 \
    --name ray-head \
    test-ray-image \
    ray start --head --node-ip-address=0.0.0.0 --dashboard-host=0.0.0.0 --disable-usage-stats --block

I confirm that this works and get its internal IP, which is always 172.18.0.2.

I then start worker nodes:

!docker run -dit --network simulated-cluster --name ray-worker1 test-ray-image ray start --address=ray-head:6379 --block
!docker run -dit --network simulated-cluster --name ray-worker2 test-ray-image ray start --address=ray-head:6379 --block
!docker run -dit --network simulated-cluster --name ray-worker3 test-ray-image ray start --address=ray-head:6379 --block

If I’m not using the simulated network, the worker nodes instead point to the IP I retrieved via ip addr.

Logs seem to show that worker nodes are running ok as well. What’s more, I can access the dashboard via http://localhost:8265!

It seems to me that the network is up! I then tried to run some computation on it. Note that the next bit of code is being run on the host machine, not inside any of the docker containers:

import ray

ray.init("ray://localhost:6379")

@ray.remote
def test():
    return "Hello from Ray!"

For the host name, I have tried localhost, the IP from ip addr, port 10001, and countless other permutations, but I just can’t get it to work. Sometimes this code just sits there forever; in this case, I have been getting connection timeouts.
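
One way to narrow down a timeout like this is to check, from the host, which of the ports published with -p actually accept a TCP connection. The sketch below uses only the Python standard library; the helper names port_open and check_ports are hypothetical, not part of Ray. Keep in mind that ray:// addresses are served by the Ray Client server, which listens on port 10001 by default, while 6379 is the GCS port. A successful TCP connect only shows that something is listening, not that the Ray handshake will succeed.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False


def check_ports(host, ports):
    """Map each port to whether it accepted a TCP connection."""
    return {port: port_open(host, port) for port in ports}
```

For the setup described here, calling check_ports("localhost", [6379, 8265, 10001, 10002]) on the host would show whether each published port is reachable at all.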

Any ideas what I could be doing wrong?

Hi @Shahbaz_Chaudhary, Ray client is generally discouraged. Can you run your script inside the docker container?
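
A minimal sketch of what running the script inside the container could look like, assuming it is saved under the hypothetical name hello_ray.py and copied into the ray-head container (for example with docker cp). Passing address="auto" asks Ray to attach to the cluster already running on that node, so no ray:// client connection is involved. Whether this works unchanged with the --node-ip-address=0.0.0.0 setting from the question is not verified here.

```python
def run_hello():
    """Attach to the already running cluster and execute one remote task."""
    import ray  # imported lazily so the file still loads where Ray is absent

    # "auto" connects to the Ray instance started by `ray start` on this node.
    ray.init(address="auto")

    @ray.remote
    def test():
        return "Hello from Ray!"

    # Submit the task to the cluster and wait for its result.
    result = ray.get(test.remote())
    ray.shutdown()
    return result
```

Adding print(run_hello()) at the end of the file and running it with docker exec ray-head python hello_ray.py would then execute the task on the cluster rather than through Ray Client.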