Only Ray CLI In Container - Exits With Code 0

I’ve recently created the following Dockerfile:

FROM rayproject/ray:1.3.0

ENV RAY_REDIS_PORT=6379

WORKDIR /Ray

CMD ray start --head --dashboard-host 0.0.0.0 --port=${RAY_REDIS_PORT} 

I then built a local image from this Dockerfile. When I run the resulting container, I get the usual Ray startup output, and then the Docker container exits because ray returns exit code 0.

What is the recommended way to keep the Docker container running after Ray starts successfully and returns exit code 0?


What docker command are you using to create the container?

I’ve used this Docker Compose file with docker compose up:

version: '3'

services: 
    network_test_1:
        image: test_1
        environment:
            RAY_CONNECTION_URL_AND_PORT: "ray:6379"
        networks:
            - ray_connection
    ray:
        image: ray2
        ports: 
            - 8265:8265
        networks:
            - ray_connection
        shm_size: '1.56gb'

networks:
    ray_connection:
        name: ray_connection

Here, the ray2 image is built from the Dockerfile above.

I have also tried docker run ray2, docker run -d ray2, and docker run -it -d ray2; none of them kept the container running after setup finished.

I’ve used this docker compose file before:

version: "3.9"  # optional since v1.27.0
services:
    ray_head:
        image: $DOCKER_IMAGE
        networks: [ray_local]
        command: bash -c "ray start --head --port=6379 --object-manager-port=8076 --dashboard-host 0.0.0.0 --num-cpus 4 && sleep 10000"
        ports:
            - 6379:6379
            - 8265:8265
            - 10002:10001
#         volumes:
#             - /home/ubuntu/coding/ray:/ray
#             - /home/ubuntu/coding/xgboost_ray:/xgboost_ray
#             - /home/ubuntu/coding/xgboost_ray/xgboost_ray:/home/ray/anaconda3/lib/python3.7/site-packages/xgboost_ray:ro
#             - /home/ubuntu/coding/modin/modin:/home/ray/anaconda3/lib/python3.7/site-packages/modin:ro
        deploy:
            resources:
                limits:
                    cpus: 8
                    memory: 2500M
                reservations:
                    memory: 2500M
        shm_size: 1200m

    ray_worker_1:
        image: $DOCKER_IMAGE
        networks: [ray_local]
        depends_on:
            - ray_head
        command: bash -c "sleep 1 && ray start --address=ray_head:6379 --object-manager-port=8076 --num-cpus 4 && sleep 10000"
#         volumes:
#             - /home/ubuntu/coding/ray:/ray:ro
#             - /home/ubuntu/coding/xgboost_ray/xgboost_ray:/home/ray/anaconda3/lib/python3.7/site-packages/xgboost_ray:ro
#             - /home/ubuntu/coding/modin/modin:/home/ray/anaconda3/lib/python3.7/site-packages/modin:ro
        mem_limit: 3000m
        mem_reservation: 3000m
        shm_size: 1200m

    ray_worker_2:
        image: $DOCKER_IMAGE
        networks: [ray_local]
        depends_on:
            - ray_head
        command: bash -c "sleep 1 && ray start --address=ray_head:6379 --object-manager-port=8076 --num-cpus 4 && sleep 10000"
#         volumes:
#             - /home/ubuntu/coding/ray:/ray:ro
#             - /home/ubuntu/coding/xgboost_ray/xgboost_ray:/home/ray/anaconda3/lib/python3.7/site-packages/xgboost_ray:ro
#             - /home/ubuntu/coding/modin/modin:/home/ray/anaconda3/lib/python3.7/site-packages/modin:ro
        mem_limit: 3000m
        mem_reservation: 3000m
        shm_size: 1200m

    ray_worker_3:
        image: $DOCKER_IMAGE
        networks: [ray_local]
        depends_on:
            - ray_head
        command: bash -c "sleep 1 && ray start --address=ray_head:6379 --object-manager-port=8076 --num-cpus 4 && sleep 10000"
#         volumes:
#             - /home/ubuntu/coding/ray:/ray:ro
#             - /home/ubuntu/coding/xgboost_ray/xgboost_ray:/home/ray/anaconda3/lib/python3.7/site-packages/xgboost_ray:ro
#             - /home/ubuntu/coding/modin/modin:/home/ray/anaconda3/lib/python3.7/site-packages/modin:ro
        mem_limit: 3000m
        mem_reservation: 3000m
        shm_size: 1200m


networks:
    ray_local:

So you keep the container running with the sleep 10000, then?

Yes, that’s right. The sleep keeps the bash -c command, and therefore the container, running, so the node stays up for 10000 seconds (about 2.8 hours).
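Applied to the Dockerfile from the question, a minimal sketch might look like the one below. It assumes the --block flag of ray start, which makes the command block in the foreground instead of returning once the node is up; run ray start --help in your Ray version to confirm the flag is available. The commented-out alternative follows the sleep approach from the compose file, and it assumes the image's sleep is GNU coreutils (which accepts "infinity"), so the container does not stop after a fixed delay.

```dockerfile
FROM rayproject/ray:1.3.0

ENV RAY_REDIS_PORT=6379

WORKDIR /Ray

# --block (assumed available, see ray start --help) keeps ray start in the
# foreground, so the container's main process does not exit after startup.
CMD ray start --head --dashboard-host 0.0.0.0 --port=${RAY_REDIS_PORT} --block

# Alternative mirroring the compose file's approach, without the ~2.8 hour
# limit of "sleep 10000" (assumes GNU sleep, which accepts "infinity"):
# CMD ray start --head --dashboard-host 0.0.0.0 --port=${RAY_REDIS_PORT} && sleep infinity
```

Both variants use the shell form of CMD, as in the original Dockerfile, so ${RAY_REDIS_PORT} is expanded by /bin/sh at container start.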