I’ve recently created the following Dockerfile:
FROM rayproject/ray:1.3.0
ENV RAY_REDIS_PORT=6379
WORKDIR /Ray
CMD ray start --head --dashboard-host 0.0.0.0 --port=${RAY_REDIS_PORT}
I then built a local image from this Dockerfile. When I run the resulting container, I get the typical ray startup output, and then the docker container exits because the ray command returns exit code 0.
What is the recommended way to keep the docker container running after ray starts successfully and returns exit code 0?
What docker command are you using to create the container?
I’ve used this docker compose file via docker compose up:
version: '3'
services:
  network_test_1:
    image: test_1
    environment:
      RAY_CONNECTION_URL_AND_PORT: "ray:6379"
    networks:
      - ray_connection
  ray:
    image: ray2
    ports:
      - 8265:8265
    networks:
      - ray_connection
    shm_size: '1.56gb'
networks:
  ray_connection:
    name: ray_connection
Here the ray2 image is the one built from the Dockerfile above.
I have also used docker run ray2, docker run -d ray2, and docker run -it -d ray2, none of which kept the container running after the setup process.
rliaw
May 21, 2021, 8:45pm
I’ve used this docker compose file before:
version: "3.9"  # optional since v1.27.0
services:
  ray_head:
    image: $DOCKER_IMAGE
    networks: [ray_local]
    command: bash -c "ray start --head --port=6379 --object-manager-port=8076 --dashboard-host 0.0.0.0 --num-cpus 4 && sleep 10000"
    ports:
      - 6379:6379
      - 8265:8265
      - 10002:10001
    # volumes:
    #   - /home/ubuntu/coding/ray:/ray
    #   - /home/ubuntu/coding/xgboost_ray:/xgboost_ray
    #   - /home/ubuntu/coding/xgboost_ray/xgboost_ray:/home/ray/anaconda3/lib/python3.7/site-packages/xgboost_ray:ro
    #   - /home/ubuntu/coding/modin/modin:/home/ray/anaconda3/lib/python3.7/site-packages/modin:ro
    deploy:
      resources:
        limits:
          cpus: 8
          memory: 2500M
        reservations:
          memory: 2500M
    shm_size: 1200m
  ray_worker_1:
    image: $DOCKER_IMAGE
    networks: [ray_local]
    depends_on:
      - ray_head
    command: bash -c "sleep 1 && ray start --address=ray_head:6379 --object-manager-port=8076 --num-cpus 4 && sleep 10000"
    # volumes:
    #   - /home/ubuntu/coding/ray:/ray:ro
    #   - /home/ubuntu/coding/xgboost_ray/xgboost_ray:/home/ray/anaconda3/lib/python3.7/site-packages/xgboost_ray:ro
    #   - /home/ubuntu/coding/modin/modin:/home/ray/anaconda3/lib/python3.7/site-packages/modin:ro
    mem_limit: 3000m
    mem_reservation: 3000m
    shm_size: 1200m
  ray_worker_2:
    image: $DOCKER_IMAGE
    networks: [ray_local]
    depends_on:
      - ray_head
    command: bash -c "sleep 1 && ray start --address=ray_head:6379 --object-manager-port=8076 --num-cpus 4 && sleep 10000"
    # volumes:
    #   - /home/ubuntu/coding/ray:/ray:ro
    #   - /home/ubuntu/coding/xgboost_ray/xgboost_ray:/home/ray/anaconda3/lib/python3.7/site-packages/xgboost_ray:ro
    #   - /home/ubuntu/coding/modin/modin:/home/ray/anaconda3/lib/python3.7/site-packages/modin:ro
    mem_limit: 3000m
    mem_reservation: 3000m
    shm_size: 1200m
  ray_worker_3:
    image: $DOCKER_IMAGE
    networks: [ray_local]
    depends_on:
      - ray_head
    command: bash -c "sleep 1 && ray start --address=ray_head:6379 --object-manager-port=8076 --num-cpus 4 && sleep 10000"
    # volumes:
    #   - /home/ubuntu/coding/ray:/ray:ro
    #   - /home/ubuntu/coding/xgboost_ray/xgboost_ray:/home/ray/anaconda3/lib/python3.7/site-packages/xgboost_ray:ro
    #   - /home/ubuntu/coding/modin/modin:/home/ray/anaconda3/lib/python3.7/site-packages/modin:ro
    mem_limit: 3000m
    mem_reservation: 3000m
    shm_size: 1200m
networks:
  ray_local:
So it seems you keep the container running via the sleep 10000, then?
rliaw
May 21, 2021, 9:47pm
Yeah, that’s right. The sleep will keep the node up.
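For context on why the sleep is needed: ray start launches the Ray processes in the background and then returns, so the container's main command finishes with exit code 0 and Docker stops the container. As a possible alternative to sleep, here is a minimal sketch of the Dockerfile from the first post with the --block flag of ray start appended, which keeps the command in the foreground. This was not suggested in the thread, and it assumes the flag is available in the Ray version inside the image; ray start --help in the container should confirm it.

```dockerfile
FROM rayproject/ray:1.3.0
ENV RAY_REDIS_PORT=6379
WORKDIR /Ray
# Without --block, `ray start` returns once the head node is up, the CMD
# process exits with code 0, and Docker stops the container.
# With --block (assumed to be supported by this Ray version), `ray start`
# stays in the foreground, so the container keeps running with the node.
CMD ray start --head --dashboard-host 0.0.0.0 --port=${RAY_REDIS_PORT} --block
```

Unlike sleep 10000, this does not stop after a fixed time, and the container should exit if the blocking ray start process itself exits.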