- High: It blocks me from completing my task.
I’ve created a small cluster on a single machine using docker-compose, consisting of a head node and a worker node. I need the logs to persist after the cluster is restarted: when I run docker-compose down && docker-compose up -d, I expect all the previous logs to be preserved. Unfortunately, this is not the case. I tried to solve this by connecting Ray to Redis as described here. After doing so, the status of previous jobs is preserved after restarting the cluster. However, I can’t access the job logs (it says “Failed to load”). This is a critical issue for my team, and any help or suggestions would be appreciated.
docker-compose.yaml file:
version: "3"
services:
  ray-head:
    build: .
    ports:
      - "${REDISPORT}:${REDISPORT}"
      - "${DASHBOARDPORT}:${DASHBOARDPORT}"
      - "${HEADNODEPORT}:${HEADNODEPORT}"
    env_file:
      - .env
    command: bash -c "ray start --head --dashboard-port=${DASHBOARDPORT} --port=${REDISPORT} --dashboard-host=0.0.0.0 --redis-password=${REDISPASSWORD} --block"
    shm_size: 3g
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: '4g'
    networks:
      - chatbot-network
    volumes:
      - ray_volume:/home/ray
      - $SSH_AUTH_SOCK:/ssh-agent
    environment:
      - SSH_AUTH_SOCK=/ssh-agent
  ray-worker:
    build: .
    depends_on:
      - ray-head
    env_file:
      - .env
    command: bash -c "ray start --address=ray-head:${REDISPORT} --redis-password=${REDISPASSWORD} --num-cpus=${NUM_CPU_WORKER} --block"
    shm_size: 3g
    deploy:
      mode: replicated
      replicas: ${NUM_WORKERS}
      resources:
        limits:
          cpus: ${NUM_CPU_WORKER}
          memory: '4g'
    networks:
      - chatbot-network
    volumes:
      - ray_volume:/home/ray
      - $SSH_AUTH_SOCK:/ssh-agent
    environment:
      - SSH_AUTH_SOCK=/ssh-agent
networks:
  chatbot-network:
    name: chatbot-network
    external: true
volumes:
  ray_volume:
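
One thing that may be worth checking with a setup like this is whether any of the mounted volumes actually covers the directory the logs are written to. The sketch below is only an illustration: it assumes Ray writes session logs under its default temp directory /tmp/ray (i.e. no --temp-dir override), and the helper names container_path and is_persisted are hypothetical, not part of Ray or Docker. It handles only the short "source:target" volume syntax used in the compose file above.

```python
from pathlib import PurePosixPath
from typing import Sequence

# Assumption: Ray's default temp dir; session logs live under it
# unless the cluster is started with a different --temp-dir.
RAY_LOG_ROOT = "/tmp/ray"


def container_path(volume_entry: str) -> str:
    """Return the container-side path of a short-syntax volume entry.

    Handles 'source:target' and 'source:target:mode'; an entry without
    a colon is treated as an anonymous volume (target path only).
    """
    parts = volume_entry.split(":")
    return parts[0] if len(parts) < 2 else parts[1]


def is_persisted(path: str, volume_entries: Sequence[str]) -> bool:
    """True if `path` is at or below the target of any listed volume."""
    target = PurePosixPath(path)
    for entry in volume_entries:
        mount = PurePosixPath(container_path(entry))
        if target == mount or mount in target.parents:
            return True
    return False


# Volume entries copied from the ray-head service above.
volumes = ["ray_volume:/home/ray", "$SSH_AUTH_SOCK:/ssh-agent"]
print(is_persisted(RAY_LOG_ROOT, volumes))
```

If this check reports that the log directory is not under a mounted volume, anything written there would live in the container's writable layer, which docker-compose down removes along with the container. Whether that fully explains the “Failed to load” message is not something this sketch can confirm.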