Persist Ray job logs after restarting the cluster

  • High: It blocks me from completing my task.

I’ve created a small cluster on a single machine using docker-compose. It consists of a head node and a worker node. I need the logs to persist across cluster restarts: when I run docker-compose down && docker-compose up -d, I expect all previous logs to be preserved. Unfortunately, this is not the case.

I tried to solve this by connecting Ray to Redis, as described here. With that change, the status of previous jobs is preserved after restarting the cluster, but I still can’t access the job logs (I get “Failed to load”). This is a critical issue for my team, and any help or suggestions would be appreciated.

docker-compose.yaml file:

version: "3"

services:
  ray-head:
    build: .
    ports:
      - "${REDISPORT}:${REDISPORT}"
      - "${DASHBOARDPORT}:${DASHBOARDPORT}"
      - "${HEADNODEPORT}:${HEADNODEPORT}"
    env_file:
      - .env
    command: bash -c "ray start --head --dashboard-port=${DASHBOARDPORT} --port=${REDISPORT} --dashboard-host=0.0.0.0 --redis-password=${REDISPASSWORD} --block"
    shm_size: 3g
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: '4g'
    networks:
      - chatbot-network
    volumes:
      - ray_volume:/home/ray
      - $SSH_AUTH_SOCK:/ssh-agent
    environment:
      - SSH_AUTH_SOCK=/ssh-agent

  ray-worker:
    build: .
    depends_on: 
      - ray-head
    env_file:
      - .env
    command: bash -c "ray start --address=ray-head:${REDISPORT} --redis-password=${REDISPASSWORD} --num-cpus=${NUM_CPU_WORKER} --block" 
    shm_size: 3g
    deploy:
      mode: replicated
      replicas: ${NUM_WORKERS} 
      resources:
        limits:
          cpus: ${NUM_CPU_WORKER}
          memory: '4g'
    networks:
      - chatbot-network
    volumes:
      - ray_volume:/home/ray
      - $SSH_AUTH_SOCK:/ssh-agent
    environment:
      - SSH_AUTH_SOCK=/ssh-agent

networks:
  chatbot-network:
    name: chatbot-network
    external: true

volumes:
  ray_volume:
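
One thing worth checking in the file above: the only named volume is mounted at /home/ray, while Ray writes its logs under /tmp/ray/session_*/logs by default. The fragment below is a minimal sketch of mounting a second volume at that path on the head node so the log files survive docker-compose down. It assumes the default temp directory is in use, that the container user can write to the mount, and that the job logs of interest are written on the head node. The volume name ray_logs is hypothetical. This only keeps the files on disk. Whether the dashboard can display logs from a previous session after a restart is a separate question that this sketch does not answer.

```yaml
# Sketch only: a fragment to merge into the compose file above, not a full file.
services:
  ray-head:
    volumes:
      - ray_volume:/home/ray
      - $SSH_AUTH_SOCK:/ssh-agent
      # Assumption: Ray uses its default temp dir, so logs live in
      # /tmp/ray/session_*/logs inside the container.
      - ray_logs:/tmp/ray

volumes:
  ray_volume:
  # Hypothetical volume name. Named volumes are kept by `docker-compose down`
  # unless it is run with -v / --volumes.
  ray_logs:
```

The volume is attached to the head node only because the worker service is replicated. Replicas sharing one /tmp/ray mount would write into the same session directory, which may cause conflicts.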