Error while trying to connect to Ray cluster from Docker

My example-full.yaml file:

cluster_name: default

docker:
    image: "rayproject/ray-ml:latest-gpu"
    container_name: "ray_container"
    disable_shm_size_detection: True
    pull_before_run: True
    run_options: []

provider:
    type: local
    head_ip: <host>
    worker_ips: [<host>:<port>]
   
auth:
    ssh_user: mrityunjoysaha
    # ssh_private_key: ~/.ssh/id_rsa

min_workers: 0

max_workers: 0
upscaling_speed: 1.0

idle_timeout_minutes: 5

file_mounts: {
}

cluster_synced_files: []

file_mounts_sync_continuously: False

rsync_exclude:
    - "**/.git"
    - "**/.git/**"

rsync_filter:
    - ".gitignore"

initialization_commands: []

setup_commands: [conda install -c anaconda python=3.8]

head_setup_commands: []

worker_setup_commands: []

head_start_ray_commands:
    - ray stop
    - ulimit -c unlimited && ray start --head --port=6379 --autoscaling-config=~/ray_bootstrap_config.yaml --dashboard-host 0.0.0.0

worker_start_ray_commands:
    - ray stop
    - ray start --address=$RAY_HEAD_IP:6379

Then I execute:

ray up example-full.yaml

My output:

Local node IP: <host>
2021-07-08 07:39:31,089	INFO services.py:1274 -- View the Ray dashboard at http://127.0.0.1:8265

--------------------
Ray runtime started.
--------------------

Next steps
  To connect to this Ray runtime from another node, run
    ray start --address='<host>:6379' --redis-password='xxxx'
  
  Alternatively, use the following Python code:
    import ray
    ray.init(address='auto', _redis_password='xxxx')
  
  If connection fails, check your firewall settings and network configuration.
  
  To terminate the Ray runtime, run
    ray stop
Shared connection to <host> closed.
2021-07-08 20:09:32,239	INFO node_provider.py:101 -- ClusterState: Writing cluster state: ['<host>:<port>', '<host>']
  New status: up-to-date

Useful commands
  Monitor autoscaling with
    ray exec /home/mrityunjoysaha/mrityunjoy/ray_cluster_testing/example-full.yaml 'tail -n 100 -f /tmp/ray/session_latest/logs/monitor*'
  Connect to a terminal on the cluster head:
    ray attach /home/mrityunjoysaha/mrityunjoy/ray_cluster_testing/example-full.yaml
  Get a remote shell to the cluster manually:
    ssh -tt -o IdentitiesOnly=yes mrityunjoysaha@<host> docker exec -it ray_container /bin/bash

Then I ran a Python script, written with FastAPI and running in Docker, which starts like this:

@app.on_event("startup")
async def startup_event():
    ray.init(address='<host>:6379', _redis_password='xxxx')
    global client
    client = serve.start()

But it fails with the error below:

  File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 526, in lifespan
    async for item in self.lifespan_context(app):
  File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 467, in default_lifespan
    await self.startup()
  File "/usr/local/lib/python3.8/dist-packages/starlette/routing.py", line 502, in startup
    await handler()
  File "./master.py", line 84, in startup_event
    ray.init(address='<host>:6379' , _redis_password='xxxx')
  File "/usr/local/lib/python3.8/dist-packages/ray/worker.py", line 759, in init
    _global_node = ray.node.Node(
  File "/usr/local/lib/python3.8/dist-packages/ray/node.py", line 176, in __init__
    ray._private.services.get_address_info_from_redis(
  File "/usr/local/lib/python3.8/dist-packages/ray/_private/services.py", line 287, in get_address_info_from_redis
    return get_address_info_from_redis_helper(
  File "/usr/local/lib/python3.8/dist-packages/ray/_private/services.py", line 246, in get_address_info_from_redis_helper
    client_table = global_state.node_table()
  File "/usr/local/lib/python3.8/dist-packages/ray/state.py", line 323, in node_table
    node_info["Resources"] = self.node_resource_table(
  File "/usr/local/lib/python3.8/dist-packages/ray/state.py", line 284, in node_resource_table
    node_id = ray.NodeID(hex_to_binary(node_id))
  File "python/ray/includes/unique_ids.pxi", line 207, in ray._raylet.NodeID.__init__
  File "python/ray/includes/unique_ids.pxi", line 33, in ray._raylet.check_id
ValueError: ID string needs to have length 20

Can anyone please suggest what I might be doing wrong? Thanks in advance.

Looks like some Ray internals are involved.
Any ideas, @sangcho?

Hmm, it is probably some kind of version mismatch. In the latest versions the ID string needs to have length 28, not 20.

@Dmitri, do you have any guess as to how this could happen?

I think some details on how the script was deployed would help us find where a version mismatch could have crept in.
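One way to collect those details is to print the Python and Ray versions in both places: inside the container that runs the FastAPI script and inside ray_container on the head node (for example via the docker exec command that ray up printed). Below is a minimal sketch, not an official Ray tool. The helper names describe_environment and find_mismatches are made up for illustration, and the version strings in the usage note are placeholders, not values taken from this thread. It only assumes that ray.__version__ is available when Ray is installed.

```python
import platform


def describe_environment():
    """Report the Python and Ray versions of the environment this runs in."""
    try:
        import ray  # imported lazily so the script still runs where Ray is absent
        ray_version = ray.__version__
    except ImportError:
        ray_version = None
    return {"python": platform.python_version(), "ray": ray_version}


def find_mismatches(client_env, cluster_env):
    """Return the keys whose values differ between two environment reports.

    Hypothetical helper: it compares the dictionaries literally and does
    not know which differences Ray actually tolerates.
    """
    return [
        key
        for key in sorted(client_env)
        if client_env[key] != cluster_env.get(key)
    ]


if __name__ == "__main__":
    # Run this once in the FastAPI container and once in ray_container,
    # then compare the two printed dictionaries.
    print(describe_environment())
```

If the two reports differ, for example find_mismatches({"python": "3.8.5", "ray": "A"}, {"python": "3.8.5", "ray": "B"}) gives ["ray"], then the client and the cluster are running different Ray builds, which would fit the ID length error. Since the cluster config uses the latest-gpu image tag, the version inside ray_container can change whenever the image is pulled again, so pinning the image to the same Ray version that the script's environment installs is probably worth trying.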