I am not very familiar with Kubernetes and cannot solve this myself.
I get this error when calling ray up: RuntimeError: Redis has started but no raylets have registered yet.
What may cause such an error?
I assume the problem is with my minikube configuration, because everything works fine with a remote Kubernetes cluster I’m using.
Here is the full output of calling ray up:
ray up tune-config-0x5iz566.yaml
Cluster: example-cluster
Loaded cached provider configuration
If you experience issues with the cloud provider, try re-running the command with --no-config-cache.
No head node found. Launching a new cluster. Confirm [y/N]: y
Acquiring an up-to-date head node
2021-06-18 08:05:35,786 INFO node_provider.py:137 -- KubernetesNodeProvider: calling create_namespaced_pod (count=1).
Launched a new head node
Fetching the new head node
<1/1> Setting up head node
Prepared bootstrap config
2021-06-18 08:05:35,807 INFO node_provider.py:103 -- KubernetesNodeProvider: Caught a 409 error while setting node tags. Retrying...
New status: waiting-for-ssh
[1/7] Waiting for SSH to become available
Running `uptime` as a test.
2021-06-18 08:05:36,324 INFO command_runner.py:172 -- NodeUpdater: example-cluster-ray-head-x2qrv: Running kubectl -n ray exec -it example-cluster-ray-head-x2qrv -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
error: unable to upgrade connection: container not found ("ray-node")
SSH still not available (Exit Status 1): kubectl -n ray exec -it example-cluster-ray-head-x2qrv -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2021-06-18 08:05:41,385 INFO command_runner.py:172 -- NodeUpdater: example-cluster-ray-head-x2qrv: Running kubectl -n ray exec -it example-cluster-ray-head-x2qrv -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
23:05:41 up 3 days, 19:40, 0 users, load average: 0.72, 0.92, 0.90
Success.
Updating cluster configuration. [hash=8315d8e8a4262f294254e35e5e3ac8c2d107872c]
New status: syncing-files
[2/7] Processing file mounts
2021-06-18 08:05:41,720 INFO command_runner.py:172 -- NodeUpdater: example-cluster-ray-head-x2qrv: Running kubectl -n ray exec -it example-cluster-ray-head-x2qrv -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (mkdir -p ~)'
[3/7] No worker file mounts to sync
New status: setting-up
[4/7] No initialization commands to run.
[5/7] Initalizing command runner
[6/7] No setup commands to run.
[7/7] Starting the Ray runtime
2021-06-18 08:05:42,210 INFO command_runner.py:172 -- NodeUpdater: example-cluster-ray-head-x2qrv: Running kubectl -n ray exec -it example-cluster-ray-head-x2qrv -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (export RAY_OVERRIDE_RESOURCES='"'"'{"CPU":1,"memory":751619276}'"'"';ray stop)'
Did not find any active Ray processes.
2021-06-18 08:05:42,941 INFO command_runner.py:172 -- NodeUpdater: example-cluster-ray-head-x2qrv: Running kubectl -n ray exec -it example-cluster-ray-head-x2qrv -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (export RAY_OVERRIDE_RESOURCES='"'"'{"CPU":1,"memory":751619276}'"'"';ulimit -n 65536; ray start --head --autoscaling-config=~/ray_bootstrap_config.yaml --dashboard-host 0.0.0.0)'
Local node IP: 172.17.0.5
2021-06-17 23:05:44,862 INFO services.py:1274 -- View the Ray dashboard at http://172.17.0.5:8265
Traceback (most recent call last):
File "/home/ray/anaconda3/bin/ray", line 8, in <module>
sys.exit(main())
File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/scripts/scripts.py", line 1808, in main
return cli()
File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 1137, in __call__
return self.main(*args, **kwargs)
File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 1062, in main
rv = self.invoke(ctx)
File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 1668, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 763, in invoke
return __callback(*args, **kwargs)
File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/scripts/scripts.py", line 581, in start
ray_params, head=True, shutdown_at_exit=block, spawn_reaper=block)
File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/node.py", line 247, in __init__
log_warning=False))
File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/_private/services.py", line 310, in get_address_info_from_redis
redis_address, node_ip_address, redis_password=redis_password)
File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/_private/services.py", line 269, in get_address_info_from_redis_helper
"Redis has started but no raylets have registered yet.")
RuntimeError: Redis has started but no raylets have registered yet.
command terminated with exit code 1
New status: update-failed
!!!
Setup command `kubectl -n ray exec -it example-cluster-ray-head-x2qrv -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (export RAY_OVERRIDE_RESOURCES='"'"'{"CPU":1,"memory":751619276}'"'"';ulimit -n 65536; ray start --head --autoscaling-config=~/ray_bootstrap_config.yaml --dashboard-host 0.0.0.0)'` failed with exit code 1. stderr:
!!!
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/home/eg4l/venvs/ray37/lib/python3.7/site-packages/ray/autoscaler/_private/updater.py", line 134, in run
self.do_update()
File "/home/eg4l/venvs/ray37/lib/python3.7/site-packages/ray/autoscaler/_private/updater.py", line 468, in do_update
run_env="auto")
File "/home/eg4l/venvs/ray37/lib/python3.7/site-packages/ray/autoscaler/_private/command_runner.py", line 178, in run
self.process_runner.check_call(final_cmd, shell=True)
File "/usr/lib/python3.7/subprocess.py", line 363, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'kubectl -n ray exec -it example-cluster-ray-head-x2qrv -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (export RAY_OVERRIDE_RESOURCES='"'"'{"CPU":1,"memory":751619276}'"'"';ulimit -n 65536; ray start --head --autoscaling-config=~/ray_bootstrap_config.yaml --dashboard-host 0.0.0.0)'' returned non-zero exit status 1.
Failed to setup head node.
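
For reference, the resource override that appears in the ray start command above can be decoded with a small sketch like the one below. The JSON string is copied from the log; the helper name describe_override is only illustrative. Whether these resource values have anything to do with the raylet failing to register is an assumption, not something the log confirms.

```python
import json

# Value copied verbatim from the RAY_OVERRIDE_RESOURCES export in the log above.
RAW_OVERRIDE = '{"CPU":1,"memory":751619276}'


def describe_override(raw):
    """Return (cpus, memory in MiB) for a RAY_OVERRIDE_RESOURCES JSON string.

    Hypothetical helper for illustration; it is not part of Ray.
    """
    resources = json.loads(raw)
    return resources["CPU"], resources["memory"] / 2**20


cpus, mem_mib = describe_override(RAW_OVERRIDE)
print(f"CPU: {cpus}, memory: {mem_mib:.1f} MiB")
```

So the head pod is being started with 1 CPU and roughly 717 MiB of memory advertised to Ray, which may be worth comparing against what the minikube node actually has available.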