How severely does this issue affect your experience of using Ray?
- Medium: It causes significant difficulty in completing my task, but I can work around it.
Hi there,
We have a server on which we run multiple Ray clusters simultaneously.
Clusters are started like this:
ray start --head --num-gpus=0 --temp-dir=/tmp/ray --port=45521 --dashboard-port=40925 --ray-client-server-port=52097
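For reference, here is a minimal Python sketch of how the start command for each cluster could be assembled and how the explicitly passed ports could be checked for clashes. The helper names and the cluster names are made up for illustration, and for the second cluster only the dashboard port 36403 (from the output further down) is real; its other two port values are hypothetical placeholders.

```python
from collections import defaultdict

# Port-related flags taken from the `ray start` command above.
PORT_FLAGS = ("port", "dashboard-port", "ray-client-server-port")


def build_start_command(ports, temp_dir="/tmp/ray", num_gpus=0):
    """Return the `ray start --head` argument list for one cluster."""
    cmd = ["ray", "start", "--head", f"--num-gpus={num_gpus}", f"--temp-dir={temp_dir}"]
    cmd += [f"--{flag}={ports[flag]}" for flag in PORT_FLAGS]
    return cmd


def find_port_clashes(clusters):
    """Map each port used by more than one (cluster, flag) pair to its users."""
    users = defaultdict(list)
    for name, ports in clusters.items():
        for flag in PORT_FLAGS:
            users[ports[flag]].append((name, flag))
    return {port: who for port, who in users.items() if len(who) > 1}


clusters = {
    "cluster_a": {"port": 45521, "dashboard-port": 40925, "ray-client-server-port": 52097},
    # Only 36403 comes from the post; the other two values are hypothetical.
    "cluster_b": {"port": 45522, "dashboard-port": 36403, "ray-client-server-port": 52098},
}

print(" ".join(build_start_command(clusters["cluster_a"])))
print(find_port_clashes(clusters))  # an empty dict means no explicit clash
```

This only covers the three ports passed on the command line; any port Ray picks on its own is outside this check.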
If I have one cluster running, I can easily submit a job via:
ray job submit --no-wait --address=http://127.0.0.1:40925/ -- python ray_cluster_example.py
and the job runs without problems.
If I have two clusters running at the same time (with distinct ports, of course), I get the following error when submitting a job to one of them:
Job submission server address: http://127.0.0.1:36403
Traceback (most recent call last):
  File "/home/aaa/dev/miniconda3/envs/xxx/bin/ray", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/aaa/dev/miniconda3/envs/xxx/lib/python3.11/site-packages/ray/scripts/scripts.py", line 2612, in main
    return cli()
           ^^^^^
  File "/home/aaa/dev/miniconda3/envs/xxx/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aaa/dev/miniconda3/envs/xxx/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/aaa/dev/miniconda3/envs/xxx/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aaa/dev/miniconda3/envs/xxx/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aaa/dev/miniconda3/envs/xxx/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aaa/dev/miniconda3/envs/xxx/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aaa/dev/miniconda3/envs/xxx/lib/python3.11/site-packages/ray/dashboard/modules/job/cli_utils.py", line 54, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/aaa/dev/miniconda3/envs/xxx/lib/python3.11/site-packages/ray/autoscaler/_private/cli_logger.py", line 856, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/home/aaa/dev/miniconda3/envs/xxx/lib/python3.11/site-packages/ray/dashboard/modules/job/cli.py", line 273, in submit
    job_id = client.submit_job(
             ^^^^^^^^^^^^^^^^^^
  File "/home/aaa/dev/miniconda3/envs/xxx/lib/python3.11/site-packages/ray/dashboard/modules/job/sdk.py", line 254, in submit_job
    self._raise_error(r)
  File "/home/aaa/dev/miniconda3/envs/xxx/lib/python3.11/site-packages/ray/dashboard/modules/dashboard_sdk.py", line 283, in _raise_error
    raise RuntimeError(
RuntimeError: Request failed with status code 500: No available agent to submit job, please try again later..
Note that the contents of the Python file do not matter, because the error occurs before the script is even run.
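Since the message says "please try again later", one thing that could be tried is wrapping the submit command in a small retry loop. The sketch below is only an illustration: `submit_with_retry` and `ray_job_submit` are hypothetical helper names, the attempt count and delay are arbitrary, and it assumes the `ray` CLI exits with a non-zero code when this error is raised.

```python
import subprocess
import time


def submit_with_retry(run_submit, attempts=3, delay_s=2.0):
    """Call run_submit() until it succeeds or the attempts run out.

    run_submit is a hypothetical callable returning (returncode, output).
    Returns (attempt_number, output) on success.
    """
    last_output = ""
    for attempt in range(1, attempts + 1):
        returncode, last_output = run_submit()
        if returncode == 0:
            return attempt, last_output
        if "No available agent" not in last_output:
            break  # a different failure; retrying is unlikely to help
        if attempt < attempts:
            time.sleep(delay_s)
    raise RuntimeError(f"job submission failed: {last_output.strip()}")


def ray_job_submit(address="http://127.0.0.1:36403", script="ray_cluster_example.py"):
    """Run the same `ray job submit` command as above through subprocess."""
    proc = subprocess.run(
        ["ray", "job", "submit", "--no-wait", f"--address={address}", "--", "python", script],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout + proc.stderr
```

It would be called as `submit_with_retry(ray_job_submit)`. If the error persists across all attempts, that suggests the problem is not a transient one.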
Is this a bug, or am I missing something?
Thanks!