How to start multiple Ray instances on one machine with `ray.init()`?

How severely does this issue affect your experience of using Ray?

  • Low: It annoys or frustrates me for a moment.

How can I start multiple Ray instances on one machine with ray.init()?

info

  • ray 2.23.0
  • Python 3.9.19
  • Ubuntu 20.04 focal

repro steps

  1. In the Python REPL, start a new Ray instance.

    import ray
    ray.init(address="local")
    
    2024-07-10 12:21:16,847	INFO worker.py:1582 -- Calling ray.init() again after it has already been called.
    RayContext(dashboard_url='127.0.0.1:8265', python_version='3.9.19', ray_version='2.23.0', ray_commit='a0947ead5cd94b3d8ca5cdeb9422dccb12d03867')
    
  2. Calling ray.init() again doesn’t start a new instance (note the same dashboard_url in the output), even though the ray.init() docs here say it should.

    1. If the provided address is “local”, start a new local Ray instance, even if there is already an existing local Ray instance.

    ray.init(address="local", ignore_reinit_error=True)
    
    2024-07-10 12:38:06,593	INFO worker.py:1582 -- Calling ray.init() again after it has already been called.
    RayContext(dashboard_url='127.0.0.1:8265', python_version='3.9.19', ray_version='2.23.0', ray_commit='a0947ead5cd94b3d8ca5cdeb9422dccb12d03867')
    
  3. Passing a different dashboard_port also doesn’t work: the same dashboard_url is returned.

    ray.init(address="local", ignore_reinit_error=True, dashboard_port=8266)
    
    2024-07-10 12:21:16,847	INFO worker.py:1582 -- Calling ray.init() again after it has already been called.
    RayContext(dashboard_url='127.0.0.1:8265', python_version='3.9.19', ray_version='2.23.0', ray_commit='a0947ead5cd94b3d8ca5cdeb9422dccb12d03867')
    
  4. Running ray.init() in a new, concurrent Python REPL successfully starts a new Ray instance (on the new port 8266), but it’s broken because it doesn’t have any “available agent.”

    import ray; ray.init(address="local")
    
    2024-07-10 13:17:11,203	INFO worker.py:1740 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8266
    RayContext(dashboard_url='127.0.0.1:8266', python_version='3.9.19', ray_version='2.23.0', ray_commit='a0947ead5cd94b3d8ca5cdeb9422dccb12d03867')
    

    Try to submit a job to the second Ray instance and see that it fails.

    echo 'print("Hello, World!")' >> test.py
    
    ray job submit --address=http://127.0.0.1:8266/ -- python test.py
    
    Job submission server address: http://127.0.0.1:8266
    Traceback (most recent call last):
      File "/home/dxia/.pyenv/versions/hsdk/bin/ray", line 8, in <module>
        sys.exit(main())
      File "/home/dxia/.pyenv/versions/3.9.19/envs/hsdk/lib/python3.9/site-packages/ray/scripts/scripts.py", line 2612, in main
        return cli()
      File "/home/dxia/.pyenv/versions/3.9.19/envs/hsdk/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
        return self.main(*args, **kwargs)
      File "/home/dxia/.pyenv/versions/3.9.19/envs/hsdk/lib/python3.9/site-packages/click/core.py", line 1078, in main
        rv = self.invoke(ctx)
      File "/home/dxia/.pyenv/versions/3.9.19/envs/hsdk/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/home/dxia/.pyenv/versions/3.9.19/envs/hsdk/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/home/dxia/.pyenv/versions/3.9.19/envs/hsdk/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/home/dxia/.pyenv/versions/3.9.19/envs/hsdk/lib/python3.9/site-packages/click/core.py", line 783, in invoke
        return __callback(*args, **kwargs)
      File "/home/dxia/.pyenv/versions/3.9.19/envs/hsdk/lib/python3.9/site-packages/ray/dashboard/modules/job/cli_utils.py", line 54, in wrapper
        return func(*args, **kwargs)
      File "/home/dxia/.pyenv/versions/3.9.19/envs/hsdk/lib/python3.9/site-packages/ray/autoscaler/_private/cli_logger.py", line 856, in wrapper
        return f(*args, **kwargs)
      File "/home/dxia/.pyenv/versions/3.9.19/envs/hsdk/lib/python3.9/site-packages/ray/dashboard/modules/job/cli.py", line 273, in submit
        job_id = client.submit_job(
      File "/home/dxia/.pyenv/versions/3.9.19/envs/hsdk/lib/python3.9/site-packages/ray/dashboard/modules/job/sdk.py", line 254, in submit_job
        self._raise_error(r)
      File "/home/dxia/.pyenv/versions/3.9.19/envs/hsdk/lib/python3.9/site-packages/ray/dashboard/modules/dashboard_sdk.py", line 283, in _raise_error
        raise RuntimeError(
    RuntimeError: Request failed with status code 500: No available agent to submit job, please try again later..
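
A possible way to narrow down the “No available agent” error in step 4, not a confirmed root cause: check which ports are actually accepting connections while both REPLs are open. The sketch below uses only the standard library. Ports 8265 and 8266 come from the output above; 52365 is an assumption (it is taken here to be the default dashboard agent listen port, the one that --dashboard-agent-listen-port changes), and the helper names are made up for illustration.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


# 8265 and 8266 are the dashboard ports seen in the repro output.
# 52365 is ASSUMED to be the default dashboard agent listen port.
PORTS_TO_CHECK = {
    "first dashboard": 8265,
    "second dashboard": 8266,
    "dashboard agent (assumed default)": 52365,
}


def report(host: str = "127.0.0.1") -> dict:
    """Map each label in PORTS_TO_CHECK to whether its port is listening."""
    return {name: is_listening(host, port) for name, port in PORTS_TO_CHECK.items()}


if __name__ == "__main__":
    for name, up in report().items():
        print(f"{name}: {'listening' if up else 'not listening'}")
```

If both dashboards are listening but only one agent port is, that would be consistent with the two instances competing for the same agent port, though this check alone does not prove it.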
    

Questions

  1. How can I start multiple, working Ray instances in the same Python session? There’s a way described here with the CLI command ray start, but what’s the SDK equivalent of those commands, especially for CLI switches like --dashboard-agent-listen-port?
  2. How can I start multiple, working Ray instances in different Python sessions?
  3. Is the ray.init() documentation here correct? Should it mention something about different sessions as a requirement?
    1. If the provided address is “local”, start a new local Ray instance, even if there is already an existing local Ray instance.
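
For reference, the CLI route mentioned in question 1 can be sketched as a small helper that builds one `ray start --head` command per instance, giving each instance its own ports. This is a minimal sketch under stated assumptions, not a confirmed recipe: the base values (6379, 8265, 52365) are assumed defaults, the function name is hypothetical, and other ports (for example the Ray Client server port) may also need distinct values, so check `ray start --help` for your version. The helper only builds the commands and does not run them.

```python
import shlex

# Assumed default ports for the first instance; each further instance is
# offset by its index so that no two instances share a port.
BASE_PORTS = {
    "--port": 6379,                           # GCS port (assumed default)
    "--dashboard-port": 8265,                 # matches the first dashboard_url above
    "--dashboard-agent-listen-port": 52365,   # assumed default agent port
}


def ray_start_command(index: int) -> list[str]:
    """Build a `ray start --head` command for the index-th local instance."""
    if index < 0:
        raise ValueError("index must be non-negative")
    cmd = ["ray", "start", "--head"]
    for flag, base in BASE_PORTS.items():
        cmd.append(f"{flag}={base + index}")
    return cmd


if __name__ == "__main__":
    # Print the commands for two instances so they can be run by hand.
    for i in range(2):
        print(shlex.join(ray_start_command(i)))
```

With the second instance started this way, the job from step 4 would be submitted against its own dashboard port, e.g. `ray job submit --address=http://127.0.0.1:8266/ -- python test.py`.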