[Medium] How to raise 2 replicas for remote access on another cluster?

Dockerfile

CMD ray start --head --dashboard-host=0.0.0.0 \
    --port=$RAY_HEAD_PORT \
    --ray-client-server-port=10001 && \
    python main.py

main.py

if __name__ == "__main__":
    num_workers = 2
    workers = [
        GeneralModel.options(
            num_gpus=0.33, num_cpus=2,
            name=cfg.NAMEACTOR, namespace=cfg.NAMESPACE
        ).remote(cfg.DEVICE, cfg.LANGUAGE)
        for _ in range(num_workers)
    ]
    while True:
        time.sleep(5)

The following error appears:

ValueError: The name general_models (namespace=None) is already taken. Please use a different name or get the existing actor using ray.get_actor('general_models', namespace='None')
head_1 | (base

The name argument in the actor options must be unique within a namespace. If you give a different name to each worker, I believe it will work.

if __name__ == "__main__":
    num_workers = 2
    workers = [
        GeneralModel.options(
            num_gpus=0.33, num_cpus=2,
            name=f"{cfg.NAMEACTOR}_{i}", namespace=cfg.NAMESPACE
        ).remote(cfg.DEVICE, cfg.LANGUAGE)
        for i in range(num_workers)
    ]
    while True:
        time.sleep(5)

Thanks for your help! But then, when calling a method remotely from another cluster, I will not be able to balance requests between these workers. Do I understand correctly? Can anything be done about it?

Can you tell me a bit more about what this exactly means?

I started one head node, to which I connect remotely from another cluster in order to access the workers.
But a single worker executes queries sequentially, and I would like to parallelize them.
I would like requests to the workers to be balanced (as I understood from the documentation, the head node should be able to do this …)

self.cli1 = ray.init(address="ray://1.2.3.4:10001", namespace="models")
with self.cli1:
     self.models = ray.get_actor("general")    # not "general_{i}"

self.models.infer(...)

I don’t understand how I can run several identical workers and communicate with them under the same name (as is done in Celery, for example).

“communicate with them by the same name”

This is not allowed. In Ray, the name of the actor must be unique.

One potential workaround is to have a named parent actor that holds handles to the others:

import ray

@ray.remote
class B:
    pass

@ray.remote
class A:
    def __init__(self):
        # The parent actor creates the children and keeps their handles.
        self.bs = [B.remote() for _ in range(10)]

    def get(self):
        # Return the list of child handles to the caller.
        return self.bs

a = A.options(name="parent").remote()

...

# Elsewhere: look up the parent by name and fetch the child handles.
a = ray.get_actor(name="parent")
bs = ray.get(a.get.remote())
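
Once the client holds the list of handles, it still has to pick one per request. Below is a minimal round-robin sketch, assuming each handle exposes an infer method as in the question. RoundRobin and FakeHandle are hypothetical names, and FakeHandle is a local stand-in so the snippet runs without a cluster; with real Ray handles the call would be handle.infer.remote(x).

```python
import itertools


class RoundRobin:
    """Hands out handles from a fixed list in rotating order."""

    def __init__(self, handles):
        if not handles:
            raise ValueError("need at least one handle")
        self._cycle = itertools.cycle(handles)

    def next_handle(self):
        # Each call returns the next handle, wrapping around at the end.
        return next(self._cycle)


class FakeHandle:
    """Hypothetical stand-in for an actor handle; not part of Ray."""

    def __init__(self, worker_id):
        self.worker_id = worker_id

    def infer(self, x):
        # A real handle would be called as handle.infer.remote(x).
        return (self.worker_id, x)


balancer = RoundRobin([FakeHandle(0), FakeHandle(1)])
results = [balancer.next_handle().infer(x) for x in ["a", "b", "c"]]
print(results)
```

Round-robin ignores how busy each worker is, so it fits best when requests have similar cost.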

But I think Ray Serve could be a good alternative (see Getting Started — Ray 3.0.0.dev0). cc @simon-mo, is there a way to get a handle to a list of replicas using Ray Client for Ray Serve?

@simon-mo is there a way to get a handle to a list of replicas using ray client for ray serve?

There isn’t a public API for it right now. Additionally, Serve doesn’t work very well with Ray Client. Have you looked into ActorPool to perform the load balancing? (Utility Classes — Ray 2.1.0)
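
For reference, here is a minimal sketch of the ActorPool approach, assuming Ray is installed and that a local Ray instance is acceptable for the demo. Worker, the body of its infer method, and run_pool are hypothetical placeholders for the real model actor. The sketch does not check how ActorPool behaves over a Ray Client connection (ray://...).

```python
def run_pool(inputs, num_workers=2):
    """Send each input to an idle worker actor and collect the results."""
    # Imported inside the function so loading this module does not require Ray.
    import ray
    from ray.util import ActorPool

    # Assumption: a local Ray instance is started just for this demo.
    ray.init(num_cpus=num_workers, ignore_reinit_error=True)
    try:
        @ray.remote(num_cpus=1)
        class Worker:
            # Hypothetical stand-in for the real model actor.
            def infer(self, x):
                return x * 2

        # Unnamed actors: the pool holds the handles, so no name clash occurs.
        actors = [Worker.remote() for _ in range(num_workers)]
        pool = ActorPool(actors)

        # map() submits work to free actors and yields results in input order.
        return list(pool.map(lambda actor, x: actor.infer.remote(x), inputs))
    finally:
        ray.shutdown()
```

Calling run_pool([1, 2, 3, 4]) spreads the four calls over two actors. pool.map_unordered can be used instead when result order does not matter.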