How can I run remote_workers on different machines or clusters?

How severe does this issue affect your experience of using Ray?

  • High: It blocks me to complete my task.

RLlib uses local_worker for training and remote_workers for sampling. By default, local_worker and remote_workers run on the same machine. But a single machine has a limited number of CPUs for sampling, which makes it very hard to handle complicated RL tasks.

Is it possible to place remote_workers on another machine or a Docker cluster that is responsible only for sampling and sending the collected data to the local worker? I could not find any API or setting in the docs that enables this kind of sampling mode.

Hi @Yunior_Zhang,

The rollout workers are Ray remote actors, each running on some node in the Ray cluster. What you can do is connect multiple nodes together using Ray in the usual way, but start Ray with num_cpus=1 on the node you will run the RLlib code from. Ray will then only be able to schedule the rollout workers on a different node.
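
Here is a minimal sketch of that setup, not a definitive recipe. It assumes a Ray 2.x release where `PPOConfig.rollouts(num_rollout_workers=...)` is available, and it uses PPO and `CartPole-v1` purely as placeholders. `remote_cpus` and `main` are hypothetical helper names, and `<head-ip>` stands for your head node's address. The helper counts the CPUs Ray reports on nodes other than the one running the script, so you can size the number of rollout workers accordingly.

```python
# Form the cluster first, from a shell on each machine:
#   node that runs the RLlib script:  ray start --head --port=6379 --num-cpus=1
#   every sampling node:              ray start --address=<head-ip>:6379
# <head-ip> is a placeholder; 6379 is only an example port.


def remote_cpus(nodes, head_ip):
    """Sum the CPUs on alive nodes other than the head node.

    `nodes` is a list of dicts in the shape returned by ray.nodes().
    """
    total = 0
    for node in nodes:
        if node.get("Alive") and node.get("NodeManagerAddress") != head_ip:
            total += int(node.get("Resources", {}).get("CPU", 0))
    return total


def main():
    # Imported here so the helper above can be used without Ray installed.
    import ray
    from ray.rllib.algorithms.ppo import PPOConfig

    # Attach to the already running cluster instead of starting a local one.
    ray.init(address="auto")

    head_ip = ray.util.get_node_ip_address()
    num_workers = remote_cpus(ray.nodes(), head_ip)

    # Assumption: each rollout worker requests one CPU (the RLlib default).
    config = (
        PPOConfig()
        .environment("CartPole-v1")
        .rollouts(num_rollout_workers=num_workers)
    )
    algo = config.build()
    algo.train()  # one training iteration; sampling runs on the rollout workers
    algo.stop()
    ray.shutdown()

# Call main() on the head node once all nodes have joined the cluster.
```

The method name depends on your RLlib version: older releases take `"num_workers"` in a config dict, and I believe recent releases rename the call to `env_runners(num_env_runners=...)`, so check the docs for the version you have installed.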