Ray distributed memory parallelism


When it comes to creating a ray cluster, is there a way to limit the number of workers per node? My remote function generates large amounts of data and I want to avoid having many threads on the same node.

Hi @wrios - great question!

In general, each Ray node is started by `ray start` or `ray.init()`, and you can pass `num_cpus` to specify the number of logical CPUs on that node. This is equivalent to the maximum number of worker processes that Ray starts there. (Do note that each Ray worker process may itself use multiple threads.)

Also, how are you creating the cluster? I could probably provide more specific help if I know that.

Thank you for your interest in helping. To set up the cluster I am using a script similar to the one depicted in this example. Because a remote function can generate large amounts of data before it returns (as in wave simulation, where the history may need to be saved), it can fill up a node's memory very easily if several Ray workers share the same node. I was just wondering whether there is a way to force Ray workers to be distributed across nodes so as to avoid this.