Placement group with iterator to spread a function to all CPUs in the cluster

I am trying to distribute a Python function across nodes so that each CPU in the cluster is running the function in parallel. The function is being fed by a Ray iterator. I want to use a STRICT_SPREAD approach so that all CPUs are utilized, but I don't know how many bundles to create or how to specify placement_group_bundle_index=i if I need it.

[UPDATED with more context]
In my dev environment I have 2 nodes, one with 8 cores and the other with 12. The function processes a range of files, and I pass the range to process as an argument. In my small sample data there are 52 ranges, but in production I won't know ahead of time how many ranges are needed to process the data. Basically, I am trying to distribute the work so that each CPU processes a range until all the ranges are done. So I would expect all 20 cores to be in use until the 52 ranges are complete.

If I set the number of shards to 20 (the number of CPUs in the cluster), the host node runs all 12 of its CPUs but the second node gets none.

Right now all the shards are being processed just on the host node. Any help would be appreciated.

it = ray.util.iter.from_items(file_processing_ranges, repeat=False)
result_ids = [start_remote_process.remote(shard) for shard in it.shards()]

@sangcho could you offer help on this one?

I have resolved the issue. I was running the program with ray.init(address="auto") from a node other than the host, and the program would only run on the host. When I run the program from the host it uses all the nodes in the cluster.


Thank you for asking the question @hub-il ! Regarding this topic, if you think there are ways we should improve docs.ray.io to be more clear (e.g. where to run the code), please let us know!

@hub-il so, were you able to use the placement group in this case, or do you still need an answer for this question?

> I am trying to distribute a Python function across nodes so that each CPU in the cluster is running the function in parallel. The function is being fed by a Ray iterator. I want to use a STRICT_SPREAD approach so that all CPUs are utilized, but I don't know how many bundles to create or how to specify placement_group_bundle_index=i if I need it.

As an alternative to the placement group, I set the number of shards to the number of cores in the cluster, and Ray seems to have done a pretty good job of distributing the work across the cores. If you think placement groups would be a better option, I'd love to know.

I understood that it was best practice to run it from the host, but I had read that it could be launched from any node. I'm not sure if I read that in the official docs or not. I'll see if I can find where I read it and make a recommendation if it was in the docs.