Placement group with iterator to spread a function to all CPUs in the cluster

I am trying to distribute a Python function across nodes so that each CPU in the cluster is running the function in parallel. The function is being fed by a Ray iterator. I want to use a STRICT_SPREAD approach so that all CPUs are utilized, but I don't know how many bundles to create or how to specify placement_group_bundle_index=i if I need it.

[UPDATED with more context]
In my dev environment I have 2 nodes, one with 8 cores and the other with 12. The function processes a range of files, and I pass the range to process as an argument. In my small sample data there are 52 ranges, but in production I won't know ahead of time how many ranges are needed to process the data. Basically, I am trying to distribute the work so that each CPU processes a range until all the ranges are done. So I would expect all 20 cores to be in use until the 52 ranges are complete.

If I set the number of shards to 20 (the number of CPUs in the cluster), the host node runs all 12 of its CPUs but the second node gets none.

Right now all the shards are being processed just on the host node. Any help would be appreciated.

it = ray.util.iter.from_items(file_processing_ranges, repeat=False)
result_ids = [start_remote_process.remote(shard) for shard in it.shards()]

@sangcho could you offer help on this one?

I have resolved the issue. I was running the program with ray.init(address="auto") from a node other than the host, and the program would only run on the host. When I run the program from the host it uses all the nodes in the cluster.


Thank you for asking the question @hub-il ! Regarding this topic, if you think there are ways we should improve docs.ray.io to be more clear (e.g. where to run the code), please let us know!

@hub-il so, were you able to use the placement group in this case, or do you still need an answer for this question?

> I am trying to distribute a Python function across nodes so that each CPU in the cluster is running the function in parallel. The function is being fed by a Ray iterator. I want to use a STRICT_SPREAD approach so that all CPUs are utilized, but I don't know how many bundles to create or how to specify placement_group_bundle_index=i if I need it.

As an alternative to the placement group, I set the number of shards to the number of cores in the cluster, and Ray seems to have done a pretty good job of distributing the work across the cores. If you think placement groups would be a better option, I'd love to know.

I understood that it was best practice to run it from the host, but I had read that it could be launched from any node. I'm not sure if I read that in the official docs or not. I'll see if I can find where I read it and make a recommendation if it was in the docs.