Yes, but I don’t think that is enough. If I understand how it works, it is a schedule strategy only, and will not guarantee communication to all nodes.
For examples, I have N nodes.
If I do something like
object_ids = [test_remote.remote() for i in range(20)],
it MIGHT reach the first 20 nodes. Chances are, many of the tasks will be executed on the same node.
Another issues is, Ray is really dealing with virtual cores, and not “machines” per se. What I end up doing is what I had mentioned in my original post, you create many task hoping to that is enough to to reach every single machines. If the number is too low, some machines might never receive the message. If the number is too high, it is wasteful of resources.
What I’m looking for is, a way run a task only once per machine (not per core), only a subset of machines available on the network. This would be trivial with Kafka, since you can subscribe to a command topic and respond when called, but I don’t think it is a feature that is currently supported within Ray.