Say I have M nodes and a function that I want to run N > 0 times. What’s the best approach for running the function in parallel across nodes, but with never more than one instance running on any single node?
I can’t decorate the function with a CPU count, because I don’t know how big the nodes will be, so placement groups with STRICT_SPREAD seem like a good strategy here. I did some prototyping, and it seems like the number of bundles in a placement group can’t be greater than the number of nodes, or I’ll get an autoscaler error.
My hacky approach now is to loop through chunks of bundles, M (= number of nodes) items at a time, create a placement group with the chunk of bundles, and run the tasks, but there has to be something easier.
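Another idea, similar in spirit to the CPU count, is to start every node with one unit of a custom resource (e.g. {"custom_resource_name": 1}) and then have the function request one unit of that same resource. Since each node only advertises a single unit, at most one instance of the function can run on a node at a time, regardless of how many CPUs the node has.
The main difference from your chunked approach is that each node will start processing a new task as soon as it completes its current one, as opposed to waiting for all nodes in the chunk to finish.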