Rebalancing tasks dynamically

I have a cluster with mixed CPU types and a simple compute job that I submit to it. There are 64 CPU resources in total. When the job starts, it queues the work in a pool, but when a machine with a certain CPU type finishes early, no work is reassigned to its idling CPUs; the pending tasks just wait for the other resources to finish. Nothing is rebalanced automatically; the job is simply split up and posted once.

# submit one task per array element, then block until every result is back
async_results = [pool.apply_async(launch_compute, args=(data,)) for data in data_array]
results = [ar.get() for ar in async_results]
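
Assuming pool here is Ray's drop-in multiprocessing Pool (an assumption, since the thread is about a Ray cluster), the setup would look roughly like:

from ray.util.multiprocessing import Pool

# Assumption: Ray's Pool, which by default attaches to the running Ray
# cluster and sizes itself to the total number of CPUs available in it.
pool = Pool()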

Is there any configuration that makes the cluster more efficient, so that it dynamically finds CPUs that have already finished their job and feeds the pending tasks to them?

Running on a bare machine is much faster if I distribute the work manually using simple sockets.

What are your job and cluster configs?

5 machines, and the job is simply a Hello World type of thing plus some calculation to keep the CPUs busy. I was trying to test how the load is balanced across CPUs; it's really just a getting-started example. The array has about 500 values, which means it creates 500 independent jobs. Is there any configuration that differentiates between concurrency and parallel execution?

The autoscaler and scheduler will orchestrate that for you automatically; you just have to define, per task/actor, how many CPUs and how much memory to allocate to it, and do the same at the Ray cluster level.
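
A minimal sketch of what that looks like as a native Ray task (launch_compute comes from your snippet; the body and num_cpus=1 are placeholder values):

import ray

ray.init(address="auto")  # attach to the existing cluster rather than starting a local one

# The scheduler reserves one CPU per invocation and places each task on
# any node with a free CPU, regardless of CPU type.
@ray.remote(num_cpus=1)
def launch_compute(data):
    return data  # stand-in for the real computation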

You can then tune accordingly; buffering and queueing are provided out of the box.
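
With the task defined that way, you can submit all 500 up front and let the scheduler drain the queue as CPUs free up; a sketch reusing the hypothetical definition above (data_array is from your original snippet):

# Submit everything at once; tasks that can't be placed yet wait in the
# scheduler's queue and are dispatched as soon as a CPU frees up.
futures = [launch_compute.remote(data) for data in data_array]
results = ray.get(futures)  # blocks until all 500 tasks have finished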