Ray Tune does not run as many trials in parallel as it could. I have an Ubuntu machine with 24 cores (according to the lscpu
command) and 4 GPUs. I have set gpus_per_trial = 1/6
because I can easily run 6 scripts on each GPU in parallel (probably even 10 or 12).
The key parts of my script:
...
# CPUs per trial: the cores available per GPU, scaled by the GPU fraction each trial requests
core_per_trial = multiprocessing.cpu_count() / len(os.environ['CUDA_VISIBLE_DEVICES'].split(",")) * gpus_per_trial
analysis = tune.run(
    tune.with_parameters(trial, base_config=copy.deepcopy(base_config)),
    resources_per_trial={"cpu": core_per_trial, "gpu": gpus_per_trial},
    ...
)
...
This results in only about 1-2 trials running in parallel per GPU, each at < 10% GPU utilisation. I expected 6 trials per GPU * 4 GPUs = 24 trials running at the same time.
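The arithmetic behind that expectation can be sketched as follows. This is only a rough check, assuming Ray sees all 24 CPUs and all 4 GPUs and that concurrency is limited by nothing but the per-trial resource requests. expected_concurrency is a hypothetical helper written for illustration, not part of Ray's API, and Fraction is used so that 1/6 does not pick up floating-point rounding.

```python
from fractions import Fraction


def expected_concurrency(num_cpus, num_gpus, gpus_per_trial):
    """Hypothetical helper: upper bound on concurrent trials from resource requests alone."""
    # Same formula as core_per_trial in the script: cores per GPU times the GPU fraction.
    cpus_per_trial = Fraction(num_cpus) / num_gpus * gpus_per_trial
    # How many trials fit if only CPUs, or only GPUs, were the limit.
    by_cpu = Fraction(num_cpus) // cpus_per_trial
    by_gpu = Fraction(num_gpus) // gpus_per_trial
    return cpus_per_trial, min(by_cpu, by_gpu)


# Assumed machine: 24 cores, 4 GPUs, each trial asking for 1/6 of a GPU.
cpus_per_trial, max_trials = expected_concurrency(24, 4, Fraction(1, 6))
print(float(cpus_per_trial), max_trials)  # prints: 1.0 24
```

So each trial should request 1 CPU and 1/6 GPU, and by the resource requests alone 24 trials should fit at once.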
What am I doing wrong?