Ray Tune not running enough processes

bayesisawesome · November 24, 2022, 4:41pm

Ray Tune is not running as many trials in parallel as it could. I have an Ubuntu machine with 24 cores (according to lscpu) and 4 GPUs. I have set gpus_per_trial=1/6 because I can easily run 6 scripts on any GPU in parallel (probably even 10 or 12).
The key bits from my script:

...
core_per_trial = multiprocessing.cpu_count() / len(os.environ['CUDA_VISIBLE_DEVICES'].split(",")) * gpus_per_trial
analysis = tune.run(
    tune.with_parameters(trial, base_config=copy.deepcopy(base_config)),
    resources_per_trial={"cpu": core_per_trial, "gpu": gpus_per_trial},
    ...
    )
...

This results in only about 1-2 scripts running in parallel per GPU, each at under 10% utilisation. But it should run 6 processes * 4 GPUs = 24 in total.

What am I doing wrong?

Yard1 · November 29, 2022, 6:21pm

Can you show the entire tune.run call? Can you also show the output of the ray status CLI command, run while your Tune experiment is ongoing (about 30-60 seconds in)?
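
As a reference point for reading that output, the concurrency implied by the numbers in the question can be worked out by hand. The sketch below is plain Python and does not call Ray. The helper name max_concurrent_trials is hypothetical, and it assumes trials are packed purely by their declared CPU and GPU requests and that a fractional-GPU trial has to fit on a single device. It uses exact fractions so that 1/6 does not pick up floating-point error.

```python
from fractions import Fraction
from math import floor


def max_concurrent_trials(total_cpus, total_gpus, cpu_per_trial, gpu_per_trial):
    """Upper bound on simultaneous trials, by declared resources only.

    Hypothetical helper for illustration; this is not part of Ray.
    """
    cpu_per_trial = Fraction(cpu_per_trial)
    gpu_per_trial = Fraction(gpu_per_trial)
    limits = []
    if cpu_per_trial > 0:
        # How many trials the CPU budget allows.
        limits.append(floor(Fraction(total_cpus) / cpu_per_trial))
    if gpu_per_trial > 0:
        # Assumption: a trial cannot span two GPUs, so count per device.
        limits.append(floor(Fraction(1) / gpu_per_trial) * total_gpus)
    if not limits:
        raise ValueError("at least one per-trial resource must be positive")
    return min(limits)


# Numbers from the question: 24 cores, 4 GPUs, gpus_per_trial = 1/6.
total_cpus = 24
total_gpus = 4
gpus_per_trial = Fraction(1, 6)
# Same formula as core_per_trial in the question: 24 / 4 * 1/6 = 1.
core_per_trial = Fraction(total_cpus, total_gpus) * gpus_per_trial

expected = max_concurrent_trials(total_cpus, total_gpus, core_per_trial, gpus_per_trial)
print(core_per_trial, expected)
```

Under these assumptions the declared requests allow 1 CPU per trial and 24 trials at once, so if far fewer are running, the total and used resources reported by ray status are the first thing to compare against these figures.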
