Ray Tune not running enough processes

Ray Tune is not running as many trials in parallel as it could. I have an Ubuntu machine with 24 cores (according to lscpu) and 4 GPUs. I set gpus_per_trial = 1/6 because I can easily run 6 scripts in parallel on any one GPU (probably even 10 or 12).
The key bits from my script:

...
core_per_trial = multiprocessing.cpu_count() / len(os.environ['CUDA_VISIBLE_DEVICES'].split(",")) * gpus_per_trial
analysis = tune.run(
    tune.with_parameters(trial, base_config=copy.deepcopy(base_config)),
    resources_per_trial={"cpu": core_per_trial, "gpu": gpus_per_trial},
    ...
    )
...

This leads to only about 1-2 scripts running in parallel per GPU, each at < 10% utilisation. I expected 6 processes * 4 GPUs = 24 to run at once.
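
To spell out the arithmetic behind that expectation, here is a minimal sketch. It assumes that the number of concurrent trials is bounded by whichever requested resource runs out first; it is not Ray's actual scheduler. The helper name max_parallel_trials is hypothetical, and the hardware numbers are the ones from my machine.

```python
from fractions import Fraction


def max_parallel_trials(total_cpus, total_gpus, cpus_per_trial, gpus_per_trial):
    """Upper bound on concurrent trials, limited by the scarcest resource.

    Hypothetical helper for illustration only; it does the division by hand
    and does not call into Ray.
    """
    by_cpu = Fraction(total_cpus) // Fraction(cpus_per_trial)
    by_gpu = Fraction(total_gpus) // Fraction(gpus_per_trial)
    return int(min(by_cpu, by_gpu))


# Numbers from the post: 24 cores, 4 GPUs, 1/6 GPU per trial.
gpus_per_trial = Fraction(1, 6)
# Same formula as core_per_trial in the script: cores / GPUs * gpus_per_trial.
cpus_per_trial = Fraction(24, 4) * gpus_per_trial

expected = max_parallel_trials(24, 4, cpus_per_trial, gpus_per_trial)
print(expected)  # 24 with these numbers
```

Fractions are used instead of floats so that 1/6 divides evenly. By this reckoning neither the CPU request nor the GPU request should cap the run below 24 trials.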

What am I doing wrong?

Can you show the entire tune.run call? Can you also show the output of the ray status CLI command, run while your Tune experiment is ongoing (about 30-60 seconds in)?