tune.Tuner trials not using specified resources with RLlib

I am currently trying to tune hyperparameters using RLlib with TD3 and a custom environment. However, on one of the two systems I run my code on, the resources allocated per trial are completely different from those on the other system and do not match the resources I specified. I obtain a few variables (cpus_per_worker, gpus_per_worker and max_concurrent) from another function (the values themselves check out). The first two are the number of resources I want each trial to use. However, when I pass these values to the config (see the code below), I get some unexpected behaviour:
- On my first system (16 CPUs, 1 GPU), with cpus_per_worker=2, gpus_per_worker=1/8 and max_concurrent=8, I get perfect utilization: 2 CPUs and 0.125 GPUs per trial, with 8 concurrent trials running.
- On my second system (20 CPUs, 1 GPU), with cpus_per_worker=2, gpus_per_worker=1/9 and max_concurrent=9, I get a total utilization of 20/20 CPUs and 0.444444/1 GPUs, with only 4 concurrent trials running.

Is there something I am misunderstanding about the resource configuration, or is this unintended behaviour of the tuner?
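If it helps, here is the arithmetic as I currently understand it. My assumption (which I have not verified against the Ray source) is that each trial reserves num_cpus_for_local_worker CPUs (default 1) for the driver plus num_cpus_per_worker for each rollout worker. The helper name below is mine, not a Ray API:

```python
# Sketch of how I understand RLlib's per-trial CPU demand (an assumption on
# my part, not verified): the driver takes num_cpus_for_local_worker
# (default 1) and each rollout worker takes num_cpus_per_worker.
def per_trial_cpus(num_rollout_workers, num_cpus_per_worker,
                   num_cpus_for_local_worker=1.0):
    return num_cpus_for_local_worker + num_rollout_workers * num_cpus_per_worker

# I pass cpus_per_worker=2 to BOTH num_rollout_workers and num_cpus_per_worker
# (see the config below), so each trial would need 1 + 2 * 2 = 5 CPUs.
trial_cpus = per_trial_cpus(2, 2)

# On the second system (20 CPUs), capped by max_concurrent=9:
concurrent = min(int(20 // trial_cpus), 9)

print(trial_cpus)             # 5.0
print(concurrent)             # 4 -> matches the 4 trials I observe
print(concurrent * (1 / 9))   # ~0.444 -> matches the GPU utilization I observe
```

If this accounting is right, it would explain the second system exactly, but not the 2 CPUs per trial I see on the first system, which is part of what confuses me.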

from ray import train, tune
from ray.rllib.algorithms.td3 import TD3Config

config = (
    TD3Config()
    .rollouts(num_rollout_workers=cpus_per_worker)
    .resources(num_cpus_per_worker=cpus_per_worker, num_gpus=gpus_per_worker)
    .environment(
        env="CustomRewardEnv",
        env_config={
            "id": "Pendulum-v1",
            "reward_builder": build_reward_fn_pendulum,
            **TUNE_SEARCH_SPACE
        }
    )
    .evaluation(
        evaluation_interval=1,
        evaluation_duration=10,
        # evaluation_num_workers=1,
        evaluation_parallel_to_training=False,
        evaluation_config=TD3Config.overrides(
            env_config={
                "reward_builder": build_reward_fn_pendulum,
                "weight0": 0.5,
                "weight1": 0.25,
                "weight2": 0.25
            }
        )
    )
    .callbacks(OriginalRewardCallback)
    .framework("torch")
)

tuner = tune.Tuner(
    "TD3",
    tune_config=tune.TuneConfig(
        mode="max",
        num_samples=TUNE_NUM_SAMPLES,
        search_alg=alg,
        max_concurrent_trials=max_concurrent
    ),
    param_space=config.to_dict(),
    run_config=train.RunConfig(
        stop={"training_iteration": 20},
    )
)

Versions:
python 3.10
ray 2.8.0
torch 2.1.0

I appreciate any help.