tune.Tuner trials not using specified resources with RLlib

I am currently trying to tune hyperparameters using RLlib with TD3 and a custom environment. However, on one of the two systems I run my code on, the resources per trial are completely different from the other system and do not match the number of resources I specified. I obtain a few variables (cpus_per_worker, gpus_per_worker and max_concurrent) from another function (the values themselves check out); the first two are the resources I want each trial to use. However, when I pass these values to the config (see code below), I get some unexpected behaviour:
On my first system (16 CPUs, 1 GPU) with cpus_per_worker=2, gpus_per_worker=1/8 and max_concurrent=8, I get a perfect utilization of 2 CPUs and 0.125 GPUs per trial, with 8 trials running concurrently. On my second system (20 CPUs, 1 GPU) with cpus_per_worker=2, gpus_per_worker=1/9 and max_concurrent=9, however, I get a total utilization of 20/20 CPUs and 0.444444/1 GPUs with only 4 concurrent trials running. Is there something I am misunderstanding about the resource configuration, or is this unintended behaviour of the tuner?

# Imports used in this snippet (Ray 2.8):
from ray import train, tune
from ray.rllib.algorithms.td3 import TD3Config

config = (
    TD3Config()
    # number of parallel rollout workers per trial (here set to cpus_per_worker)
    .rollouts(num_rollout_workers=cpus_per_worker)
    # CPUs per rollout worker and GPUs for the trial's driver/learner process
    .resources(num_cpus_per_worker=cpus_per_worker, num_gpus=gpus_per_worker)
    .environment(
        env="CustomRewardEnv",
        env_config={
            "id": "Pendulum-v1",
            "reward_builder": build_reward_fn_pendulum,
            **TUNE_SEARCH_SPACE
        }
    )
    .evaluation(
        evaluation_interval=1,
        evaluation_duration=10,
        # evaluation_num_workers=1,
        evaluation_parallel_to_training=False,
        evaluation_config=TD3Config.overrides(
            env_config={
                "reward_builder": build_reward_fn_pendulum,
                "weight0": 0.5,
                "weight1": 0.25,
                "weight2": 0.25
            }
        )
    )
    .callbacks(OriginalRewardCallback)
    .framework("torch")
)

tuner = tune.Tuner(
    "TD3",
    tune_config=tune.TuneConfig(
        mode="max",
        num_samples=TUNE_NUM_SAMPLES,
        search_alg=alg,
        max_concurrent_trials=max_concurrent
    ),
    param_space=config.to_dict(),
    run_config=train.RunConfig(
        stop={"training_iteration": 20},
    )
)

Versions:
python 3.10
ray 2.8.0
torch 2.1.0

I appreciate any help.

Any tips? I am experiencing the same problem.

Hi @Theodoros_Panagiotak, and welcome to the community. TD3 is no longer supported in the new RLlib stack and the newest Ray version. Would SAC be an alternative for you?

Hey @Lars_Simon_Zehnder, thanks for the reply! My problem is with the tuner, not TD3. In particular, when I set custom resources for the environment runners, those resources aren't used; other resources are used instead. I can run without the tuner, but I would like to do some hyperparameter tuning.

@Theodoros_Panagiotak Yes, I understand that, but I also see that you are running a very old version of Ray (2.8.0 vs. 2.43.0).

It's hard to tell what is going on in the second case without being on that system. I would check whether num_cpus and num_gpus are indeed 2 and 0.111. Furthermore, you might want to check via ray list placement-groups and ray list actors --detail what is going on with the placement groups and the bundles therein.
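
In case it helps, here is a minimal sketch of how to print the placement group that RLlib requests from Tune for each trial. It assumes Ray 2.8 (where TD3 is still importable) and the values from your second system; the bundles it prints are what ray list placement-groups should show:

# Sketch only -- assumes Ray 2.8, where TD3 is still importable from RLlib.
from ray.rllib.algorithms.td3 import TD3, TD3Config

# Values from the second system (cpus_per_worker=2, gpus_per_worker=1/9).
config = (
    TD3Config()
    .rollouts(num_rollout_workers=2)
    .resources(num_cpus_per_worker=2, num_gpus=1 / 9)
)

# default_resource_request() returns the PlacementGroupFactory that Tune
# uses to reserve resources for a single trial.
pgf = TD3.default_resource_request(config.to_dict())
print(pgf.bundles)             # one bundle for the driver/learner plus one per rollout worker
print(pgf.required_resources)  # total CPUs/GPUs reserved for one trial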

Hey, I am running Ray 2.40. The post above isn't by me :laughing:. In any case, I am using KubeRay. In my configuration the GPU worker pool has 1 GPU and 4 CPUs per pod and can scale up to 2 pods, while the CPU worker pool has 2 CPUs and 100 units of the custom resource per pod and can scale up to 20 pods.

Normally I would want jobs that request the custom resource to be scheduled on the CPU worker pool, but some were scheduled on the GPU pool, and this was causing the algorithm to break. In any case, thanks for the replies. It is a difficult problem to solve.

@Theodoros_Panagiotak, got ya :smiley:. Out of curiosity: which algorithm do you run on Ray 2.40? There is a custom_resources_per_worker attribute in AlgorithmConfig.resources() which might help for defining custom resources. See here for how this works when creating actors in Ray.
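
For reference, a rough sketch of how a custom resource is declared on a node and then requested, both for a plain actor and via the (older) custom_resources_per_worker parameter. PPOConfig and the name game_environments are just examples here, and the last part may be named differently in newer RLlib versions:

import ray
from ray.rllib.algorithms.ppo import PPOConfig

# A node has to advertise the custom resource at startup. Locally that is
# ray.init(resources=...); on KubeRay the equivalent goes into the worker
# group's rayStartParams.
ray.init(resources={"game_environments": 100})

# Any actor can then request the custom resource; Ray only schedules it on
# nodes that advertise "game_environments".
@ray.remote(num_cpus=1, resources={"game_environments": 1})
class EnvHost:
    def ping(self):
        return "ok"

assert ray.get(EnvHost.remote().ping.remote()) == "ok"

# The RLlib counterpart for the rollout workers / env runners (older API
# name; newer versions call it custom_resources_per_env_runner):
config = PPOConfig().resources(
    custom_resources_per_worker={"game_environments": 1},
)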

Exactly, I use PPO with custom_resources_per_env_runner={"game_environments": MAX_GAME_ENVIRONMENTS}. I can confirm that this works without Tune, but with Tune it doesn't. Moreover, in the KubeRay init params I do tell the nodes that they have the custom resource.
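
One workaround I may try next (untested sketch; the env name and bundle sizes are placeholders): check whether game_environments actually shows up in the bundles Tune reserves per trial (same default_resource_request trick as further up), and if not, override the trial's placement group explicitly with tune.with_resources so that each env-runner bundle requests the custom resource and can only be placed on the CPU worker pool:

from ray import tune
from ray.rllib.algorithms.ppo import PPO, PPOConfig

NUM_ENV_RUNNERS = 2  # placeholder

config = (
    PPOConfig()
    .environment(env="CartPole-v1")  # placeholder for my custom game environment
    .env_runners(num_env_runners=NUM_ENV_RUNNERS, num_cpus_per_env_runner=2)
)

# One bundle for the trial driver/learner plus one bundle per env runner; the
# custom resource in the runner bundles can only be satisfied by the CPU pool.
pgf = tune.PlacementGroupFactory(
    [{"CPU": 1}] + [{"CPU": 2, "game_environments": 1}] * NUM_ENV_RUNNERS
)

tuner = tune.Tuner(
    tune.with_resources(PPO, resources=pgf),
    param_space=config.to_dict(),
)
results = tuner.fit()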