tune.Tuner trials not using specified resources with RLlib

I am currently trying to tune hyperparameters using RLlib with TD3 and a custom environment. However, on one of the two systems I run my code on, the resources per trial are completely different from the other system and do not match the number of resources I specified. I obtain a few variables (cpus_per_worker, gpus_per_worker and max_concurrent) from another function (the values themselves check out); the first two are the resources I want each trial to use. However, when I pass these values to the config (see code below), I get some unexpected behaviour:
On my first system (16 CPUs, 1 GPU) with cpus_per_worker=2, gpus_per_worker=1/8 and max_concurrent=8, I get a perfect utilization of 2 CPUs and 0.125 GPUs per trial, with 8 trials running concurrently. On my second system (20 CPUs, 1 GPU) with cpus_per_worker=2, gpus_per_worker=1/9 and max_concurrent=9, however, I get a total utilization of 20/20 CPUs and 0.444444/1 GPUs with only 4 concurrent trials running. Is there something I am misunderstanding about the resource configuration, or is this unintended behaviour of the tuner?

# Imports used in this snippet (Ray 2.8):
from ray import train, tune
from ray.rllib.algorithms.td3 import TD3Config

config = (
    TD3Config()
    # number of parallel rollout workers per trial (here set to cpus_per_worker)
    .rollouts(num_rollout_workers=cpus_per_worker)
    # CPUs per rollout worker and GPUs for the trial's driver/learner process
    .resources(num_cpus_per_worker=cpus_per_worker, num_gpus=gpus_per_worker)
    .environment(
        env="CustomRewardEnv",
        env_config={
            "id": "Pendulum-v1",
            "reward_builder": build_reward_fn_pendulum,
            **TUNE_SEARCH_SPACE
        }
    )
    .evaluation(
        evaluation_interval=1,
        evaluation_duration=10,
        # evaluation_num_workers=1,
        evaluation_parallel_to_training=False,
        evaluation_config=TD3Config.overrides(
            env_config={
                "reward_builder": build_reward_fn_pendulum,
                "weight0": 0.5,
                "weight1": 0.25,
                "weight2": 0.25
            }
        )
    )
    .callbacks(OriginalRewardCallback)
    .framework("torch")
)

tuner = tune.Tuner(
    "TD3",
    tune_config=tune.TuneConfig(
        mode="max",
        num_samples=TUNE_NUM_SAMPLES,
        search_alg=alg,
        max_concurrent_trials=max_concurrent
    ),
    param_space=config.to_dict(),
    run_config=train.RunConfig(
        stop={"training_iteration": 20},
    )
)

Versions:
python 3.10
ray 2.8.0
torch 2.1.0

I appreciate any help.

Any tips? I am experiencing the same problem.

Hi @Theodoros_Panagiotak, and welcome to the community. TD3 is no longer supported in the new RLlib stack and the newest Ray version. Would SAC be an alternative for you?

Hey @Lars_Simon_Zehnder, thanks for the reply! My problem is with the tuner, not TD3. In particular, when I set custom resources for the environment runners, those resources aren't used; other resources are used instead. I can run without the tuner, but I would like to do some hyperparameter tuning.

@Theodoros_Panagiotak Yes, I understand that, but I also see that you are running a very old version of Ray (2.8.0 vs. 2.43.0).

It's hard to tell what is going on in the second case without being on that system. I would check whether num_cpus and num_gpus are indeed 2 and 0.111. Furthermore, you might want to check via ray list placement-groups and ray list actors --detail what is going on with the placement groups and the bundles therein.
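
In case it helps, here is a minimal sketch of how to print the placement group that RLlib requests from Tune for each trial. It assumes Ray 2.8 (where TD3 is still importable) and the values from your second system; the bundles it prints are what ray list placement-groups should show:

# Sketch only -- assumes Ray 2.8, where TD3 is still importable from RLlib.
from ray.rllib.algorithms.td3 import TD3, TD3Config

# Values from the second system (cpus_per_worker=2, gpus_per_worker=1/9).
config = (
    TD3Config()
    .rollouts(num_rollout_workers=2)
    .resources(num_cpus_per_worker=2, num_gpus=1 / 9)
)

# default_resource_request() returns the PlacementGroupFactory that Tune
# uses to reserve resources for a single trial.
pgf = TD3.default_resource_request(config.to_dict())
print(pgf.bundles)             # one bundle for the driver/learner plus one per rollout worker
print(pgf.required_resources)  # total CPUs/GPUs reserved for one trial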

Hey, I am running Ray 2.40. The post above isn't by me :laughing:. In any case, I am using KubeRay. In my configuration the GPU worker pool has 1 GPU and 4 CPUs per pod and can scale up to 2 pods, while the CPU worker pool has 2 CPUs and 100 units of the custom resource per pod and can scale up to 20 pods.

Normally I would want jobs that request the custom resource to be scheduled on the CPU worker pool, but some were scheduled on the GPU pool, and this was causing the algorithm to break. In any case, thanks for the replies. It is a difficult problem to solve.

@Theodoros_Panagiotak, got ya :smiley:. Out of curiosity: which algorithm do you run on Ray 2.40? There is a custom_resources_per_worker attribute in AlgorithmConfig.resources() which might help for defining custom resources. See here for how this works when creating actors in Ray.
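
For reference, a rough sketch of how a custom resource is declared on a node and then requested, both for a plain actor and via the (older) custom_resources_per_worker parameter. PPOConfig and the name game_environments are just examples here, and the last part may be named differently in newer RLlib versions:

import ray
from ray.rllib.algorithms.ppo import PPOConfig

# A node has to advertise the custom resource at startup. Locally that is
# ray.init(resources=...); on KubeRay the equivalent goes into the worker
# group's rayStartParams.
ray.init(resources={"game_environments": 100})

# Any actor can then request the custom resource; Ray only schedules it on
# nodes that advertise "game_environments".
@ray.remote(num_cpus=1, resources={"game_environments": 1})
class EnvHost:
    def ping(self):
        return "ok"

assert ray.get(EnvHost.remote().ping.remote()) == "ok"

# The RLlib counterpart for the rollout workers / env runners (older API
# name; newer versions call it custom_resources_per_env_runner):
config = PPOConfig().resources(
    custom_resources_per_worker={"game_environments": 1},
)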

Exactly, I use PPO with custom_resources_per_env_runner={"game_environments": MAX_GAME_ENVIRONMENTS}. I can confirm that this works without Tune, but with Tune it doesn't. Moreover, in the KubeRay init params I do tell the nodes that they have the custom resource.
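
One workaround I may try next (untested sketch; the env name and bundle sizes are placeholders): check whether game_environments actually shows up in the bundles Tune reserves per trial (same default_resource_request trick as further up), and if not, override the trial's placement group explicitly with tune.with_resources so that each env-runner bundle requests the custom resource and can only be placed on the CPU worker pool:

from ray import tune
from ray.rllib.algorithms.ppo import PPO, PPOConfig

NUM_ENV_RUNNERS = 2  # placeholder

config = (
    PPOConfig()
    .environment(env="CartPole-v1")  # placeholder for my custom game environment
    .env_runners(num_env_runners=NUM_ENV_RUNNERS, num_cpus_per_env_runner=2)
)

# One bundle for the trial driver/learner plus one bundle per env runner; the
# custom resource in the runner bundles can only be satisfied by the CPU pool.
pgf = tune.PlacementGroupFactory(
    [{"CPU": 1}] + [{"CPU": 2, "game_environments": 1}] * NUM_ENV_RUNNERS
)

tuner = tune.Tuner(
    tune.with_resources(PPO, resources=pgf),
    param_space=config.to_dict(),
)
results = tuner.fit()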