I am currently tuning hyperparameters using RLlib with TD3 and a custom environment. On one of the two systems I run my code on, the resources per trial are completely different from the other system and do not match the number of resources I specified. I obtain a few variables (cpus_per_worker, gpus_per_worker and max_concurrent) from another function; the values themselves check out. The first two are the resources I want each trial to use. However, when I pass these values to the config (see code below), I get some unexpected behaviour:
On my first system (16 CPUs, 1 GPU), with cpus_per_worker=2, gpus_per_worker=1/8 and max_concurrent=8, I get a perfect utilization of 2 CPUs and 0.125 GPUs per trial, with 8 concurrent trials running. On my second system (20 CPUs, 1 GPU), with cpus_per_worker=2, gpus_per_worker=1/9 and max_concurrent=9, I get a total utilization of 20/20 CPUs and 0.444444/1 GPUs, with only 4 concurrent trials running. Is there something I am misunderstanding about the resource configuration, or is this unintended behaviour of the tuner?
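For reference, this is the sanity-check arithmetic behind my expectation (a minimal sketch; the assumption that each trial requests exactly cpus_per_worker CPUs and gpus_per_worker GPUs is mine, based on the config below):

from fractions import Fraction

def expected_totals(cpus_per_trial, gpus_per_trial, max_concurrent):
    # Assumption: every trial requests exactly this many CPUs/GPUs and
    # max_concurrent trials run at the same time.
    total_cpus = cpus_per_trial * max_concurrent
    total_gpus = gpus_per_trial * max_concurrent
    return total_cpus, total_gpus

# System 1: 2 CPUs, 1/8 GPU per trial, 8 concurrent trials -> 16 CPUs, 1 GPU (matches what I see)
print(expected_totals(2, Fraction(1, 8), 8))
# System 2: 2 CPUs, 1/9 GPU per trial, 9 concurrent trials -> 18 CPUs, 1 GPU expected,
# but Ray reports 20/20 CPUs, 0.444444/1 GPUs and only 4 running trials
print(expected_totals(2, Fraction(1, 9), 9))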
from ray import train, tune
from ray.rllib.algorithms.td3 import TD3Config

config = (
    TD3Config()
    .rollouts(num_rollout_workers=cpus_per_worker)
    # Per-trial resource request, built from the values described above
    .resources(num_cpus_per_worker=cpus_per_worker, num_gpus=gpus_per_worker)
    .environment(
        env="CustomRewardEnv",
        env_config={
            "id": "Pendulum-v1",
            "reward_builder": build_reward_fn_pendulum,
            **TUNE_SEARCH_SPACE
        }
    )
    .evaluation(
        evaluation_interval=1,
        evaluation_duration=10,
        # evaluation_num_workers=1,
        evaluation_parallel_to_training=False,
        evaluation_config=TD3Config.overrides(
            env_config={
                "reward_builder": build_reward_fn_pendulum,
                "weight0": 0.5,
                "weight1": 0.25,
                "weight2": 0.25
            }
        )
    )
    .callbacks(OriginalRewardCallback)
    .framework("torch")
)

tuner = tune.Tuner(
    "TD3",
    tune_config=tune.TuneConfig(
        mode="max",
        num_samples=TUNE_NUM_SAMPLES,
        search_alg=alg,
        max_concurrent_trials=max_concurrent  # 8 on system 1, 9 on system 2
    ),
    param_space=config.to_dict(),
    run_config=train.RunConfig(
        stop={"training_iteration": 20},
    )
)
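For completeness, this is roughly how I launch the run; the ray.init() / ray.cluster_resources() lines are diagnostic calls I added to compare what Ray detects on each machine, not part of the tuning logic itself:

import ray

ray.init()                      # start / connect to the local Ray instance
print(ray.cluster_resources())  # total CPUs/GPUs Ray detects on this machine

results = tuner.fit()           # launch the tuning run defined above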
Versions:
python 3.10
ray 2.8.0
torch 2.1.0
I appreciate any help.