I still don’t understand how RLlib determines its resource requirements.
For Tune, I worked it out to the following formula:
num_samples * ((num_workers * num_cpus_per_worker) + (num_workers * num_gpus_per_worker))
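As a quick sanity check, here is the formula above as plain Python, split into its CPU and GPU parts (the helper name and the per-resource split are mine, not a Ray API):

```python
def tune_trial_resources(num_samples, num_workers,
                         num_cpus_per_worker, num_gpus_per_worker):
    """Total CPUs and GPUs Tune would need for all concurrent samples,
    per the formula above (hypothetical helper, not part of Ray)."""
    total_cpus = num_samples * num_workers * num_cpus_per_worker
    total_gpus = num_samples * num_workers * num_gpus_per_worker
    return total_cpus, total_gpus

# e.g. 4 samples of the 2-worker config below:
totals = tune_trial_resources(4, 2, 6, 0.5)  # (48, 4.0)
```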
Now here is the same scenario for RLlib.
Feasible:
(It runs with multiple raylet OOM warning messages, but it terminates with the final result message.)
import ray
from ray.rllib.algorithms import ppo

if __name__ == "__main__":
    ray.init(num_cpus=12, num_gpus=1)
    config = (
        ppo.PPOConfig()
        .environment("CartPole-v1")
        .rollouts(num_rollout_workers=2)
        .resources(num_cpus_per_worker=6, num_gpus_per_worker=0.5)
        .framework("tf2", eager_tracing=True)
    )
    algo = config.build()
    algo.train()
    print("One iteration done")
Infeasible:
(It hangs in an apparently endless loop.)
import ray
from ray.rllib.algorithms import ppo

if __name__ == "__main__":
    ray.init(num_cpus=12, num_gpus=1)
    config = (
        ppo.PPOConfig()
        .environment("CartPole-v1")
        .rollouts(num_rollout_workers=3)
        .resources(num_cpus_per_worker=4, num_gpus_per_worker=0.3)
        .framework("tf2", eager_tracing=True)
    )
    algo = config.build()
    algo.train()
    print("One iteration done")
What is the difference between the (3*4, 3*0.3) = (12, 0.9) and the (2*6, 2*0.5) = (12, 1) configurations?
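To make the comparison concrete, here are the worker-only totals for both configurations as plain Python (my own arithmetic to make the numbers explicit; this is not anything RLlib computes for you):

```python
# Feasible config: 2 rollout workers x (6 CPUs, 0.5 GPU) each
feasible_cpus = 2 * 6      # 12 CPUs
feasible_gpus = 2 * 0.5    # 1.0 GPU

# Infeasible config: 3 rollout workers x (4 CPUs, 0.3 GPU) each
infeasible_cpus = 3 * 4    # 12 CPUs
infeasible_gpus = 3 * 0.3  # ~0.9 GPU (float rounding: 0.8999...)
```

So both ask for the same 12 worker CPUs, and the infeasible one even asks for *less* GPU, which is what confuses me.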