I am trying to run a trial in Ray Tune for RLlib (hyperparameter tuning), and I want to test how the number of cores I throw at the problem affects calculation times. I tried ray.init(num_cpus=foo), but it utilises all cores on the machine regardless of the number I pass in (as seen in Task Manager).
I have googled for a long time and nothing I have tried has worked so far - CPU usage is always 100% on all cores. I can’t use tune.with_resources(), as I am working with RLlib and need to specify the trainable as the string name of the algorithm (e.g. “PPO”). Can someone help me overcome this?
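Roughly what I am running, as a simplified sketch (the environment and CPU count here are placeholders):

```python
import ray
from ray import tune

# Cap Ray at 4 logical CPUs, then launch the trial by algorithm name.
# Task Manager still shows 100% on every core while this runs.
ray.init(num_cpus=4)

tune.Tuner(
    "PPO",
    param_space={"env": "CartPole-v1"},
).fit()
```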
Hi @MathiasSteilen, if this is for RLlib, please visit this page to read about scaling up your experiments. You need to increase num_rollout_workers (previously known as num_workers) to give each RL experiment more resources.
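For example, a minimal sketch assuming Ray 2.x (where the key was renamed); the environment and worker count are placeholders:

```python
from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig

# Each rollout worker takes one CPU by default, so this trial requests
# roughly 1 CPU for the driver plus 4 CPUs for the rollout workers.
config = (
    PPOConfig()
    .environment("CartPole-v1")
    .rollouts(num_rollout_workers=4)  # "num_workers" in older versions
)

tune.Tuner("PPO", param_space=config.to_dict()).fit()
```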
If you want to increase the number of trials running concurrently, Tune will automatically handle the scheduling depending on how many resources are available on your cluster (set by the cluster settings and ray.init()) and how many resources each trial needs. (more info here)
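If you want an explicit cap on top of that automatic scheduling, one option (assuming a recent Ray version) is max_concurrent_trials in TuneConfig; this is just a sketch:

```python
from ray import tune

# Tune runs as many of the 8 samples in parallel as resources allow;
# max_concurrent_trials adds an explicit upper bound of 2 at a time.
tuner = tune.Tuner(
    "PPO",
    param_space={"env": "CartPole-v1"},
    tune_config=tune.TuneConfig(num_samples=8, max_concurrent_trials=2),
)
tuner.fit()
```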
Setting ray.init(num_cpus=foo) sets the number of logical CPUs for the entire cluster. (more info here)
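Note that these CPU counts are logical: they constrain Ray's scheduler, not the OS, so a worker process's threads can still spread over all physical cores even with a small num_cpus, which would explain the 100% usage you see in Task Manager. A quick way to check what Ray actually registered:

```python
import ray

# num_cpus caps the logical CPUs Ray's scheduler hands out across the
# whole local cluster; it does not pin processes to physical cores.
ray.init(num_cpus=4)
print(ray.cluster_resources())  # expect something like {'CPU': 4.0, ...}
```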