I'm trying to train an agent on a custom environment, but training is very slow, even with the default CartPole example and with or without a GPU.
My setup is:
Windows 10
Ray 1.3.0
TensorFlow 2.4
Python 3.7.10
When I run:
from ray import tune
from ray.rllib.agents.ppo import PPOTrainer

myconfig = {"env": "CartPole-v0"}
myconfig["num_workers"] = 12
myconfig["num_gpus"] = 0
myconfig["log_level"] = "WARN"
# myconfig["framework"] = "torch"  # also tried tf, tf2, and eager (tfe)
tune.run(PPOTrainer, config=myconfig)
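To rule out Tune overhead, I'm also planning to time the training loop directly, along these lines (just a sketch: it reuses myconfig from above and assumes the result dict returned by train() carries the same timers keys as the printout below):

import time
import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()
trainer = PPOTrainer(config=myconfig)  # same config as the tune.run call
for i in range(3):
    t0 = time.time()
    result = trainer.train()  # one full sample + SGD iteration
    print(i, "iter_seconds:", round(time.time() - t0, 1),
          "learn_time_ms:", result["timers"]["learn_time_ms"])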
With the tune.run version, the timers reported in the results are:
learn_throughput: 580.592
learn_time_ms: 8267.423
sample_throughput: 773.311
sample_time_ms: 6207.078
update_time_ms: 6.689
and the time per iteration is about 15 seconds. I'm mostly concerned about learn_time_ms since, as I understand it, that is the SGD update step, while sample_time_ms covers trajectory collection (it seems a little slow too?). The numbers are at least self-consistent: both 580.6 samples/s × 8.27 s and 773.3 samples/s × 6.21 s work out to roughly 4800 samples per iteration, which I assume is PPO's default train_batch_size of 4000 rounded up to a multiple of num_workers × rollout_fragment_length (12 × 200 = 2400).
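In case it helps narrow things down, here's the follow-up sweep I'm planning, a sketch that assumes PPO's standard train_batch_size / sgd_minibatch_size / num_sgd_iter config keys behave as documented: if the SGD loop itself is the bottleneck, learn_time_ms should drop roughly in proportion to num_sgd_iter.

from ray import tune
from ray.rllib.agents.ppo import PPOTrainer

# Sweep the SGD knobs to see how learn_time_ms scales.
# PPO defaults in Ray 1.3.0 (as I understand them): train_batch_size=4000,
# sgd_minibatch_size=128, num_sgd_iter=30.
sweep_config = {
    "env": "CartPole-v0",
    "num_workers": 12,
    "num_gpus": 0,
    "log_level": "WARN",
    "num_sgd_iter": tune.grid_search([5, 15, 30]),        # fewer SGD passes per batch
    "sgd_minibatch_size": tune.grid_search([128, 512]),   # larger minibatches, fewer steps
}
tune.run(PPOTrainer, config=sweep_config, stop={"training_iteration": 5})

If learn_time_ms stays flat across these settings, I'd take that as a sign the overhead is somewhere other than the SGD loop itself.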