RLlib runs UNBELIEVABLY slow on Windows, even on a basic CartPole environment

I’m trying to train an agent on a custom environment, but training is very slow even with the default CartPole example, with or without a GPU.

My setup is:
Windows 10
Ray 1.3.0
TensorFlow 2.4
Python 3.7.10

When I run:
from ray import tune
from ray.rllib.agents.ppo import PPOTrainer

myconfig = {"env": "CartPole-v0"}
myconfig["num_workers"] = 12
myconfig["num_gpus"] = 0
myconfig["log_level"] = "WARN"
# myconfig["framework"] = "torch"  # also tried with TensorFlow 1, 2, and eager
tune.run(PPOTrainer, config=myconfig)

timers:
learn_throughput: 580.592
learn_time_ms: 8267.423
sample_throughput: 773.311
sample_time_ms: 6207.078
update_time_ms: 6.689

The time per iteration is about 15 seconds. I’m mostly concerned about learn_time_ms since, as I understand it, that is the SGD update step, while sample_time_ms is for trajectory collection (though that seems a little slow too?).
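For scale, here is a back-of-envelope check of what that learn time implies per minibatch update. This is just a sketch: it assumes the Ray 1.3 PPO defaults of train_batch_size=4000, sgd_minibatch_size=128, and num_sgd_iter=30, since the config above doesn’t override them.

# Back-of-envelope: minibatch updates per training iteration,
# assuming the PPO defaults (train_batch_size=4000,
# sgd_minibatch_size=128, num_sgd_iter=30).
train_batch_size = 4000
sgd_minibatch_size = 128
num_sgd_iter = 30

minibatches_per_epoch = train_batch_size / sgd_minibatch_size  # ~31.25
updates_per_iteration = minibatches_per_epoch * num_sgd_iter   # ~937.5

learn_time_ms = 8267.423  # from the timers above
print(f"~{learn_time_ms / updates_per_iteration:.1f} ms per minibatch update")  # ~8.8 ms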

Did you change your train_batch_size and sgd_minibatch_size?


@henry_lei ,

train_batch_size and sgd_minibatch_size would also be my first guess. A second is your number of workers: 12 is quite high and might overload your machine. A config to experiment with is sketched below.
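Something along these lines might be worth trying. Just a sketch, not known-good settings: the values are starting points to tune, and the commented defaults are from the Ray 1.3 PPO config.

from ray import tune
from ray.rllib.agents.ppo import PPOTrainer

myconfig = {
    "env": "CartPole-v0",
    "num_workers": 2,           # fewer workers; 12 can oversubscribe a desktop CPU
    "num_gpus": 0,
    "log_level": "WARN",
    "train_batch_size": 1000,   # smaller batch -> shorter learn step (default 4000)
    "sgd_minibatch_size": 128,  # default; larger reduces the number of updates
    "num_sgd_iter": 10,         # fewer SGD epochs per batch (default 30)
}
tune.run(PPOTrainer, config=myconfig)

Reducing num_sgd_iter and train_batch_size directly shrinks learn_time_ms (fewer minibatch updates per iteration); reducing num_workers mainly helps sample_time_ms if the workers were competing for cores.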