I’m using Ray 2.8.1 (I’ve also tried the latest 2.38.0), and RLlib’s wall-clock training performance seems to get worse when I add rollout workers (env_runners).
The results above were produced with 2.8.1 using the following script:
```python
from ray import tune, train
from ray.rllib.algorithms.dqn import DQNConfig

config: DQNConfig = (
    DQNConfig()
    .environment("CartPole-v1")
    .rollouts(num_rollout_workers=0, num_envs_per_worker=8)
    .resources(num_gpus=1)
)

tuner = tune.Tuner(
    "DQN",
    param_space=config.to_dict(),
    run_config=train.RunConfig(
        name="CartPole_Env_Parallel",
        checkpoint_config=train.CheckpointConfig(checkpoint_at_end=True),
        stop={"episode_reward_mean": 300},
    ),
)
results = tuner.fit()
print(results.get_best_result())
```
- The black line was trained with `num_rollout_workers=0, num_envs_per_worker=8`; it reached a mean reward of 300 in about 3 minutes.
- The blue line was trained with `num_rollout_workers=8, num_envs_per_worker=1` (the only change from the script above; see the sketch below); it reached the same mean reward in about 9 minutes.
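For reference, a minimal sketch of the blue-line configuration; only the `.rollouts()` call differs from the script above, and `config_blue` is just an illustrative name:

```python
from ray.rllib.algorithms.dqn import DQNConfig

# Blue-line run: 8 remote rollout workers, each stepping a single env
# (everything else identical to the script above).
config_blue: DQNConfig = (
    DQNConfig()
    .environment("CartPole-v1")
    .rollouts(num_rollout_workers=8, num_envs_per_worker=1)
    .resources(num_gpus=1)
)
```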
Is this expected?
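For the 2.38.0 test mentioned above, I assume the same settings map to the renamed env_runners API roughly like this (a sketch only; the exact argument names may differ between versions):

```python
from ray.rllib.algorithms.dqn import DQNConfig

# Rough 2.38.0 equivalent of the black-line settings: .rollouts() is replaced
# by .env_runners(), with the renamed worker/env-count arguments (names may
# differ slightly depending on the version).
config_new: DQNConfig = (
    DQNConfig()
    .environment("CartPole-v1")
    .env_runners(num_env_runners=0, num_envs_per_env_runner=8)
    .resources(num_gpus=1)
)
```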