Configuring Ray RLlib to only use the driver fails for PPO

I think I just stumbled upon a small bug in the validate_config function in ray/rllib/agents/ppo/ppo.py. It concerns this code snippet:

    num_workers = config["num_workers"] or 1
    calculated_min_rollout_size = \
        num_workers * config["num_envs_per_worker"] * \
        config["rollout_fragment_length"]
    if config["train_batch_size"] > 0 and \
            config["train_batch_size"] % calculated_min_rollout_size != 0:

I want to configure Ray/Tune to use only the driver for training, rollouts and evaluation. Accordingly, I have made the following configuration:

("num_workers", 0),  
("num_envs_per_worker", 0),  
("num_cpus_per_worker", 0), 
("num_gpus_per_worker", 0),
("custom_resources_per_worker", {}),
("evaluation_num_workers", 0),  
("num_cpus_for_driver", 1),  
("create_env_on_driver", True)

But this throws an error, because with num_envs_per_worker set to 0 the calculated_min_rollout_size is always zero:

File "/home/lukas/anaconda3/lib/python3.8/site-packages/ray/rllib/agents/ppo/ppo.py", line 138, in validate_config
    config["train_batch_size"] % calculated_min_rollout_size != 0:
ZeroDivisionError: integer division or modulo by zero

Hi @LukasNothhelfer,

Setting num_workers to zero is sufficient to use only the driver. You do not need to zero out the rest of the num_*_worker keys. In fact, I think zeroing them might break the driver's configuration too, because when num_workers is 0 RLlib treats the driver as a worker.
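
Roughly what I mean, as a minimal sketch written as a plain dict. The variable name driver_only_overrides is made up for illustration, and I am assuming every key you do not list keeps its library default:

```python
# Driver-only setup: override num_workers and nothing else.
# Every other num_*_worker key is deliberately left out so that it keeps
# whatever default the library ships with (an assumption, not verified here).
driver_only_overrides = {
    "num_workers": 0,  # no separate rollout workers; the driver acts as the worker
}
```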

@mannyv I had to set num_envs_per_worker to 1.
This configuration works now:

("num_workers", 0),  
("num_envs_per_worker", 1),  
("num_cpus_per_worker", 0),  
("num_gpus_per_worker", 0),
("custom_resources_per_worker", {}),
("evaluation_num_workers", 0), 
("num_cpus_for_driver", 1),  
("create_env_on_driver", True),  

The semantics of the validation function and the configuration confused me a bit.
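
For anyone else hitting this, here is a small sketch of the arithmetic that runs without Ray. The helpers min_rollout_size and batch_divides are hypothetical names that only mirror the snippet from ppo.py quoted above, and the values 200 and 4000 for rollout_fragment_length and train_batch_size are assumptions picked for illustration, not taken from my config:

```python
def min_rollout_size(config):
    # Same expression as in the quoted validate_config excerpt:
    # num_workers of 0 falls back to 1 because of the "or 1".
    num_workers = config["num_workers"] or 1
    return (num_workers * config["num_envs_per_worker"]
            * config["rollout_fragment_length"])


def batch_divides(config):
    # The modulo raises ZeroDivisionError when the rollout size is 0.
    return config["train_batch_size"] % min_rollout_size(config) == 0


# Illustrative values only (assumed, see the note above).
base = {"num_workers": 0, "rollout_fragment_length": 200, "train_batch_size": 4000}
failing = dict(base, num_envs_per_worker=0)
working = dict(base, num_envs_per_worker=1)

print(min_rollout_size(failing))  # 1 * 0 * 200 = 0
try:
    batch_divides(failing)
except ZeroDivisionError as exc:
    print("ZeroDivisionError:", exc)

print(min_rollout_size(working))  # 1 * 1 * 200 = 200
print(batch_divides(working))     # True, since 4000 % 200 == 0
```

So the "or 1" only protects against num_workers being 0; nothing protects against num_envs_per_worker being 0, which is why that key has to stay at 1 or higher.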

@LukasNothhelfer,
Me too; me too.