Hey @Dylan_Kerler, so far none of the RLlib algos does this kind of "remote" inference (calculating actions on a node different from the one holding the envs). I'm guessing you'd like to have something that resembles DeepMind's "SEED" architecture?
But there are some settings that could help you achieve this:
- Assuming you have your 12-GPU machine (+ some small number of CPUs) and one or more CPU-only nodes (on which you would like to run your envs):
- `num_workers=0`: you'll only have a single (local) learner worker that also does the sampling.
- `remote_worker_envs=True`: this will make each individual env its own Ray actor. All n envs are then stepped in parallel. (See the config sketch after the code snippet below.)
- You would then also have to override the `default_resource_request` method in your Trainer to make sure the env CPUs are not required to be on the same node as the GPUs. You can look at how IMPALA does this in `ray.rllib.agents.impala.impala.py`. Something like this may work:
```python
class OverrideDefaultResourceRequest:
    @classmethod
    def default_resource_request(cls, config):
        cf = dict(cls._default_config, **config)
        # Return PlacementGroupFactory containing all needed resources
        # (already properly defined as device bundles).
        return PlacementGroupFactory(bundles=[
            # Bundle for the local worker: learning + action inference (GPU node).
            {"CPU": 1, "GPU": cf["num_gpus"]},
            # Different bundle (node) for your n "remote" envs (set remote_worker_envs=True).
            {"CPU": cf["num_envs_per_worker"]},
        ], strategy=cf.get("placement_strategy", "PACK"))

MyTrainer = add_mixins(PPOTrainer, [OverrideDefaultResourceRequest])
```
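And here is a rough, untested sketch of how the config could then be wired together with the mixin snippet above. The import paths, the CartPole env, the `12` envs-per-worker and the stop criterion are just placeholders I picked for illustration and may need adjusting to your Ray/RLlib version:

```python
from ray import tune
from ray.rllib.agents.ppo import PPOTrainer
from ray.rllib.utils import add_mixins
from ray.tune.utils.placement_groups import PlacementGroupFactory

# ... OverrideDefaultResourceRequest and MyTrainer as defined above ...

tune.run(
    MyTrainer,
    config={
        "env": "CartPole-v0",        # placeholder env
        "num_workers": 0,            # single local worker: learning + action inference
        "remote_worker_envs": True,  # each env becomes its own Ray actor
        "num_envs_per_worker": 12,   # n parallel (remote) envs
        "num_gpus": 1,               # GPUs reserved in the local worker's bundle
    },
    stop={"training_iteration": 3},  # placeholder stop criterion
)
```

Note that with the default PACK strategy Ray may still co-locate the two bundles if a single node has enough resources; SPREAD would force them onto different nodes.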
I haven’t tried this, but I can spend some time on a simple example script that would demonstrate such a setup. …