How do I enable remote inference for PPO?

I want to step n envs in parallel on one node, collect n observations, send the n observations to another node with 12 GPUs that holds the policy weights, compute n actions on those 12 GPUs, then send the n actions back to the original node. Then repeat from step one.

Is this possible with rllib?


Hey @Dylan_Kerler, so far none of the RLlib algos do this kind of “remote” inference (calculating actions on a node different from the one holding the envs). I’m guessing you’d like something that resembles DeepMind’s “SEED” architecture?

But there are some settings that could help you achieve this:

  • Assuming you have your 12 GPU machine (+ some small number of CPUs) and one or more CPU-only nodes (on which you would like to run your envs).
  • Set num_workers=0, so you’ll only have a single (local) worker, which also does the sampling.
  • Set remote_worker_envs=True. This will make each individual env a ray actor. All n envs are then stepped in parallel.
  • You would then also have to override the default_resource_request method in your Trainer to make sure the env CPUs are not required to be on the same node as the GPUs.
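Taken together, those settings might look like this in a PPO config dict (a sketch; the env name and env count are placeholders, not prescribed values):

```python
# Sketch of the RLlib config described above (values are illustrative).
config = {
    "env": "CartPole-v0",          # placeholder env for illustration
    "num_workers": 0,              # no rollout workers; the learner samples
    "num_envs_per_worker": 8,      # n envs, stepped in parallel
    "remote_worker_envs": True,    # each env becomes its own Ray actor
    "num_gpus": 12,                # GPUs on the learner/policy node
    "placement_strategy": "PACK",  # read by the resource-request override below
}
```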

You can look at how IMPALA does this in ray.rllib.agents.impala.impala.py. Something like this may work:

from ray.rllib.agents.ppo import PPOTrainer
from ray.rllib.utils import add_mixins
from ray.rllib.utils.annotations import override
from ray.tune.trainable import Trainable
from ray.tune.utils.placement_groups import PlacementGroupFactory


class OverrideDefaultResourceRequest:
    @classmethod
    @override(Trainable)
    def default_resource_request(cls, config):
        # Merge the user-provided config into the Trainer's defaults.
        cf = dict(cls._default_config, **config)

        # Return a PlacementGroupFactory containing all needed resources
        # (already properly defined as device bundles).
        return PlacementGroupFactory(
            bundles=[{
                # Driver/learner bundle: its CPUs plus all policy GPUs.
                "CPU": cf["num_cpus_for_driver"],
                "GPU": cf["num_gpus"],
            }, {
                # Separate bundle (node) for your n "remote" envs
                # (requires remote_worker_envs=True).
                "CPU": cf["num_envs_per_worker"],
            }],
            strategy=config.get("placement_strategy", "PACK"))


MyTrainer = add_mixins(PPOTrainer, [OverrideDefaultResourceRequest])

I haven’t tried this, but I can spend some time on a simple example script that would demonstrate such a setup. …
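To see the resource shape that override produces, here is a plain-Python mockup of the bundle list (no Ray dependency; the real method wraps these dicts in a `PlacementGroupFactory`, and the helper name `bundles_for` is made up for illustration):

```python
# Plain-Python mockup of the bundles built in default_resource_request above.
def bundles_for(config):
    return [
        # Bundle 0: the driver/learner, holding all GPUs for the policy.
        {"CPU": config["num_cpus_for_driver"], "GPU": config["num_gpus"]},
        # Bundle 1: CPU-only resources for the n remote envs; Ray may place
        # this bundle on a different node than bundle 0.
        {"CPU": config["num_envs_per_worker"]},
    ]

bundles = bundles_for(
    {"num_cpus_for_driver": 1, "num_gpus": 12, "num_envs_per_worker": 8}
)
# bundles == [{"CPU": 1, "GPU": 12}, {"CPU": 8}]
```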

Here is an example script that demonstrates how to do this:


This has been merged into master. Sorry for the delay.