How do I enable remote inference for PPO?

I want to step n envs in parallel on one node, collect n observations, send the n observations to another node with 12 GPUs that holds the policy weights, compute n actions on those 12 GPUs, then send the n actions back to the original node. Then repeat from step one.

Is this possible with rllib?


Hey @Dylan_Kerler, so far none of the RLlib algos do this kind of “remote” inference (calculating actions on a node different from the one holding the envs). I’m guessing you’d like something that resembles DeepMind’s “SEED” architecture?

But there are some settings that could help you achieve this:

  • Assuming you have your 12 GPU machine (+ some small number of CPUs) and one or more CPU-only nodes (on which you would like to run your envs).
  • Set num_workers=0, so you’ll only have a single (local) worker, which also does the sampling.
  • Set remote_worker_envs=True. This will make each individual env a ray actor. All n envs are then stepped in parallel.
  • You would then also have to override the default_resource_request method in your Trainer to make sure the env CPUs are not required to be on the same node as the GPUs.
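Taken together, those settings might look like this in a PPO config dict (a sketch; the env name and env count are placeholders, not prescribed values):

```python
# Sketch of the RLlib config described above (values are illustrative).
config = {
    "env": "CartPole-v0",          # placeholder env for illustration
    "num_workers": 0,              # no rollout workers; the learner samples
    "num_envs_per_worker": 8,      # n envs, stepped in parallel
    "remote_worker_envs": True,    # each env becomes its own Ray actor
    "num_gpus": 12,                # GPUs on the learner/policy node
    "placement_strategy": "PACK",  # read by the resource-request override below
}
```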

You can look at how IMPALA does this in ray.rllib.agents.impala.impala.py. Something like this may work:

from ray.rllib.agents.ppo import PPOTrainer
from ray.rllib.utils import add_mixins
from ray.rllib.utils.annotations import override
from ray.tune.trainable import Trainable
from ray.tune.utils.placement_groups import PlacementGroupFactory


class OverrideDefaultResourceRequest:
    @classmethod
    @override(Trainable)
    def default_resource_request(cls, config):
        # Merge the user-provided config into the Trainer's defaults.
        cf = dict(cls._default_config, **config)

        # Return a PlacementGroupFactory containing all needed resources
        # (already properly defined as device bundles).
        return PlacementGroupFactory(
            bundles=[{
                # Driver/learner bundle: its CPUs plus all policy GPUs.
                "CPU": cf["num_cpus_for_driver"],
                "GPU": cf["num_gpus"],
            }, {
                # Separate bundle (node) for your n "remote" envs
                # (requires remote_worker_envs=True).
                "CPU": cf["num_envs_per_worker"],
            }],
            strategy=config.get("placement_strategy", "PACK"))


MyTrainer = add_mixins(PPOTrainer, [OverrideDefaultResourceRequest])

I haven’t tried this, but I can spend some time on a simple example script that would demonstrate such a setup. …
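To see the resource shape that override produces, here is a plain-Python mockup of the bundle list (no Ray dependency; the real method wraps these dicts in a `PlacementGroupFactory`, and the helper name `bundles_for` is made up for illustration):

```python
# Plain-Python mockup of the bundles built in default_resource_request above.
def bundles_for(config):
    return [
        # Bundle 0: the driver/learner, holding all GPUs for the policy.
        {"CPU": config["num_cpus_for_driver"], "GPU": config["num_gpus"]},
        # Bundle 1: CPU-only resources for the n remote envs; Ray may place
        # this bundle on a different node than bundle 0.
        {"CPU": config["num_envs_per_worker"]},
    ]

bundles = bundles_for(
    {"num_cpus_for_driver": 1, "num_gpus": 12, "num_envs_per_worker": 8}
)
# bundles == [{"CPU": 1, "GPU": 12}, {"CPU": 8}]
```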

Here is an example script that demonstrates how to do this:


This has been merged into master. Sorry for the delay.