ExternalEnv vs. External Application Clients?

Hi all! I’m pretty new to RLlib, and programming in general.

The documentation (https://docs.ray.io/en/master/rllib-env.html#external-agents-and-applications) and the Unity3D example (https://medium.com/distributed-computing-with-ray/reinforcement-learning-with-rllib-in-the-unity-game-engine-1a98080a7c0d) are both quite helpful, but I have a few lingering questions:

  • Can both the ExternalEnv and Server-Client approaches utilize all the features (e.g. RL algorithms) of RLlib / Ray?
  • Following up on the above, is there a pros/cons list comparing the ExternalEnv and Server-Client approaches?

In my current thesis work, I want to leverage RLlib with an external environment in C#. I’m wrestling with which approach to take:

  1. ExternalEnv approach, where I will need to create a new wrapper that interacts with my environment the way ml-agents enables interaction with Unity. I will have to read the ml-agents documentation and code more carefully to get this working.
  2. Server-Client approach, which might need a simpler (or no) wrapper, but it looks like learning happens in batches, and is perhaps slower?

Obviously I haven’t given many details on my external environment; I’m just looking for general best practices.



Hey @Chandler_White, actually, our external env API is receiving more and more attention these days, and we would love to have more support and feedback for this feature from the community. We currently only have a Python client built in, but would like to expand this to other languages as well (C#, C++, etc.), so feel free to PR a possible C# solution.

To answer your question on algo support: Yes, all algorithms support the server/client setup. However, in the inference_mode=local setting (set on the client side), there is basically no “on-policy guarantee” because the weights for the behavior policy are not always updated on time (only sporadically, every n seconds). In inference_mode=remote, this should not be an issue, and both on-policy and off-policy algos should work.

Thanks @sven1977 !

Could you please explain what the difference is between the “local” and “remote” inference_mode(s)? I can’t seem to find a clear answer in the ray.io documentation :sweat_smile:.


In local mode, the remote env will have a copy of the model and pass the observations to it to get the actions. The remote env has a parameter that it uses to periodically request new model weights from RLlib to update the model. It is possible that you will sometimes be computing actions with an outdated model. This could slow or break learning with on-policy algorithms.

In remote mode, the remote env will send the observations to RLlib to compute the actions. This mode will always use the current policy, but you may incur a large communication overhead from all those REST requests.
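
To make the trade-off concrete, here is a toy simulation in plain Python. It does not use RLlib's API: `ToyServer`, `run`, and the assumption that the server finishes one training update after every environment step are all made up for illustration. It only counts how many actions come from an outdated policy and how many requests hit the server in each mode.

```python
from dataclasses import dataclass


@dataclass
class ToyServer:
    """Hypothetical stand-in for the RLlib side; not a real RLlib class."""
    version: int = 0   # increments with every (pretend) training update
    requests: int = 0  # number of client -> server round trips

    def train_step(self) -> None:
        self.version += 1

    def get_weights(self) -> int:
        # Local mode: the client pulls a copy of the current policy.
        self.requests += 1
        return self.version

    def compute_action(self, obs: float) -> int:
        # Remote mode: the server computes the action itself.
        # The "action" is just the policy version that produced it.
        self.requests += 1
        return self.version


def run(mode: str, steps: int = 100, update_interval: int = 10):
    """Return (stale_actions, server_requests) for one toy rollout."""
    if mode not in ("local", "remote"):
        raise ValueError(f"unknown mode: {mode!r}")
    server = ToyServer()
    local_version = server.get_weights() if mode == "local" else None
    stale = 0
    for t in range(steps):
        if mode == "local":
            # Refresh the local copy only every `update_interval` steps.
            if t > 0 and t % update_interval == 0:
                local_version = server.get_weights()
            used_version = local_version
        else:
            used_version = server.compute_action(float(t))
        if used_version != server.version:
            stale += 1
        # Toy assumption: one training update per environment step.
        server.train_step()
    return stale, server.requests


if __name__ == "__main__":
    print("local :", run("local"))   # (90, 10)
    print("remote:", run("remote"))  # (0, 100)
```

Under these assumptions, local mode makes 10 requests but computes 90 of 100 actions with an outdated policy, while remote mode is never stale but makes one request per step. Real numbers depend on how often training updates actually happen relative to the weight-refresh interval.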
