Example of ExternalEnv used to implement a REST policy server

Hello everyone,

I am looking for an example of how ExternalEnv can be used to implement a REST policy server (note: my REST client is not written in Python). Hints on how a REST policy server can be combined with ExternalEnv would also be welcome.

Hey @klausk55, did you take a look at our ExternalEnv examples here?

Our Policy Server (ray/policy_server_input.py in the ray-project/ray repository on GitHub) is already a REST server: it accepts data from a client (e.g. ray/policy_client.py in the same repository, but you can write your own) and serves policy/action requests.
Your client only has to connect and speak our RLlib protocol, which is quite simple and is documented in the examples above.
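To illustrate what speaking the protocol involves, here is a minimal sketch of a client modelled loosely on PolicyClient: every call becomes one HTTP POST carrying a "command" field (START_EPISODE, GET_ACTION, LOG_RETURNS, END_EPISODE, the names RLlib's PolicyClient uses), and the server answers with a dict. JsonPolicyClient and http_transport are hypothetical names, and the JSON encoding is an assumption chosen because a non-Python client can produce it easily; as far as I know, RLlib's own client and server pickle their payloads, so a C# client needs a matching server-side change.

```python
import json
import urllib.request


def http_transport(url):
    """Hypothetical helper: POST a dict as JSON to `url`, return the decoded reply."""
    # Empty ProxyHandler: talk to the (usually local) server directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

    def send(message):
        req = urllib.request.Request(
            url,
            data=json.dumps(message).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with opener.open(req) as resp:
            return json.loads(resp.read())

    return send


class JsonPolicyClient:
    """Hypothetical client; method and command names mirror RLlib's PolicyClient."""

    def __init__(self, send):
        self._send = send  # any callable dict -> dict, e.g. http_transport(url)

    def start_episode(self, episode_id=None, training_enabled=True):
        return self._send({"command": "START_EPISODE", "episode_id": episode_id,
                           "training_enabled": training_enabled})["episode_id"]

    def get_action(self, episode_id, observation):
        return self._send({"command": "GET_ACTION", "episode_id": episode_id,
                           "observation": observation})["action"]

    def log_returns(self, episode_id, reward, info=None):
        self._send({"command": "LOG_RETURNS", "episode_id": episode_id,
                    "reward": reward, "info": info})

    def end_episode(self, episode_id, observation):
        self._send({"command": "END_EPISODE", "episode_id": episode_id,
                    "observation": observation})
```

Keeping the transport pluggable means the same message logic can be exercised without a network, and porting it to another language is mostly a matter of reproducing the four message shapes.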

Hey @sven1977, yes, I did, and this is also how I have solved it so far :+1:
I slightly modified the REST policy server API/class and implemented an HTTP client in C# similar to PolicyClient. I also changed the config “input” to a callable that returns my REST policy server, i.e. an InputReader.
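The “input” callable described above can be sketched as follows: the config entry is a function that receives an IO context and returns an object with a next() method, and next() blocks until the HTTP side has pushed a batch of experiences. QueueInputReader and push() are hypothetical names, no Ray import is used, and the queue-based design is only roughly how PolicyServerInput appears to work internally.

```python
import queue


class QueueInputReader:
    """Hypothetical InputReader-like class (mirrors only the next() method)."""

    def __init__(self, ioctx=None, maxsize=0):
        self.ioctx = ioctx  # RLlib would pass an IOContext here (assumption)
        self._samples = queue.Queue(maxsize=maxsize)

    def push(self, batch):
        # Called by the HTTP handler thread when a client reports experiences.
        self._samples.put(batch)

    def next(self):
        # Called by the trainer; blocks until a batch is available.
        return self._samples.get()


# Hypothetical trainer config: "input" is a callable taking the IO context.
config = {
    "input": lambda ioctx: QueueInputReader(ioctx),
}
```

Blocking in next() lets the trainer simply wait for clients; a real reader would hand back SampleBatch objects rather than plain dicts.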
What still confuses me is the following: the documentation of the ExternalEnv class mentions that one can use it

by serving HTTP requests in the run loop.

Whether I do it as in the examples or as in my modification, I never need run() (it only sleeps or passes). Is the documentation imprecise, or does it mean that there is an alternative way of using ExternalEnv as an “HTTP request handler”?
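One literal reading of the docstring's suggestion is sketched below: an ExternalEnv-style thread whose run() is itself the HTTP serving loop, translating each request into start_episode/get_action/log_returns/end_episode calls on the env. ExternalEnvStub and HttpServingEnv are hypothetical stand-ins that only mirror those method names (in real RLlib, get_action blocks until the sampler has computed an action; here a dummy rule answers instead), the JSON payloads are an assumption, and the command names follow the ones PolicyClient uses.

```python
import json
import threading
import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer


class ExternalEnvStub(threading.Thread):
    """Hypothetical stand-in for ExternalEnv (method names only, no Ray)."""

    def __init__(self):
        super().__init__(daemon=True)
        self.episodes = {}  # episode_id -> accumulated reward

    def start_episode(self, episode_id=None):
        episode_id = episode_id or uuid.uuid4().hex
        self.episodes[episode_id] = 0.0
        return episode_id

    def get_action(self, episode_id, observation):
        return 1 if observation >= 0 else 0  # dummy policy, not a trained one

    def log_returns(self, episode_id, reward):
        self.episodes[episode_id] += reward

    def end_episode(self, episode_id, observation):
        return self.episodes.pop(episode_id)


class HttpServingEnv(ExternalEnvStub):
    """Hypothetical env whose run() loop serves HTTP requests."""

    def __init__(self, host="127.0.0.1", port=0):
        super().__init__()
        env = self

        class Handler(BaseHTTPRequestHandler):
            def do_POST(self):
                length = int(self.headers["Content-Length"])
                body = json.dumps(env.dispatch(json.loads(self.rfile.read(length)))).encode()
                self.send_response(200)
                self.send_header("Content-Type", "application/json")
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)

            def log_message(self, *args):  # silence per-request logging
                pass

        self.httpd = HTTPServer((host, port), Handler)

    def dispatch(self, req):
        # Map one protocol command onto the corresponding env method.
        cmd, eid = req["command"], req.get("episode_id")
        if cmd == "START_EPISODE":
            return {"episode_id": self.start_episode(eid)}
        if cmd == "GET_ACTION":
            return {"action": self.get_action(eid, req["observation"])}
        if cmd == "LOG_RETURNS":
            self.log_returns(eid, req["reward"])
            return {}
        if cmd == "END_EPISODE":
            return {"total_reward": self.end_episode(eid, req["observation"])}
        return {"error": "unknown command " + cmd}

    def run(self):
        # The env's whole run loop is the HTTP serving loop.
        self.httpd.serve_forever()
```

In this arrangement the HTTP layer talks to the env's own methods, whereas the PolicyServerInput approach feeds experiences in through the “input” reader instead.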

That’s correct, you wouldn’t need the ExternalEnv at all (as in the CartPole or Unity client/server examples, where we simply connect a client to the server and, in the client, loop through a gym Env, sending data and possibly action requests to the server).
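The client-side loop described here can be sketched as follows. CountdownEnv is a hypothetical toy env using the classic gym reset()/step() API (step returns obs, reward, done, info), and StubClient is a hypothetical stand-in for a PolicyClient-like object exposing start_episode, get_action, log_returns and end_episode; a real client would forward these calls to the server.

```python
class CountdownEnv:
    """Hypothetical toy env with the classic gym API."""

    def __init__(self, length=5):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        done = self.t >= self.length
        return self.t, reward, done, {}


class StubClient:
    """Hypothetical stand-in for a policy client; always answers action 1."""

    def __init__(self):
        self.calls = []

    def start_episode(self, episode_id=None, training_enabled=True):
        self.calls.append("start")
        return "ep-0"

    def get_action(self, episode_id, observation):
        self.calls.append("get_action")
        return 1

    def log_returns(self, episode_id, reward, info=None):
        self.calls.append("log_returns")

    def end_episode(self, episode_id, observation):
        self.calls.append("end")


def run_episode(env, client):
    """Step the env locally; ask the client for actions and report rewards."""
    eid = client.start_episode()
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        action = client.get_action(eid, obs)   # action request
        obs, reward, done, info = env.step(action)
        client.log_returns(eid, reward, info)  # data sent back for training
        total += reward
    client.end_episode(eid, obs)
    return total
```

The env never needs to know about the server; only the client object does.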

However, a RolloutWorker (with an ExternalEnv whose run method is overridden to simply call sleep(99999)) is created automatically inside PolicyServerInput, but it is only really used if the policy client uses inference_mode=remote.
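The auto-wrapping trick can be sketched like this: given an ExternalEnv-like class, build a subclass whose run() only sleeps, so the thread stays alive while all data arrives through method calls made from elsewhere (for example an HTTP handler). ExternalEnvLike and auto_wrap_idle are hypothetical names; this mimics the idea, not RLlib's actual wrapper code.

```python
import threading
import time


class ExternalEnvLike(threading.Thread):
    """Hypothetical stand-in for ExternalEnv: a thread with an episode loop."""

    def __init__(self):
        super().__init__(daemon=True)
        self.logged = []

    def run(self):
        raise NotImplementedError("subclasses drive episodes here")

    def log_returns(self, episode_id, reward):
        self.logged.append((episode_id, reward))


def auto_wrap_idle(env_cls, sleep_s=99999):
    """Return a subclass whose run() just sleeps; data comes in via method calls."""

    class IdleWrapper(env_cls):
        def run(self):
            time.sleep(sleep_s)  # keep the thread alive, do nothing

    IdleWrapper.__name__ = "Idle" + env_cls.__name__
    return IdleWrapper
```

Because the sleeping run() never drives episodes itself, whoever holds a reference to the env (here, the server side) is responsible for calling its methods.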

Similarly, PolicyClient auto-generates a RolloutWorker (using the same auto-wrapped ExternalEnv with sleep(99999) in its run method), but only if inference_mode=local.
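The difference between the two inference modes can be sketched as follows: in “local” mode the client computes actions with its own copy of the policy and only occasionally refreshes the weights from the server, while in “remote” mode every get_action becomes a round trip. ModeSwitchingClient, the GET_WEIGHTS message and the linear threshold policy are hypothetical; reporting collected samples back to the server, which local mode also needs for training, is left out.

```python
import time


class ModeSwitchingClient:
    """Hypothetical sketch of the local/remote inference split (not RLlib code)."""

    def __init__(self, send, inference_mode="local", update_interval=10.0):
        if inference_mode not in ("local", "remote"):
            raise ValueError("inference_mode must be 'local' or 'remote'")
        self.send = send  # callable dict -> dict talking to the server
        self.local = inference_mode == "local"
        self.update_interval = update_interval
        self.weights = None
        self.last_update = float("-inf")

    def _maybe_refresh_weights(self):
        now = time.monotonic()
        if now - self.last_update >= self.update_interval:
            self.weights = self.send({"command": "GET_WEIGHTS"})["weights"]
            self.last_update = now

    def get_action(self, episode_id, observation):
        if self.local:
            # Local inference: toy linear policy with cached weights.
            self._maybe_refresh_weights()
            score = sum(w * o for w, o in zip(self.weights, observation))
            return 1 if score > 0 else 0
        # Remote inference: the server's policy computes the action.
        return self.send({"command": "GET_ACTION", "episode_id": episode_id,
                          "observation": observation})["action"]
```

Local mode trades weight staleness for fewer round trips, which is why it needs the policy (and hence a RolloutWorker) on the client side.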

I agree, it’s a little confusing because of these auto-wrappings happening under the hood. I think the original idea was to separate the external env API from the server/client classes, which are more like examples of how one can use the external env API.