I have a use case where I need to listen for calls from a live system. When an episode begins (the first call to my API comes in), I need to open a session with a policy client, return an action as the response, and then hold the session open until another call comes in for that episode, repeating until the episode ends. From reading about BaseEnv, it seems to be built around the idea of “polling” the environment for the next thing. Instead, I need to wait until the system (environment) calls me. What is the best way to handle this with RLlib? Do I need to create my own environment? Do I even need an environment?
Hey @Jason_Weinberg, thanks for the question. It sounds like you would want to use RLlib’s client/server tools to set this up:
“Client”:
Your “live system” (your environment?), which requires actions from time to time, given some observations/states. It uses RLlib’s PolicyClient class/API to communicate with the server.
“Server”:
An RLlib process with a Policy that can be used just for action serving, or can also accept batches from the client for further learning.
There is also an option to compute actions directly on the client side (using RLlib and a local, client-side policy instance that gets queried), and to use the server only for collecting train batches and performing policy updates.
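To show how this maps onto your call-driven flow, here is a rough client-side sketch. It is only a sketch under some assumptions: the server is listening on localhost:9900 (the default in the example scripts), and `handle_call` is a hypothetical hook that your API would invoke whenever the live system calls in; adapt it to however your service receives requests.

```python
from ray.rllib.env.policy_client import PolicyClient

# Assumes the RLlib server-side script is listening on localhost:9900.
client = PolicyClient("http://localhost:9900", inference_mode="remote")

episode_id = None  # set when the first call of an episode arrives


def handle_call(obs, reward, done):
    """Hypothetical hook: called by your API each time the live system calls in."""
    global episode_id

    if episode_id is None:
        # First call of the episode: open a new episode with the server.
        episode_id = client.start_episode(training_enabled=True)
    else:
        # Credit the reward observed since the previous action.
        client.log_returns(episode_id, reward)

    if done:
        # Episode is over: close it out and reset for the next one.
        client.end_episode(episode_id, obs)
        episode_id = None
        return None

    # Ask the policy for the next action and return it as your API response.
    # With inference_mode="remote" this queries the server; with "local" it
    # would use a client-side copy of the policy instead.
    return client.get_action(episode_id, obs)
```

Note that there is no environment loop here at all; the PolicyClient is driven entirely by incoming calls, which is exactly the “wait until the system calls me” pattern you describe.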
Take a look at these examples for more details. There is a CartPole example and a Unity3D example in this folder (each using one client script + one server script):
ray.rllib.examples.serving.cartpole_[client|server].py
OR
ray.rllib.examples.serving.unity3d_[client|dummy_client (not actually using Unity3D)|server].py
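For completeness, here is a minimal server-side sketch, roughly following what cartpole_server.py does. The algorithm choice, the observation/action spaces, and the port are placeholders you would replace with your own; exact config keys can differ slightly between Ray versions.

```python
import gym
import ray
from ray.rllib.agents.ppo import PPOTrainer
from ray.rllib.env.policy_server_input import PolicyServerInput

SERVER_ADDRESS = "localhost"
SERVER_PORT = 9900  # must match the address the PolicyClient connects to


def _input(ioctx):
    # Feed the trainer from the HTTP server instead of sampling a local env.
    return PolicyServerInput(ioctx, SERVER_ADDRESS, SERVER_PORT)


if __name__ == "__main__":
    ray.init()
    trainer = PPOTrainer(config={
        # No simulator on the server side: experiences arrive from the client.
        "env": None,
        # Placeholder spaces: replace with your live system's actual spaces.
        "observation_space": gym.spaces.Box(-1.0, 1.0, (4,)),
        "action_space": gym.spaces.Discrete(2),
        "input": _input,
        "num_workers": 0,
        "input_evaluation": [],
    })
    # Serve actions and keep learning from the batches the client sends.
    while True:
        print(trainer.train())
```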