Hi everyone! I’m really new to RLlib and to Ray as well. I have some knowledge of RL, but I have never used an RL library before.
I’ll try to explain my problem clearly.
I have a simulator which runs on my PC. The simulator works as a sequential process: at a certain point in the simulation I can sense the state (it is composed of seven different variables, and maybe more in the future) and set actuators (apply an action) at that same point to try to change the next state. The interaction with the simulator is entirely in Python. So up to here everything is fine.
But this is where I’m lost. I would like to connect the simulator to the RLlib library.
Is it correct to connect the simulator using an ExternalEnv configuration? And if that is the way to go, I would like to know how to use ExternalEnv. I’m not really sure about the configuration of ExternalEnv…
I understand that the action can be selected with the get_action method and that I must register the reward with log_returns afterwards. I also understand the need for the start_episode method, but I don’t know how to configure the run method of ExternalEnv or how to register the environment so it can be used.
I hope this is not a trivial question and that you can help me. Thanks!
On a high level, ExternalEnv allows you to run your env outside of RLlib, remotely, and use RLlib as a policy server.
If you need to compute an action to use with your remote simulator, you just query RLlib with the obs, and then you log the reward you observed for that obs and action.
One thing to realize is that RLlib only needs those obs and rewards to optimize the policy, so you don’t need to register your environment with the RLlib server.
Have you seen the Cartpole server and client examples?
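In case it helps, here is a rough sketch of what the client-side loop looks like, modeled on the cartpole client example with PolicyClient. The simulator calls below are placeholders for whatever your simulator exposes, and the server address is just an example:

```python
from ray.rllib.env.policy_client import PolicyClient

# Connect to a running RLlib policy server (address/port are placeholders).
client = PolicyClient("http://localhost:9900", inference_mode="remote")

obs = my_simulator.reset()  # placeholder: read your seven state variables
episode_id = client.start_episode(training_enabled=True)

done = False
while not done:
    # Query RLlib with the obs to get an action.
    action = client.get_action(episode_id, obs)
    # Placeholder: apply the actuators and advance the simulation one step.
    obs, reward, done = my_simulator.step(action)
    # Log the reward observed for that obs/action pair.
    client.log_returns(episode_id, reward)

client.end_episode(episode_id, obs)
```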
Hello @gjoliver , thank you very much for your response.
I read the file that you shared and, in addition, the ExternalEnv section in the Ray docs.
I used the ExternalEnv file as a template and added the functions that run the external simulator on my computer, producing a loop that continuously executes individual episodes, each of which contains the following steps (a simplified sketch of this loop follows the list):
1. Call self.start_episode(episode_id)
2. Call self.get_action(episode_id, obs)
3. Call self.log_returns(episode_id, reward)
4. Call self.end_episode(episode_id, obs)
5. Wait if there is nothing to do. # I’m not sure what this point means, so I don’t know if it is implemented in my code.
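In simplified form, my run() method looks roughly like this (the spaces and the simulator functions are placeholders for my actual code):

```python
import gym
from ray.rllib.env.external_env import ExternalEnv

class MySimulatorEnv(ExternalEnv):
    def __init__(self):
        # Placeholder spaces: seven continuous state variables, discrete actions.
        observation_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(7,))
        action_space = gym.spaces.Discrete(4)
        ExternalEnv.__init__(self, action_space, observation_space)

    def run(self):
        while True:
            episode_id = self.start_episode()              # step 1
            obs = reset_simulator()                        # placeholder
            done = False
            while not done:
                action = self.get_action(episode_id, obs)  # step 2
                obs, reward, done = step_simulator(action) # placeholder
                self.log_returns(episode_id, reward)       # step 3
            self.end_episode(episode_id, obs)              # step 4
```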
I’m not sure if my implementation is correct, because I don’t know how to use this environment now in a Tune or Trainer configuration with Ray.
Are there examples that configure the execution of an ExternalEnv with Tune or Trainer? Or does someone know how to do that?
The cartpole_server.py example shows you how to run this end-to-end with either Tune or Trainer right?
Your implementation looks reasonable. You basically need to put this in a loop, and continuously run it.
The obs and rewards you send to the server will become the training data for the policy, and the policy should get better at giving you good actions over time if the whole thing is working.
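For reference, the server side of that example boils down to something like this. This is only a simplified sketch: the exact config keys and Trainer class depend on your RLlib version, and the spaces below are placeholders that must match what your simulator/client sends:

```python
import gym
import ray
from ray import tune
from ray.rllib.agents.ppo import PPOTrainer
from ray.rllib.env.policy_server_input import PolicyServerInput

ray.init()

# Placeholder spaces -- they must match the obs/actions used by the client.
observation_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(7,))
action_space = gym.spaces.Discrete(4)

config = {
    # No env on the server side; experiences arrive over HTTP from the client.
    "env": None,
    "observation_space": observation_space,
    "action_space": action_space,
    "input": lambda ioctx: PolicyServerInput(ioctx, "localhost", 9900),
    "num_workers": 0,
    "input_evaluation": [],
}

# Option 1: drive training yourself with a Trainer.
trainer = PPOTrainer(config=config)
for _ in range(100):
    print(trainer.train())

# Option 2: hand the same config to Tune instead.
# tune.run("PPO", config=config, stop={"training_iteration": 100})
```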
Both scripts must be running at the same time, but there is a trick: first you need to execute the server_configuration.py script and then, a few seconds later (about 7 seconds on my computer), execute the client_configuration.py script. The client runs the ExternalEnv configuration in a loop, which starts and ends the episodes, asks for actions, and logs the results through the client to the server. Finally, the server does the learning.
It was really difficult for me to understand how it works. The documentation is really good, but a global explanation of the example would be helpful. Furthermore, an example without a gym environment, showing the integration of the ExternalEnv configuration, would be even better, so one can see a complete application of an ExternalEnv.
When I finish my configuration, I will upload a new example with these considerations.