ExternalEnv in a sequential simulator running locally? And how to register the environment

hermmanhender · February 4, 2022, 7:34pm

Hi everyone! I’m really new to RLlib and to Ray too. I have some knowledge of RL, but I have never used an RL library before.

I’ll try to explain my problem clearly.
I have a simulator which runs on my PC. The simulator works as a sequential process: at a certain point of the simulation I can sense the state (it is composed of seven different variables, maybe more in the future) and set actuators (apply an action) at that same point to try to change the next state. The interaction with the simulator is done entirely in Python. So up to here everything is OK.

But this is where I’m lost. I would like to connect the simulator with the RLlib library.

Is it correct to try to connect the simulator through an ExternalEnv configuration? And if this is the way, I’d like to know how to use ExternalEnv. I’m not really sure how to configure it…

I understand that the action can be selected with the get_action method and that I must register the reward with log_returns after that. I also understand the need for the start_episode method, but I don’t know how to configure the run method in ExternalEnv or how to register the environment so it can be used.

I hope this is not a trivial question and that you can help me, thanks!

gjoliver · February 7, 2022, 1:28am

At a high level, ExternalEnv allows you to run your Env outside of RLlib, remotely, and use RLlib as a policy server.
If you need to compute an action and use it with your remote simulator, you just query RLlib with the observation (OBS), and then you can log the reward returned for that OBS and action.
One thing to realize is that RLlib only needs those OBS and rewards to optimize the policy, so you don’t need to register your environment with the RLlib server.
Have you seen the CartPole server and client examples?

github.com

ray-project/ray/blob/master/rllib/examples/serving/cartpole_server.py

#!/usr/bin/env python
"""
Example of running an RLlib policy server, allowing connections from
external environment running clients. The server listens on
(a simple CartPole env
in this case) against an RLlib policy server listening on one or more
HTTP-speaking ports. See `cartpole_client.py` in this same directory for how
to start any number of clients (after this server has been started).

This script will not create any actual env to illustrate that RLlib can
run w/o needing an internalized environment.

Setup:
1) Start this server:
    $ python cartpole_server.py --num-workers --[other options]
      Use --help for help.
2) Run n policy clients:
    See `cartpole_client.py` on how to do this.

The `num-workers` setting will allow you to distribute the incoming feed over n

This file has been truncated.
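
A rough sketch of the query-and-log cycle described in this reply, seen from the client side. It assumes the `PolicyClient` class from `ray.rllib.env.policy_client` (the one `cartpole_client.py` uses) and a server address of `http://localhost:9900`, which is only an assumed default. The `simulator` object with `reset()` and `step(action)` is hypothetical and stands in for your own simulator.

```python
def run_one_episode(client, simulator, max_steps=1000):
    """Query the policy server for actions and report rewards for one episode.

    `client` is expected to behave like RLlib's PolicyClient.
    `simulator` is a hypothetical wrapper exposing reset() -> obs and
    step(action) -> (obs, reward, done).
    """
    episode_id = client.start_episode(training_enabled=True)
    obs = simulator.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = client.get_action(episode_id, obs)  # ask the server for an action
        obs, reward, done = simulator.step(action)   # apply it in the simulator
        client.log_returns(episode_id, reward)       # report the resulting reward
        total_reward += reward
        if done:
            break
    client.end_episode(episode_id, obs)              # send the final observation
    return total_reward


def connect(address="http://localhost:9900"):
    """Create a PolicyClient (address is an assumption; match your server).

    The import is deferred so this file can be loaded without Ray installed.
    """
    from ray.rllib.env.policy_client import PolicyClient
    return PolicyClient(address, inference_mode="remote")
```

With a server already running, something like `run_one_episode(connect(), my_simulator)` inside a `while True:` loop would keep feeding it training data.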

hermmanhender · February 17, 2022, 8:15pm

Hello @gjoliver, thank you very much for your response.
I read the file that you shared and, in addition, the ExternalEnv documentation in the Ray docs.

I used the ExternalEnv file as a template and added the functions that run the external simulator on my computer. They form a loop that continuously executes individual episodes, each of which contains:

        1. Call self.start_episode(episode_id)
        2. Call self.get_action(episode_id, obs)
        3. Call self.log_returns(episode_id, reward)
        4. Call self.end_episode(episode_id, obs)
        5. Wait if nothing to do. # I'm not sure what this step means, so I don't know whether it is implemented in my code.
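
The five steps above could be laid out roughly as follows. This is a sketch, not RLlib's own code: `EpisodeLoopMixin` and the `simulator` object (with `ready()`, `reset()` and `step(action)`) are hypothetical names, and reading step 5 as "sleep while the simulator has nothing to offer" is an assumption. The four `self.*` calls are the ExternalEnv methods listed above.

```python
import time
import uuid


class EpisodeLoopMixin:
    """Hypothetical mixin supplying run() for an ExternalEnv subclass.

    Expects the host class to provide start_episode, get_action,
    log_returns and end_episode, plus a `simulator` attribute.
    """

    poll_interval = 0.1  # seconds to sleep when the simulator is idle
    max_episodes = None  # None means loop forever

    def run(self):
        finished = 0
        while self.max_episodes is None or finished < self.max_episodes:
            if not self.simulator.ready():
                # Step 5 (assumed meaning): nothing to do, so wait instead of spinning.
                time.sleep(self.poll_interval)
                continue
            episode_id = uuid.uuid4().hex
            self.start_episode(episode_id)                 # step 1
            obs, done = self.simulator.reset(), False
            while not done:
                action = self.get_action(episode_id, obs)  # step 2
                obs, reward, done = self.simulator.step(action)
                self.log_returns(episode_id, reward)       # step 3
            self.end_episode(episode_id, obs)              # step 4
            finished += 1
```

In real use the mixin would be combined with RLlib's class, e.g. `class MySimEnv(EpisodeLoopMixin, ExternalEnv)`, with the action and observation spaces passed to `ExternalEnv.__init__`. For a sequential simulator that always has a next step available, `ready()` can simply return `True`.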

I’m not sure whether my implementation is correct, because I don’t know how to use this environment now in a Tune or Trainer configuration with Ray.

Are there examples which configure the execution of an ExternalEnv with Tune or Trainer? Or does someone know how to do that?

gjoliver · February 21, 2022, 8:25pm

The cartpole_server.py example shows you how to run this end-to-end with either Tune or Trainer, right?

Your implementation looks reasonable. You basically need to put this in a loop, and continuously run it.

The obs and rewards you send to the server will become the training data for the policy, and your policy should get better at giving you good actions over time if the whole thing is working.

hermmanhender · February 25, 2022, 11:43am

Thanks! I finally understand how it works.

I needed to configure 3 scripts:

the ExternalEnv configuration, which can run in a loop, episode by episode;
server configuration (I used the cartpole_server.py example as a model); and
client configuration (I used the cartpole_client.py example as a model).

These scripts must run at the same time, but there is a trick. First you need to execute the server_configuration.py script and then, a few seconds later (about 7 seconds on my computer), execute the client_configuration.py script. The client runs the ExternalEnv configuration in a loop, which starts and ends the episodes, asks for actions, and logs the results to the server through the client. Finally, the server does the learning.
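
Instead of guessing how many seconds to wait, the client script could poll until the server's port accepts connections. A minimal sketch using only the Python standard library; the function name `wait_for_server` is hypothetical, and the host and port (`localhost:9900`) are assumptions that must match whatever the server script listens on. It only checks that the port is open, not that RLlib has finished setting up.

```python
import socket
import time


def wait_for_server(host="localhost", port=9900, timeout=60.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Opening and immediately closing a connection is enough to
            # tell that something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)  # server not up yet, try again shortly
    return False
```

The client script would call `wait_for_server()` first and only create its client and start the episode loop if it returns `True`.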

It was really difficult for me to understand how it works. The documentation is really good, but an overall explanation of the example is needed. Furthermore, an example without a gym environment would be helpful too, one that integrates the ExternalEnv configuration so the complete application of an ExternalEnv can be seen.

When I finish my configuration I will upload a new example that takes these considerations into account.
