Hi there, I have been trying to figure this out for a while now without success, so I'm coming here for help.
I have an external simulation environment running robotics simulations in parallel. It is implemented in C++ in a vectorized way and is managed by a custom Python wrapper. The wrapper does not strictly follow Gym conventions in terms of method definitions and return values, but it serves the same purpose. Until now, we have trained policies in this environment with custom implementations of RL algorithms. I have tried to wrap this custom Python wrapper in a VectorEnv class so I can use RLlib and its various algorithms for training, but without success. I am unsure whether I should use the ExternalEnv class instead.
I believe the problem comes down to how the environment parallelization is implemented at a low level: all environments are stepped and reset synchronously, so methods like reset_at() from VectorEnv cannot be implemented (there is no way to reset a single environment independently). I also tried using the gym.Env class as a wrapper, but that obviously did not work, because the observations have shape (num_envs, obs_dims), and similarly for the actions, rewards, dones, and infos.
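For reference, here is a simplified sketch of my VectorEnv attempt; `sim` and its `num_envs`/`reset()`/`step()` members are placeholders standing in for my actual wrapper:

```python
import numpy as np
from ray.rllib.env.vector_env import VectorEnv


class SimVectorEnv(VectorEnv):
    """Wraps the batched C++ simulator (`sim` is a stand-in for my python wrapper)."""

    def __init__(self, sim, obs_space, act_space):
        self.sim = sim
        super().__init__(observation_space=obs_space,
                         action_space=act_space,
                         num_envs=sim.num_envs)

    def vector_reset(self):
        # Fine: the simulator resets all envs in lock-step anyway.
        return list(self.sim.reset())  # (num_envs, obs_dims) -> list of per-env obs

    def vector_step(self, actions):
        obs, rews, dones, infos = self.sim.step(np.stack(actions))
        return list(obs), list(rews), list(dones), list(infos)

    def reset_at(self, index):
        # This is the sticking point: the simulator cannot reset one env alone.
        raise NotImplementedError("envs can only be reset synchronously")
```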
This makes me think that what I actually need is the ExternalEnv class, but it seems intended for a single environment. Is there a way to use ExternalEnv in a vectorized way?
Happy to provide further details if the above seems unclear. Thanks in advance!
My use case only allows one environment per process. If you are training with a high-throughput algorithm like IMPALA, you can spawn a separate worker for each environment rather than running vectorized environments on a single worker. Performance is still great, because you can have 30+ processes chugging away, collecting samples asynchronously.
This likely also requires fewer changes to your code than other solutions.
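For example, something along these lines; `"my_sim_env"` and the single-env wrapper class are placeholders for your own registration:

```python
import ray
from ray import tune
# from ray.tune.registry import register_env
# register_env("my_sim_env", lambda cfg: MySingleSimEnv(cfg))  # your 1-env wrapper

ray.init()
tune.run(
    "IMPALA",
    config={
        "env": "my_sim_env",
        "num_workers": 30,         # one sampling process (and one sim) per worker
        "num_envs_per_worker": 1,  # no vectorization inside a worker
    },
)
```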
Ok, I somehow expected this would be the case. In the meantime, I managed to get a setup running by forcing my C++ application to spawn only one environment per process, but I haven’t looked at performance comparisons yet. Hopefully it will be good enough to get going. Thanks for the help!
Hey @feracero and @smorad, thanks for the question and answer!
ExternalEnv currently only supports single envs (not vectorized). You can get around that by using a multi-agent env. Could you check out our Unity3D examples? They use basically the same setup: a single Unity3D engine instance with n scenes running in parallel inside it, each scene containing, say, 1 agent. From that, we create a MultiAgentEnv with n agents (all using and training the same policy).
The examples and the Unity adapter are located here:
rllib/env/wrappers/unity3d_env.py (<- a MultiAgentEnv sub-class implementing only step and reset).
rllib/examples/serving/unity3d_[client|server].py (<- should show how this can be run as a client/server setup against your C++ simulator)
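To make the idea concrete, a minimal sketch of such an adapter might look like the following; `sim`, its `num_envs`/`reset()`/`step()` members, and the `env_i` agent-id naming are all placeholders, with the real reference being the Unity3DEnv wrapper above:

```python
import numpy as np
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class SimMultiAgentEnv(MultiAgentEnv):
    """Treats each of the n synchronized sim instances as one "agent",
    the same trick the Unity3D wrapper uses for its parallel scenes."""

    def __init__(self, sim):
        self.sim = sim  # stand-in for the vectorized C++ sim's python wrapper

    def reset(self):
        obs = self.sim.reset()  # shape (num_envs, obs_dims)
        return {f"env_{i}": o for i, o in enumerate(obs)}

    def step(self, action_dict):
        actions = np.stack(
            [action_dict[f"env_{i}"] for i in range(self.sim.num_envs)])
        obs, rews, dones, infos = self.sim.step(actions)
        obs_d = {f"env_{i}": o for i, o in enumerate(obs)}
        rew_d = {f"env_{i}": float(r) for i, r in enumerate(rews)}
        done_d = {f"env_{i}": bool(d) for i, d in enumerate(dones)}
        # All sub-envs step and reset in lock-step, so the episode ends together.
        done_d["__all__"] = all(bool(d) for d in dones)
        info_d = {f"env_{i}": inf for i, inf in enumerate(infos)}
        return obs_d, rew_d, done_d, info_d
```

All agent ids would then be mapped to one shared policy via the policy_mapping_fn in the multiagent config, so every sub-env collects samples for and trains the same policy.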