Hi there, I have been trying to figure this out for a while now without success, so I'm posting here for help.
I have an external simulation environment that runs robotics simulations in parallel. It is implemented in C++ in a vectorized way and is managed by a custom Python wrapper. This wrapper does not strictly follow the Gym conventions for method definitions and return values, but it serves the same purpose. Until now, policies have been trained in this environment with custom implementations of RL algorithms. I have tried, without success, to wrap this custom Python wrapper in a VectorEnv class so that I can use RLlib and its various algorithms for training. I am unsure whether I should use the ExternalEnv class instead.
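For concreteness, the wrapper exposes an interface roughly like the sketch below. The names (`BatchedSimWrapper`, `reset_all`, `step_all`) are illustrative, not the real method names; the point is that everything comes back batched across all sub-environments at once:

```python
import numpy as np

class BatchedSimWrapper:
    """Illustrative mock of the custom Python wrapper around the C++
    simulator: all sub-environments are stepped together, and the
    observations, rewards, dones, and infos come back batched."""

    def __init__(self, num_envs: int, obs_dims: int, act_dims: int):
        self.num_envs = num_envs
        self.obs_dims = obs_dims
        self.act_dims = act_dims

    def reset_all(self) -> np.ndarray:
        # Resets every sub-environment at once; there is no per-env reset.
        return np.zeros((self.num_envs, self.obs_dims), dtype=np.float32)

    def step_all(self, actions: np.ndarray):
        # Steps all sub-environments synchronously with a batch of actions.
        assert actions.shape == (self.num_envs, self.act_dims)
        obs = np.zeros((self.num_envs, self.obs_dims), dtype=np.float32)
        rewards = np.zeros(self.num_envs, dtype=np.float32)
        dones = np.zeros(self.num_envs, dtype=bool)
        infos = [{} for _ in range(self.num_envs)]
        return obs, rewards, dones, infos
```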
I believe the problem comes down to how the environment parallelization is implemented at a low level: all environments are stepped synchronously, so methods like reset_at() from VectorEnv cannot be implemented (the environments are stepped and reset together, and no single sub-environment can be reset on its own). I also tried using the gym.Env class as the wrapper, but that obviously did not work, because the observations have shape (num_envs, obs_dims), and likewise for the actions, rewards, dones, and infos.
This makes me think that what I actually need is the ExternalEnv class, but that seems designed for a single environment. Is there a way to use the ExternalEnv class in a vectorized way?
Happy to provide further details if any of the above is unclear. Thanks in advance!