Integrate Custom Vectorized Environment with RLlib

Hello,

I want to implement a custom vectorized environment. I’m aware that RLlib handles the vectorization automatically when you set num_envs_per_worker > 1 by creating multiple environment copies, but for my use case, I need to handle the vectorization myself.
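
For context, here is roughly the standard setup I'm referring to (sketched with the RLlib 2.x AlgorithmConfig API; PPO and the worker counts are just placeholders):

from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment(env=MyEnv)
    # RLlib creates 8 copies of the environment on each rollout worker:
    .rollouts(num_rollout_workers=4, num_envs_per_worker=8)
)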

This is because my environment is responsible for running an executable program, and the parallelization is handled inside that program. I don’t want RLlib to create multiple copies of this environment, since that would end up running many executable instances on a single worker.

I tried implementing my own environment class that extends VectorEnv, but unfortunately this is no longer supported (see Environments with VectorEnv not able to run in parallel):

TypeError: The environment must inherit from the gymnasium.Env class

I want to start many workers, each of which runs a single executable that is responsible for the vectorization. The environment’s step function will then return a batch of observations, rewards, etc.
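
In config terms, what I’m after is something like this (again a hedged sketch with the same API as above; VecEnvOrchestrator is my class from below, and the numbers are placeholders):

config = (
    PPOConfig()
    .environment(env=VecEnvOrchestrator, env_config={"num_envs": 16})
    # One wrapper instance per worker; the executable inside does the fan-out:
    .rollouts(num_rollout_workers=4, num_envs_per_worker=1)
)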

The environment class that I provide to RLlib looks a little bit like this:

from ray.rllib.env.vector_env import VectorEnv

# `lib` and `MyEnv` come from my own wrapper library around the executable.
class VecEnvOrchestrator(VectorEnv):
    def __init__(self, env_config: dict):
        num_envs = env_config.get("num_envs", 1)
        observation_space = env_config.get("observation_space", None)
        action_space = env_config.get("action_space", None)

        # Start the executable; it handles the parallelization internally.
        instance = lib.start("my_environment.exe")

        # Create the vectorized environment backed by that executable.
        self.vec_env = lib.VecEnv(MyEnv, instance, num_envs, observation_space, action_space)

        super().__init__(observation_space=observation_space, action_space=action_space, num_envs=num_envs)

    def vector_reset(self, *, seeds=None, options=None):
        # RLlib passes None when no per-env seeds/options are given.
        seeds = seeds if seeds is not None else [None] * self.num_envs
        options = options if options is not None else [None] * self.num_envs
        results = [
            env.reset(seed=seed, options=opts)
            for env, seed, opts in zip(self.vec_env.envs, seeds, options)
        ]
        # VectorEnv expects (list of observations, list of infos), not a list of tuples.
        obs, infos = zip(*results)
        return list(obs), list(infos)

    def reset_at(self, index=None, *, seed=None, options=None):
        env = self.vec_env.envs[index]
        return env.reset(seed=seed, options=options)

    def vector_step(self, actions):
        return self.vec_env.step(actions)

Is it possible to get this working?
What’s the best way to do this?

Thanks

Hello! This is a good question, I’ll try to help out the best I can :sweat_smile:

Can you try making your environment class extend gymnasium.Env instead of VectorEnv, i.e. class VecEnvOrchestrator(gym.Env) instead of class VecEnvOrchestrator(VectorEnv)? This class would act as a translator between RLlib and your inner vectorized environments. Then set num_envs_per_worker=1 so that RLlib only creates one instance of your wrapper per worker, preventing multiple executables from running on the same machine.
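
Something like this rough, untested sketch, reusing the lib calls from your snippet (I’m assuming your lib.VecEnv also exposes a batched reset(); adjust to whatever your library actually provides):

import gymnasium as gym

class VecEnvOrchestrator(gym.Env):
    def __init__(self, env_config: dict):
        num_envs = env_config.get("num_envs", 1)
        self.observation_space = env_config.get("observation_space", None)
        self.action_space = env_config.get("action_space", None)

        # Start the executable once per worker.
        instance = lib.start("my_environment.exe")
        self.vec_env = lib.VecEnv(
            MyEnv, instance, num_envs, self.observation_space, self.action_space
        )

    def reset(self, *, seed=None, options=None):
        return self.vec_env.reset(seed=seed, options=options)

    def step(self, action):
        return self.vec_env.step(action)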

Hopefully someone who has more experience with this can chime in and help :crossed_fingers:

Hi Christina,

Thank you for your quick response. I’ve tried making my VecEnvOrchestrator class extend gymnasium.Env, but I ran into quite a few problems. Specifically, I need the step function to return a batch of observations, rewards, terminated, truncated, and info values, one entry per sub-environment, which I haven’t been able to achieve with gymnasium.Env. I also need it to take a batch of actions, since I have to apply an action to each sub-environment handled by my inner vectorization. Hopefully this makes sense from the code snippet I provided above.
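
Concretely, one batched step needs to hand back values shaped like this (dummy numpy placeholders just to show the shapes; the real values come from the executable):

import numpy as np

num_envs, obs_dim = 16, 4  # placeholder sizes

obs = np.zeros((num_envs, obs_dim), dtype=np.float32)  # one row per sub-env
rewards = np.zeros(num_envs, dtype=np.float32)         # one float per sub-env
terminateds = np.zeros(num_envs, dtype=bool)
truncateds = np.zeros(num_envs, dtype=bool)
infos = [{} for _ in range(num_envs)]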

I’ve encountered errors like “2D Box spaces are not supported” (when I tried to batch the observations into a single space) and that the reward is not a float (when I tried to return a batch of rewards) while using gymnasium.Env.
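
For example, I tried stacking the per-environment observations into a single batched space like this (dimensions are placeholders), which is what triggers the first error:

import gymnasium as gym
import numpy as np

num_envs, obs_dim = 16, 4  # placeholder sizes

# One row per sub-environment -> a 2D Box, which RLlib rejects:
observation_space = gym.spaces.Box(
    low=-np.inf, high=np.inf, shape=(num_envs, obs_dim), dtype=np.float32
)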

Thanks