Hello,
I want to implement a custom vectorized environment. I'm aware that RLlib handles vectorization automatically when you set num_envs_per_worker > 1
by creating multiple environment copies, but for my use case I need to handle the vectorization myself.
This is because my environment is responsible for running an executable program, and the parallelization is handled inside that program. I don't want RLlib to create multiple copies of the environment, since that would end up running many instances of the executable on a single worker.
I tried implementing my own environment class that extends VectorEnv,
but unfortunately this is no longer supported (Environments with VectorEnv not able to run in parallel):

TypeError: The environment must inherit from the gymnasium.Env class
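For context, my understanding (an assumption on my part, not verified against RLlib's source) is that the env pre-check is essentially an isinstance test against gymnasium.Env, which VectorEnv subclasses fail. A dependency-free sketch of that situation, with `Env` and `VectorEnv` as stand-ins for the real base classes and `check_env` as a hypothetical name for the check:

```python
class Env:
    """Stand-in for gymnasium.Env (to keep the snippet dependency-free)."""
    pass


class VectorEnv:
    """Stand-in for RLlib's VectorEnv base class -- note: NOT a subclass of Env."""
    pass


class VecEnvOrchestrator(VectorEnv):
    """My custom vectorized environment, inheriting from VectorEnv only."""
    pass


def check_env(env):
    # My guess at what the pre-check roughly does (assumption, not RLlib's code):
    if not isinstance(env, Env):
        raise TypeError("The environment must inherit from the gymnasium.Env class")


try:
    check_env(VecEnvOrchestrator())
except TypeError as e:
    print(e)  # the same error message I'm seeing
```

So the class is rejected before RLlib ever looks at its vector_step/vector_reset methods.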
I want to start many workers, each of which runs a single executable that is responsible for the vectorization. The environment's step
function will then return a batch of observations, rewards, etc.
The environment class that I provide to RLlib looks a little bit like this:
```python
class VecEnvOrchestrator(VectorEnv):
    def __init__(self, env_config: dict):
        num_envs = env_config.get("num_envs", 1)
        observation_space = env_config.get("observation_space", None)
        action_space = env_config.get("action_space", None)
        # Start the executable that runs the parallel simulations.
        instance = lib.start("my_environment.exe")
        # Create the vectorized environment backed by that process.
        self.vec_env = lib.VecEnv(
            MyEnv, instance, num_envs, observation_space, action_space
        )
        super().__init__(
            observation_space=observation_space,
            action_space=action_space,
            num_envs=num_envs,
        )

    def vector_reset(self, *, seeds=None, options=None):
        # Guard against seeds/options being None before indexing into them.
        seeds = seeds or [None] * self.num_envs
        options = options or [None] * self.num_envs
        return [
            env.reset(seed=seeds[i], options=options[i])
            for i, env in enumerate(self.vec_env.envs)
        ]

    def reset_at(self, index=None, *, seed=None, options=None):
        return self.vec_env.envs[index].reset(seed=seed, options=options)

    def vector_step(self, actions):
        return self.vec_env.step(actions)
```
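To be explicit about the contract I'm after: each call to vector_step takes a batch of actions and returns one entry per sub-environment for observations, rewards, etc., all batched in lockstep. A dependency-free toy version of that shape (FakeBackend is a hypothetical stand-in for the executable-backed lib.VecEnv, not real code):

```python
class FakeBackend:
    """Toy stand-in for the executable-backed vectorized environment."""

    def __init__(self, num_envs):
        self.num_envs = num_envs

    def step(self, actions):
        # One entry per sub-environment: obs, reward, terminated, truncated, info.
        obs = [a * 2 for a in actions]              # dummy observations
        rewards = [1.0] * self.num_envs             # dummy rewards
        terminateds = [False] * self.num_envs
        truncateds = [False] * self.num_envs
        infos = [{} for _ in range(self.num_envs)]
        return obs, rewards, terminateds, truncateds, infos


backend = FakeBackend(num_envs=4)
obs, rewards, terminateds, truncateds, infos = backend.step([0, 1, 2, 3])
print(obs)      # [0, 2, 4, 6]
print(rewards)  # [1.0, 1.0, 1.0, 1.0]
```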
Is it possible to get this working?
What’s the best way to do this?
Thanks