Environments with VectorEnv not able to run in parallel

I have created a custom environment class that subclasses VectorEnv and uses CartPole-v1 as the base env, as shown below:

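import gym
from typing import List, Optional, Tuple
from ray.rllib.env.vector_env import VectorEnv
from ray.rllib.utils.typing import EnvInfoDict, EnvObsType, EnvType

class MockVectorEnv(VectorEnv):

    def __init__(self, num_envs):
        self.env = gym.make("CartPole-v1")
        # VectorEnv.__init__ stores the spaces and sets self.num_envs.
        super().__init__(
            observation_space=self.env.observation_space,
            action_space=self.env.action_space,
            num_envs=num_envs)

    def vector_reset(self) -> List[EnvObsType]:
        obs = self.env.reset()
        return [obs for _ in range(self.num_envs)]

    def reset_at(self, index: Optional[int] = None) -> EnvObsType:
        return self.env.reset()

    def vector_step(self, actions) -> Tuple[List[EnvObsType], List[float], List[bool], List[EnvInfoDict]]:
        obs_batch, rew_batch, done_batch, info_batch = [], [], [], []
        for i in range(self.num_envs):
            # Every slot steps the same underlying env, one after another.
            obs, rew, done, info = self.env.step(actions[i])
            obs_batch.append(obs)
            rew_batch.append(rew)
            done_batch.append(done)
            info_batch.append(info)
        return obs_batch, rew_batch, done_batch, info_batch

    def get_sub_environments(self) -> List[EnvType]:
        return [self.env for _ in range(self.num_envs)]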

But when I use this env with num_envs_per_worker: 5 and remote_worker_envs: True, the envs do not run in parallel with the configuration shown below. In the dashboard, only one worker, i.e. only one CPU, is being used.

import ray
from ray import tune

if __name__ == "__main__":

    ray.init()  # dashboard_host="", dashboard_port=8265)
    tune.register_env("custom_vec_env", lambda env_ctx: MockVectorEnv(1))
    analysis = tune.run(
        # Assumption: the trainer name is missing from the post; the keys
        # below (use_gae, sgd_minibatch_size, num_sgd_iter) are PPO options.
        "PPO",
        stop={"timesteps_total": 129000},
        config={
            "env": "custom_vec_env",
            "use_gae": True,
            "num_workers": 1,
            "num_envs_per_worker": 5,
            "rollout_fragment_length": 100,
            "sgd_minibatch_size": 128,
            "train_batch_size": 1000,
            "lr": 0.001,
            "gamma": 0.95,
            "entropy_coeff": 0.02,
            "num_sgd_iter": 10,
            "remote_worker_envs": True,
        },
    )

This was not the case when I used the CartPole-v1 env directly, without the custom VectorEnv. Why am I not able to run the envs in parallel using a custom VectorEnv? Am I doing anything wrong? Please help me with this.

Why are you setting MockVectorEnv(1) and then “num_envs_per_worker: 5”?
I mean, it doesn’t seem like you need a vector env yourself. You can just use CartPole and set num_envs_per_worker: 5, and RLlib will vectorize things for you.

Thank you for the reply. I have a custom env, and for vectorization I used a VectorEnv subclass, so before working directly with my env I am trying to understand it with the basic gym env CartPole-v1.
My understanding was that setting “num_envs_per_worker: 5” means RLlib takes “MockVectorEnv(1)”, creates 5 sub-environments, and runs them in parallel. Am I understanding this the wrong way? Should we use either “VectorEnv” or “num_envs_per_worker”, but not both at the same time? Can we not run the environments in parallel using a VectorEnv subclass? Please help me get clarity on this.

Ah, no problem.
Normally users don’t have to deal with vectorization themselves.
They just need to provide a single gym env, and when we convert your env into an RLlib BaseEnv, we create multiple copies and vectorize it for you.
for example:

self.num_envs is basically controlled by num_envs_per_worker parameter.
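
As a rough illustration of what "create multiple copies and vectorize" amounts to, here is a toy sketch in plain Python. ToyEnv and ToyVectorizer are hypothetical names made up for this example, and this is not RLlib's actual wrapper code; it only shows the idea that num_envs independent copies are built from one env factory and then stepped in a single loop.

```python
from typing import Callable, List, Tuple


class ToyEnv:
    """Hypothetical stand-in for a single gym-style env (not CartPole)."""

    def __init__(self, episode_length: int = 5):
        self.episode_length = episode_length
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: int) -> Tuple[int, float, bool, dict]:
        self.t += 1
        done = self.t >= self.episode_length
        return self.t, 1.0, done, {}


class ToyVectorizer:
    """Builds num_envs independent copies and steps them one after another."""

    def __init__(self, make_env: Callable[[], ToyEnv], num_envs: int):
        self.envs = [make_env() for _ in range(num_envs)]
        self.num_envs = num_envs

    def vector_reset(self) -> List[int]:
        return [env.reset() for env in self.envs]

    def vector_step(self, actions: List[int]):
        obs_batch, rew_batch, done_batch, info_batch = [], [], [], []
        # Plain sequential loop: everything runs in one process on one core.
        for env, action in zip(self.envs, actions):
            obs, rew, done, info = env.step(action)
            obs_batch.append(obs)
            rew_batch.append(rew)
            done_batch.append(done)
            info_batch.append(info)
        return obs_batch, rew_batch, done_batch, info_batch


vec = ToyVectorizer(ToyEnv, num_envs=5)
first_obs = vec.vector_reset()
obs, rews, dones, infos = vec.vector_step([0] * vec.num_envs)
```

Unlike the MockVectorEnv in the question, each slot here has its own env instance, so the five sub-environments keep separate state.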

Thank you for the clarification. Is it then not possible to run environments in parallel with a custom env using VectorEnv? When I set “MockVectorEnv(5)”, I can see that training is very fast, but I do not see multiple cores running in parallel. Am I missing some logic, or am I implementing my “MockVectorEnv” the wrong way? Please help me.

RLlib will wrap your environment automatically if you tell it that num_envs_per_worker should be > 1. Is training really “much” faster than if you let RLlib serialize your env?

And fast in the sense of reaching higher rewards in less wall clock time? Or per sample?

MockVectorEnv is more for demonstration purposes. You probably should not use it in your work.
The multiple vectorized envs on a single worker are stepped through sequentially, which is why you notice that only 1 CPU is utilized.
Parallelism comes in 2 ways:

  1. If you set num_envs_per_worker=1 but num_workers=5, these workers will sample their envs in parallel. This is the most common way to scale up your simulation.
  2. If you set num_envs_per_worker>1 and remote_worker_envs=True, each env becomes a remote actor, which also allows you to run them in parallel.

But I think you should try the num_workers route first.
Hope this helps.
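
The two routes could be written as config fragments along these lines. This is only a sketch: the keys are the ones already used in this thread, "CartPole-v1" stands in for your own env, and total_envs is a hypothetical helper added just to show that both setups sample from the same number of envs.

```python
# Route 1: several rollout workers, one env each (each worker is its own process).
config_many_workers = {
    "env": "CartPole-v1",
    "num_workers": 5,
    "num_envs_per_worker": 1,
}

# Route 2: one rollout worker whose sub-envs are turned into remote actors.
config_remote_envs = {
    "env": "CartPole-v1",
    "num_workers": 1,
    "num_envs_per_worker": 5,
    "remote_worker_envs": True,
}


def total_envs(config: dict) -> int:
    """Hypothetical helper: env copies across all rollout workers (num_workers >= 1)."""
    return config["num_workers"] * config["num_envs_per_worker"]


print(total_envs(config_many_workers), total_envs(config_remote_envs))
```

Either dict would be merged into the config passed to tune.run, as in the original post.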

Thank you for the reply. I used timesteps_total as the stop criterion, so I was looking at the time rather than the reward. I know the reward matters and that all the parameters are tuned to maximize it. Please take a look at the table below, which should make clear what I mean:

S.no  MockVectorEnv(num_envs)  num_workers  num_envs_per_worker  remote_worker_envs  timesteps_total  total time (s)  reward  episode_reward_max  episode_reward_min  episode_len_mean
1     1                        1            1                    -                   100000           128.558         392.38  500                 12                  392.38
2     1                        1            3                    FALSE               100000           120.078         342.68  500                 29                  342.68
3     1                        1            3                    TRUE                100000           118.02          408.52  500                 46                  408.56
4     3                        1            3                    TRUE                100000           57.99           98.14   500                 5                   98.63
5     3                        1            3                    FALSE               100000           58.50           193.47  500                 11                  193.75

In all of the above cases, the Ray dashboard shows that only 2 of the 12 CPU cores are used, which matches num_workers+1. My understanding is that with “remote_worker_envs: True”, total_num_cpus = (num_workers+1) + (num_envs_per_worker), because each env takes a separate CPU core to run in parallel, and that there should also be a difference in total time (s) between running the environments in parallel and sequentially. In my case I cannot find these differences. If I am wrong, please correct me and help me understand.
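
To make that expectation concrete for the runs in the table, the arithmetic can be spelled out as below. This only encodes the formula as stated in this post; whether RLlib really occupies one core per remote env is exactly what is being asked, so expected_cpus is a hypothetical helper, not a description of how Ray schedules resources.

```python
def expected_cpus(num_workers: int, num_envs_per_worker: int,
                  remote_worker_envs: bool) -> int:
    """CPU count expected under the assumption stated in the post."""
    base = num_workers + 1  # rollout workers plus the driver/trainer process
    if remote_worker_envs:
        # Assumption from the post: one extra core per remote sub-env.
        return base + num_envs_per_worker
    return base


# Rows 3 and 4 of the table: num_workers=1, num_envs_per_worker=3, remote envs on.
expected_remote = expected_cpus(1, 3, True)
# Rows 2 and 5: same settings with remote envs off.
expected_local = expected_cpus(1, 3, False)
observed_in_dashboard = 2  # value reported in the post for every run

print(expected_remote, expected_local, observed_in_dashboard)
```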

Thank you for such a good explanation. May I know what exactly the difference is between num_workers and a Ray actor?

RLlib workers are implemented using Ray actors.