Environments with VectorEnv not able to run in parallel

sai_lalith_Polawar · May 30, 2022, 10:46am

I have created a custom environment class that subclasses VectorEnv and uses CartPole-v1 as the base env, as shown below:

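from typing import List, Optional, Tuple

import gym
from ray.rllib.env.vector_env import VectorEnv
from ray.rllib.utils.typing import EnvInfoDict, EnvObsType, EnvType


class MockVectorEnv(VectorEnv):
    # Note: a single gym env is created and shared by all num_envs slots.
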
    def __init__(self, num_envs):
        self.env = gym.make("CartPole-v1")
        super().__init__(
            observation_space=self.env.observation_space,
            action_space=self.env.action_space,
            num_envs=num_envs,
        )

    def vector_reset(self) -> List[EnvObsType]:
        obs = self.env.reset()
        return [obs for _ in range(self.num_envs)]

    def reset_at(self, index: Optional[int] = None) -> EnvObsType:
        return self.env.reset()

    def vector_step(self, actions) -> Tuple[List[EnvObsType], List[float], List[bool], List[EnvInfoDict]]:
        obs_batch, rew_batch, done_batch, info_batch = [], [], [], []
        for i in range(self.num_envs):
            obs, rew, done, info = self.env.step(actions[i])
            obs_batch.append(obs)
            rew_batch.append(rew)
            done_batch.append(done)
            info_batch.append(info)
        return obs_batch, rew_batch, done_batch, info_batch

    def get_sub_environments(self) -> List[EnvType]:
        return [self.env for _ in range(self.num_envs)]
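One property of the class above is worth making explicit: every slot of the vector refers to the same env object, so one vector_step call advances that single env num_envs times. The sketch below illustrates the difference with a toy env and no RLlib or gym dependency. CountingEnv, SharedVector and IndependentVector are hypothetical names made up for this illustration, not RLlib classes.

```python
from typing import List, Tuple


class CountingEnv:
    """Hypothetical stand-in for a gym env: the observation is the step count."""

    def __init__(self, episode_len: int = 6):
        self.episode_len = episode_len
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: int) -> Tuple[int, float, bool, dict]:
        self.t += 1
        done = self.t >= self.episode_len
        return self.t, 1.0, done, {}


class SharedVector:
    """Mirrors the pattern above: one env object reused for every slot."""

    def __init__(self, num_envs: int):
        self.env = CountingEnv()
        self.num_envs = num_envs

    def vector_step(self, actions: List[int]):
        return [self.env.step(a) for a in actions]


class IndependentVector:
    """One env object per slot, so each slot keeps its own episode state."""

    def __init__(self, num_envs: int):
        self.envs = [CountingEnv() for _ in range(num_envs)]
        self.num_envs = num_envs

    def vector_step(self, actions: List[int]):
        return [env.step(a) for env, a in zip(self.envs, actions)]


shared = SharedVector(3)
independent = IndependentVector(3)

shared_obs = [obs for obs, _, _, _ in shared.vector_step([0, 0, 0])]
independent_obs = [obs for obs, _, _, _ in independent.vector_step([0, 0, 0])]

print(shared_obs)       # the single env advanced three times: [1, 2, 3]
print(independent_obs)  # each env advanced once: [1, 1, 1]
```

Both variants still step their envs one after another in a single process, so neither of them uses more than one CPU on its own.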

But when I use this env with num_envs_per_worker: 5 and remote_worker_envs: True, the envs do not run in parallel with the script shown below. In the dashboard I see only one worker, i.e. only one CPU, being used.

import ray
from ray import tune

if __name__ == "__main__":
    ray.init()  # optionally: dashboard_host="0.0.0.0", dashboard_port=8265
    tune.register_env("custom_vec_env", lambda env_ctx: MockVectorEnv(1))
    analysis = tune.run(
        "PPO",
        local_dir="./results/tb_logs_1/",
        stop={"timesteps_total": 129000},
        metric="episode_reward_mean",
        mode="max",
        config={
            "env": 'custom_vec_env',
            "use_gae": True,
            "num_workers": 1,  
            "num_envs_per_worker": 5,
            "rollout_fragment_length": 100,
            "sgd_minibatch_size": 128,
            "train_batch_size": 1000,
            "lr": 0.001,
            "gamma": 0.95,
            "entropy_coeff": 0.02,
            "num_sgd_iter": 10,
            "remote_worker_envs": True,
        },
    )

This was not the case when I used the CartPole-v1 env directly, without the custom VectorEnv. Why am I not able to run the envs in parallel using a custom VectorEnv? Am I doing anything wrong? Please help me with this.

gjoliver · June 2, 2022, 5:17pm

Why are you setting MockVectorEnv(1) and then “num_envs_per_worker: 5”?
I mean, it doesn’t seem like you need a vector env yourself. You can just use CartPole and set num_envs_per_worker: 5, and RLlib will vectorize things for you.

sai_lalith_Polawar · June 3, 2022, 6:51am

Thank you for the reply. I have a custom env, and for vectorization I used a VectorEnv subclass. Before working with my own env directly, I am trying to understand the approach with the basic gym env CartPole-v1.
My understanding was that setting “num_envs_per_worker: 5” means RLlib takes “MockVectorEnv(1)”, creates 5 sub-environments, and runs them in parallel. Am I understanding this the wrong way? Should we use either “VectorEnv” or “num_envs_per_worker”, but not both at the same time? Can we not run the environments in parallel using a VectorEnv subclass? Please help me get clarity on this.

gjoliver · June 3, 2022, 8:42am

Ah, no problem.
Normally users don’t have to deal with vectorization themselves.
They just need to provide a single gym env, and when we convert your env into an RLlib BaseEnv, we create multiple copies and vectorize them for you.
For example:

github.com/ray-project/ray/blob/master/rllib/env/multi_agent_env.py#L504-L505

    while len(self.envs) < self.num_envs:
        self.envs.append(self.make_env(len(self.envs)))

self.num_envs is basically controlled by the num_envs_per_worker parameter.
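As a rough illustration of that loop, here is a minimal sketch that fills a list of envs up to a target count by calling a factory with the slot index. ToyEnv and fill_envs are hypothetical names invented for this example. It only mirrors the two quoted lines and is not RLlib's actual wrapper.

```python
from typing import Callable, List


class ToyEnv:
    """Hypothetical placeholder for a single gym env; it only remembers its index."""

    def __init__(self, index: int):
        self.index = index


def fill_envs(
    make_env: Callable[[int], ToyEnv], existing: List[ToyEnv], num_envs: int
) -> List[ToyEnv]:
    """Return a list of num_envs envs, creating the missing ones via make_env."""
    envs = list(existing)
    while len(envs) < num_envs:
        # Each new copy is built by the factory, which receives its slot index.
        envs.append(make_env(len(envs)))
    return envs


first_env = ToyEnv(0)
# num_envs plays the role that num_envs_per_worker plays in the config.
envs = fill_envs(ToyEnv, [first_env], num_envs=5)
print([env.index for env in envs])
```

The env that already exists is kept in slot 0, and the remaining slots are filled with fresh, independent copies.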

sai_lalith_Polawar · June 3, 2022, 12:44pm

Thank you for the clarification. Is it then not possible to run environments in parallel with our custom env using VectorEnv? When I set “MockVectorEnv(5)” I can observe that training is very fast, but I do not see multiple cores running in parallel. Am I missing some logic, or am I implementing my “MockVectorEnv” the wrong way? Please help me.

arturn · June 3, 2022, 2:16pm

RLlib will wrap your environment automatically if you tell it that num_envs_per_worker should be > 1. Is training really “much” faster than if you let RLlib serialize your env?

arturn · June 3, 2022, 2:17pm

And fast in the sense of reaching higher rewards in less wall clock time? Or per sample?

gjoliver · June 3, 2022, 3:51pm

MockVectorEnv is more for demonstration purposes. You probably should not use it in your work.
The multiple vectorized envs on a single worker are stepped through sequentially, which is why you see only 1 CPU utilized.
Parallelism comes in 2 ways:

1. If you set num_envs_per_worker=1 but num_workers=5, these multiple workers will sample their envs in parallel. This is the most common way to scale up your simulation.
2. If you set num_envs_per_worker>1 and remote_worker_envs=True, each env becomes a remote actor, which also allows you to run them in parallel.

But I think you should try the num_workers route first.
Hope this helps.
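The two routes can be written down as config dictionaries. This is only a sketch: the key names are the ones used in this thread, the values 5 and 1 are illustrative, and newer RLlib releases may name these settings differently. The variable names base_config, workers_route and remote_envs_route are made up for the example.

```python
# Settings shared by both routes; plain CartPole, no custom VectorEnv.
base_config = {
    "env": "CartPole-v1",
    "rollout_fragment_length": 100,
    "train_batch_size": 1000,
}

# Route 1: several workers with one env each; the workers sample in parallel.
workers_route = {
    **base_config,
    "num_workers": 5,
    "num_envs_per_worker": 1,
}

# Route 2: one worker with several envs, each env placed in its own remote actor.
remote_envs_route = {
    **base_config,
    "num_workers": 1,
    "num_envs_per_worker": 5,
    "remote_worker_envs": True,
}

# Either dict would be passed as the config, for example:
# tune.run("PPO", config=workers_route)
```

Both dictionaries request five envs in total, so they differ only in where the parallelism comes from.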

sai_lalith_Polawar · June 7, 2022, 2:02pm

Thank you for the reply. I used timesteps_total as the stop criterion, and I was looking at the wall-clock time rather than the reward. I know the reward is important and that all the parameters are tuned to maximize it. Please take a look at the table below, which should make clear what I mean.

S.no	MockVectorEnv()	num_workers	num_envs_per_worker	remote_worker_envs	timesteps_total	total time (s)	reward	episode_reward_max	episode_reward_min	episode_len_mean
1	1	1	1	—	100000	128.558	392.38	500	12	392.38
2	1	1	3	FALSE	100000	120.078	342.68	500	29	342.68
3	1	1	3	TRUE	100000	118.02	408.52	500	46	408.56
4	3	1	3	TRUE	100000	57.99	98.14	500	5	98.63
5	3	1	3	FALSE	100000	58.50	193.47	500	11	193.75

In all of the above cases, the Ray dashboard shows that only 2 of the 12 CPU cores are used, which matches num_workers + 1. I expected that with “remote_worker_envs: True” the count would be total_num_cpus = (num_workers + 1) + num_envs_per_worker, with each env taking a separate CPU core to run in parallel, and that the total time (s) would differ between running the environments in parallel and running them sequentially. In my case I do not see these differences. If I am wrong, please correct me and help me understand.
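To compare the rows on a single number, the table can be reduced to timesteps per second. The sketch below only redoes arithmetic on the figures posted above. The variable names are invented for the example, and throughput by itself does not show whether envs ran in parallel.

```python
# (S.no, MockVectorEnv arg, num_envs_per_worker, remote_worker_envs,
#  timesteps_total, total time in seconds), copied from the table.
rows = [
    (1, 1, 1, None, 100000, 128.558),
    (2, 1, 3, False, 100000, 120.078),
    (3, 1, 3, True, 100000, 118.02),
    (4, 3, 3, True, 100000, 57.99),
    (5, 3, 3, False, 100000, 58.50),
]

# Timesteps per second for each run, keyed by S.no.
throughput = {sno: steps / secs for sno, _, _, _, steps, secs in rows}

# Ratio of wall-clock times between the first run and run 4.
speedup_row4_vs_row1 = rows[0][5] / rows[3][5]

for sno, value in throughput.items():
    print(f"run {sno}: {value:.0f} timesteps/s")
print(f"run 4 vs run 1: {speedup_row4_vs_row1:.2f}x")
```

On these figures, runs 1 to 3 land within roughly 10% of each other, while runs 4 and 5 are a little over twice as fast regardless of the remote_worker_envs setting.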

sai_lalith_Polawar · June 7, 2022, 2:04pm

Thank you for such a good explanation. Can I know what exactly the difference is between num_workers and a Ray actor?

gjoliver · June 7, 2022, 4:13pm

RLlib workers are implemented using Ray actors.
