I am working with an environment that has a large DNN in its step function. To reduce wall-clock time, I want to run parallel environments and pass batched inputs to the DNN. I don't have enough VRAM to create many sub-environments, each with its own copy of the DNN (which is how the standard vectorised environment works). Instead, I'd like to pass action vectors into a single instance of the environment so the DNN does not need to be cloned.
Thank you, but I don't think this will work. That approach uses VectorEnv, which creates clones of the environment, and as I said in the first post, I don't want to clone the environment.
The env uses a DNN in its step function, so I want to pass multiple observations through it each timestep and take multiple actions. The env would then be parallel, but I would only need one instance of the DNN.
I think the most straightforward way to do this would be to implement your own custom VectorEnv. That way the env will be a single instance and you can handle the parallelism however you intend.
Combine this with num_envs_per_worker to control how many parallel simulations you want that environment to manage.
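Roughly, the wiring on the config side would look something like this (just a sketch: `MyVectorEnv` is a placeholder for your class, and the exact builder methods depend on your Ray version):

```python
from ray import tune
from ray.tune.registry import register_env
from ray.rllib.algorithms.ppo import PPOConfig

from my_envs import MyVectorEnv  # placeholder: your custom VectorEnv subclass

# Register a creator so the rollout workers can build the env by name.
register_env("my_vec_env", lambda env_config: MyVectorEnv(env_config))

config = (
    PPOConfig()
    .environment(env="my_vec_env", env_config={"num_envs": 16})
    # One rollout worker holds the single env object (and its single DNN);
    # num_envs_per_worker tells RLlib how many parallel slots that env exposes.
    .rollouts(num_rollout_workers=1, num_envs_per_worker=16)
)

tune.Tuner("PPO", param_space=config.to_dict()).fit()
```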
The other option that I think would work is to treat it as a multi-agent environment. Make each agent_id env_{0,…,n} and map all the agents to the same policy.
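The mapping part is only config, something along these lines (a sketch with placeholder names; the exact builder methods depend on your Ray version):

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    # Placeholder name: a MultiAgentEnv whose agent ids are "env_0" ... "env_n".
    .environment(env="my_multi_agent_env")
    .multi_agent(
        # A single shared policy; every agent id maps to it, so only one set of
        # policy weights is ever created or trained.
        policies={"shared_policy"},
        policy_mapping_fn=lambda agent_id, *args, **kwargs: "shared_policy",
    )
)
```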
It should handle sub-envs being done just fine, but the main thing you will have to work out is how the multi-agent env finishes: once a sub-env finishes, you won't be able to reset it until they are all done. Personally, I would avoid this approach and work out the custom VectorEnv.
Thank you for the suggestions! I agree that writing a custom version of VectorEnv is probably the best way to go. Looking at the code, though, it looks like a pretty big rewrite. For now I have bodged it a bit by overriding the training_step() function of the algorithm I'm using: I just call my custom data collector instead of collecting samples through the workers.
I have implemented a custom VectorEnv that takes a batch of actions and returns a batch of observations via one instance of the DNN. Here is pseudocode of my env (simplified: the network, observation/action shapes, and reward are placeholders, and the exact VectorEnv method signatures depend on the Ray version):
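```python
import gym
import numpy as np
import torch
import torch.nn as nn
from ray.rllib.env.vector_env import VectorEnv

OBS_DIM, ACT_DIM = 32, 4  # placeholder sizes


class MyVecEnv(VectorEnv):
    """One env object and one DNN; each sub-env 'slot' is just a row in a batch."""

    def __init__(self, env_config=None):
        env_config = env_config or {}
        n = env_config.get("num_envs", 16)
        obs_space = gym.spaces.Box(-np.inf, np.inf, (OBS_DIM,), np.float32)
        act_space = gym.spaces.Box(-1.0, 1.0, (ACT_DIM,), np.float32)

        # The single DNN instance shared by every slot (placeholder network here;
        # the real one is the large model that lives in the step function).
        self.dnn = nn.Sequential(
            nn.Linear(OBS_DIM + ACT_DIM, 256), nn.ReLU(), nn.Linear(256, OBS_DIM)
        ).eval()

        self.state = np.zeros((n, OBS_DIM), dtype=np.float32)
        self.t = np.zeros(n, dtype=np.int64)
        super().__init__(observation_space=obs_space, action_space=act_space, num_envs=n)

    def vector_reset(self):
        # Older (gym-based) RLlib API: return a list of observations.
        # Gymnasium-based versions also expect a list of info dicts.
        self.state[:] = 0.0
        self.t[:] = 0
        return [self.state[i].copy() for i in range(self.num_envs)]

    def reset_at(self, index):
        self.state[index] = 0.0
        self.t[index] = 0
        return self.state[index].copy()

    def vector_step(self, actions):
        # One batched forward pass through the single DNN for all slots at once.
        acts = np.asarray(actions, dtype=np.float32)
        with torch.no_grad():
            inp = torch.from_numpy(np.concatenate([self.state, acts], axis=1))
            self.state = self.dnn(inp).numpy()
        self.t += 1

        # Placeholder reward and termination logic.
        obs = [self.state[i].copy() for i in range(self.num_envs)]
        rewards = (-np.linalg.norm(self.state, axis=1)).tolist()
        dones = (self.t >= 200).tolist()
        infos = [{} for _ in range(self.num_envs)]
        # Gymnasium-based versions return (obs, rewards, terminateds, truncateds, infos).
        return obs, rewards, dones, infos
```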
I train a PPO model on this MyVecEnv via the Ray Tuner with num_envs = 16. However, I found that performance and results differ greatly when num_envs_per_worker is varied; the best performance is reached when num_envs_per_worker == num_envs.
You mentioned: “Combine this with num_envs_per_worker to control how many Parallel simulations you want that environment to manage.”
Does this indicate that we have to set num_envs_per_worker equal to num_envs?