Memory Pressure Issue

First, a big disclaimer: I’m only a beginner with RLlib. I did read the documentation and searched for examples online, but I seem to be stuck and would appreciate any help.

I’m using Google Colab (Pro with high RAM) and a custom gym environment to tackle a problem.
I have registered my environment and made a zip with all the required dependencies so it can be created remotely.
The environment’s action and observation spaces are rather large (a MultiDiscrete observation vector of size 40,000 and a MultiDiscrete action vector of size 5000).
I’m trying to run IMPALA with the following setup:

algo = (
    ImpalaConfig()  # assumed; the config class was missing from the post
    .rollouts(num_rollout_workers=1, horizon=5000)
    .training(lr=0.0003, train_batch_size=4, replay_buffer_num_slots=2, minibatch_buffer_size=2)
    .build()
)

The setup above fails at some point before completing: the worker is killed due to memory pressure with the following error:
(raylet) 1 Workers (tasks / actors) killed due to memory pressure (OOM), 0 Workers crashed due to other reasons at node (ID: eef8b40b7555cdc707e6e197c05e25175bed6decdd593e103a125ff6,…

Since I haven’t even started training, I thought it might be related to memory allocation, maybe because of the large spaces. That’s why I set small replay buffers just to test it, but it still runs OOM.

Any help is appreciated. What could have made it go OOM before any training? It was just the setup.

Hmm… Are you able to reproduce this issue with any other gym environment?
Can you try RLlib’s RandomEnv (in the ray-project/ray repository) and see whether you can reproduce your issue with it? You should be able to configure it with your environment’s observation and action spaces via the env_config parameter.

Thanks for replying!
I have tried the following:
Using RandomEnv as is, with small discrete action and observation spaces, worked.
Then I changed just the action and observation spaces to a MultiDiscrete observation vector of size 40,000 and a MultiDiscrete action vector of size 5000, exactly as in my problem, and the issue was reproduced.
It might be worth mentioning that I’m using Colab Pro with additional memory.
Is there any way for me to handle such large spaces?

Thanks again!

*Update: I tried to isolate the source of the issue further. The large observation space seems to work if the action space has a size of up to around 100, which is far from the 5000 I need. Any suggestions are appreciated.

@avnishn @sven1977 @arturn for ideas…

Can you share a link to your repro script or paste it here?

If I can reproduce it on my end, I can potentially get you a fix or workaround.

@avnishn yes, thank you.
So the following will be enough to reproduce the issue:

# Imports were missing from the post; the old gym API (4-tuple step) is assumed.
import numpy as np
from gym import Env, spaces
from ray.rllib.algorithms.impala import ImpalaConfig
from ray.tune.registry import register_env

class GymLearnEnv(Env):

  def __init__(self, config=None):
    self.observation_space = spaces.Box(low=0, high=100, shape=(76032,))
    self.action_space = spaces.MultiDiscrete(np.full((5000), 10))
    self.count = 0

  def step(self, action):
    self.count = self.count + 1
    reward = 0
    # The episode ends after 500 steps.
    done = self.count == 500
    info = {}
    return self.observation_space.sample(), reward, done, info

  def render(self):
    pass

  def reset(self):
    self.count = 0
    return self.observation_space.sample()

class WrapperEnv(Env):

  def __init__(self, config=None):
    self.env = GymLearnEnv()
    self.reset_count = -1
    self.action_space = self.env.action_space
    self.observation_space = self.env.observation_space

  def reset(self):
    self.reset_count += 1
    return self.env.reset()

  def step(self, action):
    return self.env.step(action=action)

register_env("my_env", GymLearnEnv)

algo = (
    ImpalaConfig()          # assumed; the config class was missing from the post
    .environment("my_env")  # assumed; uses the name registered above
    .rollouts(num_rollout_workers=1, horizon=500)
    .training(lr=0.0001, replay_buffer_num_slots=5)
    .build()
)

The problem is the action_space size. You could use it in any environment you have and you would encounter the same issue.
I tried cutting it in half, but memory still fills up to nearly the maximum, which makes training very slow with a single worker, and loading the policy afterwards with Policy.from_checkpoint() takes too long (around 5 minutes).
I’m using Colab Pro with the high-RAM setting and without a GPU for now.
My question is: how do I handle such a large action space? I’m clearly doing something wrong.
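
For a rough sense of why the action space matters so much, here is a minimal sketch (not from the thread) that counts the parameters of a fully connected policy for the repro spaces above. The hidden width of 256 and the two-hidden-layer shape are assumptions for illustration; the actual model in use may differ. The point is that a MultiDiscrete head needs one logit per choice of every sub-action, so 5000 sub-actions with 10 choices each means 50,000 outputs per step.

```python
# Rough parameter count for a fully connected policy on the repro spaces.
# HIDDEN = 256 with two hidden layers is an assumption, not taken from the thread.

OBS_DIM = 76032            # Box observation, shape (76032,)
ACTION_NVEC = [10] * 5000  # MultiDiscrete(np.full((5000), 10))
HIDDEN = 256               # assumed hidden layer width


def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


# A MultiDiscrete action head needs one logit per choice in every sub-action.
num_logits = sum(ACTION_NVEC)

first_layer = dense_params(OBS_DIM, HIDDEN)
second_layer = dense_params(HIDDEN, HIDDEN)
action_head = dense_params(HIDDEN, num_logits)
total = first_layer + second_layer + action_head

print(f"logits per step: {num_logits:,}")
print(f"first layer:     {first_layer:,} parameters")
print(f"action head:     {action_head:,} parameters")
print(f"total:           {total:,} parameters (~{total * 4 / 1e6:.0f} MB as float32)")
```

Under these assumptions the first layer and the action head dominate, and the 50,000 logits are also produced (and typically stored alongside each sample) at every timestep.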

I wasn’t able to get an exact answer for you, but I have a rough idea of what the problem is.

The train_batch_size in IMPALA is 500 by default. Additionally, there is a queue that holds samples waiting to be trained on by a learner thread. On the back of an envelope, each batch of 500 is about 1 GB of data. Your learner queue can easily fill up, since you are on a Google Colab machine where resources are not exactly plentiful afaict (a quick Google search tells me that you get 25 GB of RAM).

If this queue alone fills up, memory usage will already be around 20 GB. Couple that with the size of the Ray object store, and that is probably why you are getting OOMs.
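
The back-of-the-envelope estimate above can be sketched in a few lines. This is only an approximation under stated assumptions: each timestep is assumed to store the observation, the next observation, the action, and the action logits, and the queue size of 16 is inferred from the suggestion to halve it to 8. What RLlib actually stores per sample may differ, so treat the numbers as an order of magnitude.

```python
# Approximate size of one IMPALA train batch for the repro environment.
# Assumed per-timestep contents: obs, next obs, action, action logits.

OBS_DIM = 76032
NUM_SUB_ACTIONS = 5000
CHOICES_PER_SUB_ACTION = 10
TRAIN_BATCH_SIZE = 500     # default quoted in the thread
LEARNER_QUEUE_SIZE = 16    # assumed: the thread suggests halving it to 8


def batch_bytes(bytes_per_value: int, batch_size: int = TRAIN_BATCH_SIZE) -> int:
    """Bytes needed for one train batch at the given value width."""
    logits = NUM_SUB_ACTIONS * CHOICES_PER_SUB_ACTION
    values_per_step = 2 * OBS_DIM + NUM_SUB_ACTIONS + logits
    return values_per_step * bytes_per_value * batch_size


for width in (4, 8):  # float32 vs. float64 storage
    one_batch = batch_bytes(width)
    full_queue = one_batch * LEARNER_QUEUE_SIZE
    print(f"{width}-byte values: {one_batch / 1e9:.2f} GB per batch, "
          f"{full_queue / 1e9:.1f} GB for a full learner queue")
```

With 8-byte values this lands close to the 1 GB per batch mentioned above; with 4-byte values it is roughly half that. Either way a full queue takes a large share of a 25 GB machine before the object store and the model are counted.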

I would suggest that you start by halving the train_batch_size (the default is 500), halving your learner_queue_size to 8, and decreasing your rollout_fragment_length so that the samples in flight are small and you don’t get an OOM in your Ray object store (the default here is 50; turn it down to some factor of your train_batch_size).

If that doesn’t work, keep tuning these down by factors of 2 until it does.
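
As a sketch of that tuning loop, the hypothetical helper below halves the three settings and keeps rollout_fragment_length a divisor of train_batch_size. The helper names are made up for illustration; only the three setting names and their defaults come from the suggestion above. In the config style used earlier in this thread, I would expect train_batch_size and learner_queue_size to go into .training() and rollout_fragment_length into .rollouts(), but check that against your Ray version.

```python
# Hypothetical helper: halve the three memory-related settings step by step.

def largest_divisor_at_most(n: int, limit: int) -> int:
    """Largest divisor of n that does not exceed limit (at least 1)."""
    for candidate in range(min(n, limit), 0, -1):
        if n % candidate == 0:
            return candidate
    return 1


def halve(settings: dict) -> dict:
    """Return a new settings dict with every value roughly halved."""
    batch = max(1, settings["train_batch_size"] // 2)
    queue = max(1, settings["learner_queue_size"] // 2)
    fragment = max(1, settings["rollout_fragment_length"] // 2)
    return {
        "train_batch_size": batch,
        "learner_queue_size": queue,
        # Keep the fragment length a divisor of the train batch size.
        "rollout_fragment_length": largest_divisor_at_most(batch, fragment),
    }


defaults = {
    "train_batch_size": 500,
    "learner_queue_size": 16,
    "rollout_fragment_length": 50,
}

schedule = []
current = defaults
for _ in range(3):
    current = halve(current)
    schedule.append(current)
    print(current)
```

Each printed dict is one candidate to try; stop at the first one that no longer runs out of memory.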

@avnishn Thank you, it makes perfect sense.
I played around with the parameters you suggested and I can get my model up and running now… thanks :)
A new problem arising from the same complex action space is that mean_inference_ms is now very high during training (around 1000).
Is there any tip you can think of that might help tackle that?

For actions, you can reduce your effective space by using action masking.
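
To make the idea concrete, here is a minimal sketch of one common way to apply an action mask: set the logits of disallowed actions to negative infinity before the softmax, so they get probability zero. This is an illustration in plain Python, not RLlib’s implementation, and the function name is made up. Note that the model still has to produce a logit for every action, so masking restricts what can be sampled but may not by itself make inference faster.

```python
import math


def masked_softmax(logits, mask):
    """Softmax over logits where entries with a falsy mask get probability 0."""
    masked = [l if m else -math.inf for l, m in zip(logits, mask)]
    top = max(masked)
    if top == -math.inf:
        raise ValueError("mask must allow at least one action")
    # Subtract the max for numerical stability; exp(-inf) evaluates to 0.0.
    exps = [math.exp(l - top) for l in masked]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 0.5, -1.0, 0.0]  # one logit per action, as produced by the model
mask = [1, 0, 1, 0]             # 1 = allowed, 0 = masked out
probs = masked_softmax(logits, mask)
print(probs)
```

The output has the same length as the input: masked actions keep their slot but can never be sampled.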

There’s also the matter of your observations that are very large.

I guess the question here is, can you afford in some way to discretize your action and observation spaces further?

I doubt that at this size you’ll be able to train a policy to get any meaningful output without any additional tricks on your end to reduce the dimensions of your problem.

Can I ask, what is the problem that you are trying to phrase as an RL problem? It’s likely that with some heuristic we can decrease the dimensions.

@avnishn Thanks for the input!
Unfortunately I can’t share more details about the specific use case since it’s not a personal project.
I was just reading about action masking following the explanation here: ray/ at master · ray-project/ray · GitHub
I’m still unsure how to use action masking since I’m a novice, but from what I’ve read so far it doesn’t reduce the action space; it’s supposed to help with effective learning. Will it actually reduce inference time for the agent?
It would be a great solution if it speeds up inference, since around 90% of the actions will be masked, so there is a lot of potential.
The observation space can be seen as an image with around 30 channels, where each channel is a layer of information. I was thinking of trying a VisionNetwork as the model instead of a fully connected network. Do you think it might be effective? Could it speed up inference?

Thanks again for helping!