RLlib crashes with more workers and envs

I also posted this on GitHub yesterday, but nobody answered.
RLlib crashes with this env. When I lower the number of workers and envs per worker it runs a while longer, but it still crashes.
The same thing happens with PPO, APPO, and IMPALA.
My hardware config is: AMD 16-core CPU, 128 GB RAM, RTX 3060 12 GB.
Ubuntu 20 and Windows 11, gym 0.21.0.
I have tried Ray 1.9.2 through 1.13.0 and all of them crash. It seems that 1.9.2 and 1.11.0 run somewhat longer than the newer versions before crashing.

This is the error

[nan, nan],
[nan, nan],
[nan, nan],
[nan, nan],
[nan, nan]], device='cuda:0', grad_fn=)

In tower 0 on device cuda:0
Traceback (most recent call last):
File "/home/usr1/Proj/ray_test.py", line 57, in
result = agent.train()
File "/home/usr1/Proj/.env/lib/python3.9/site-packages/ray/tune/trainable.py", line 360, in train
result = self.step()
File "/home/usr1/Proj/.env/lib/python3.9/site-packages/ray/rllib/agents/trainer.py", line 1136, in step
raise e
File "/home/usr1/Proj/.env/lib/python3.9/site-packages/ray/rllib/agents/trainer.py", line 1112, in step
step_attempt_results = self.step_attempt()
File "/home/usr1/Proj/.env/lib/python3.9/site-packages/ray/rllib/agents/trainer.py", line 1214, in step_attempt
step_results = self._exec_plan_or_training_iteration_fn()
File "/home/usr1/Proj/.env/lib/python3.9/site-packages/ray/rllib/agents/trainer.py", line 2211, in _exec_plan_or_training_iteration_fn
results = next(self.train_exec_impl)
File "/home/usr1/Proj/.env/lib/python3.9/site-packages/ray/util/iter.py", line 779, in next
return next(self.built_iterator)
File "/home/usr1/Proj/.env/lib/python3.9/site-packages/ray/util/iter.py", line 807, in apply_foreach
for item in it:
File "/home/usr1/Proj/.env/lib/python3.9/site-packages/ray/util/iter.py", line 807, in apply_foreach
for item in it:
File "/home/usr1/Proj/.env/lib/python3.9/site-packages/ray/util/iter.py", line 869, in apply_filter
for item in it:
File "/home/usr1/Proj/.env/lib/python3.9/site-packages/ray/util/iter.py", line 869, in apply_filter
for item in it:
File "/home/usr1/Proj/.env/lib/python3.9/site-packages/ray/util/iter.py", line 807, in apply_foreach
for item in it:
File "/home/usr1/Proj/.env/lib/python3.9/site-packages/ray/util/iter.py", line 869, in apply_filter
for item in it:
File "/home/usr1/Proj/.env/lib/python3.9/site-packages/ray/util/iter.py", line 1108, in build_union
item = next(it)
File "/home/usr1/Proj/.env/lib/python3.9/site-packages/ray/util/iter.py", line 779, in next
return next(self.built_iterator)
File "/home/usr1/Proj/.env/lib/python3.9/site-packages/ray/util/iter.py", line 807, in apply_foreach
for item in it:
File "/home/usr1/Proj/.env/lib/python3.9/site-packages/ray/util/iter.py", line 807, in apply_foreach
for item in it:
File "/home/usr1/Proj/.env/lib/python3.9/site-packages/ray/rllib/execution/concurrency_ops.py", line 143, in base_iterator
raise RuntimeError(
RuntimeError: Dequeue check() returned False! Exiting with Exception from Dequeue iterator.
Exception ignored in: <function RolloutWorker.__del__ at 0x7f5ba552c3a0>
Traceback (most recent call last):
File "/home/usr1/Proj/.env/lib/python3.9/site-packages/ray/util/tracing/tracing_helper.py", line 461, in _resume_span
TypeError: 'NoneType' object is not callable

And this is the reproduction script

import gym
import numpy as np
import ray
from ray import tune
from ray.tune.logger import pretty_print
from ray.rllib.agents import impala
import random

class MyEnv(gym.Env):
    def __init__(self, config=None):
        super(MyEnv, self).__init__()        

        self.action_space = gym.spaces.Box(
            low=-1, high=1, shape=(2,), dtype=np.float32)
        self.observation_space = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(40500,), dtype=np.float32)
    def _next_observation(self):
      obs = np.random.rand(40500)
      return obs

    def _take_action(self, action):
      self._reward = random.randrange(-1,1)

    def step(self, action):        
        # Execute one time step within the environment
        self._reward = 0
        done = False        
        obs = self._next_observation()
        return obs, self._reward, done, {}

    def reset(self):
        self._reward = 0
        self.total_reward = 0       
        self.visualization = None
        return self._next_observation()

if __name__ == "__main__":

    cfg = impala.DEFAULT_CONFIG.copy()    
    cfg["env"] = MyEnv
    cfg["num_gpus"] = 1
    cfg["num_workers"] = 4
    cfg["num_envs_per_worker"] = 4
    cfg["framework"] = "torch"
    cfg["horizon"] = 500      
    cfg["model"] = {
        "fcnet_hiddens": [512, 512],
    }  # closing brace was missing in the original paste
    agent = impala.ImpalaTrainer(config=cfg, env=MyEnv)

    i = 0
    while True:
        result = agent.train()        
        #result = tune.run("IMPALA", config=cfg, verbose=1)
        if i % 35 == 0:  # save every 35th training iteration
            checkpoint_path = agent.save()            
            #checkpoint_path = tuner.save()
        i += 1     

When using the default fcnet_hiddens it runs a little longer, then it crashes.

  • High: It blocks me to complete my task.

Hi Evo!

This sounds like a good question to ask in RLlib Office Hours! Just add your question to this doc: RLlib Office Hours - Google Docs

Thanks! Hope to see you there!


I have tried making the observation 60% smaller (with the default fcnet) and also lowered the workers to 2 and envs per worker to 1, but Ray 1.11.0 crashes after 15 min / 1.6M steps and Ray 1.13.0 crashes after 1 min / 300k steps.

I then went back to Ray 1.11.0 and lowered the observation size to 7000, with 1 worker and 1 env per worker, and I managed to get around 12M steps until it crashed again.

When I try to restore, it runs a few steps and then crashes again, even with 0 workers and with num_gpus set to 0! So it is not a problem with CUDA.
I really don't know what is going on.

I got this error running only on CPU!

[nan, nan],
        [nan, nan],
        [nan, nan],
        [nan, nan],
        [nan, nan]], grad_fn=<SplitBackward0>)
 tracebackTraceback (most recent call last):
  File "d:\proj\.env\lib\site-packages\ray\rllib\policy\torch_policy.py", line 1012, in _worker
self._loss(self, model, self.dist_class, sample_batch))
  File "d:\proj\.env\lib\site-packages\ray\rllib\agents\impala\vtrace_torch_policy.py", line 117, in build_vtrace_loss
    action_dist = dist_class(model_out, model)

With Ray 1.13.0 it crashes after 100k steps.

I have lowered the observation size to 6000, and with Ray 1.11.0 (3 workers and 4 envs per worker) it did not crash before I stopped the training after 45M steps. However, I noticed that if I increase the learning rate from 0.00001 to 0.0001 it crashes after 500k steps.

This is likely a training / env issue. It looks like NaNs are being introduced into the path of your NN. You should check whether your policy is outputting the NaNs or whether they are coming from your environment.

There are also some logging keys you can use to find the norm of your gradients. Typically you don't want your grad norms to be large; otherwise your training can collapse or become unstable, leading to NaNs in the critical path.
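
One way to tell the two cases apart is to guard the environment boundary, so the first non-finite value raises immediately and names its source. The sketch below is only an illustration: `FiniteCheckEnv` is a hypothetical helper, not part of RLlib or gym, and it assumes the old gym 0.21 step API (a 4-tuple) used by the script above. A NaN in the action points at the policy, while a NaN in the observation or reward points at the env.

```python
import numpy as np


class FiniteCheckEnv:
    """Hypothetical guard around an env-like object (reset/step).

    Raises ValueError as soon as a non-finite value crosses the
    env boundary, naming which quantity was affected.
    """

    def __init__(self, env):
        self.env = env
        # Forward the spaces if the wrapped object defines them.
        self.action_space = getattr(env, "action_space", None)
        self.observation_space = getattr(env, "observation_space", None)

    @staticmethod
    def _check(name, value):
        arr = np.asarray(value, dtype=np.float64)
        if not np.all(np.isfinite(arr)):
            raise ValueError(f"non-finite values found in {name}")
        return value

    def reset(self):
        return self._check("reset observation", self.env.reset())

    def step(self, action):
        # A NaN here means the policy produced it, not the env.
        self._check("action", action)
        obs, reward, done, info = self.env.step(action)
        # A NaN here means the env produced it.
        self._check("observation", obs)
        self._check("reward", reward)
        return obs, reward, done, info
```

To use this with RLlib it would probably have to subclass gym.Env or gym.Wrapper so the env checks pass; that part is left out here to keep the sketch free of dependencies other than NumPy.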

@avnishn which one? The var_gnorm? The environment does not introduce NaNs, I have checked. It crashes faster when I increase the learning rate and the number of workers. If I set a very low learning rate it does not crash with Ray 1.11.0, but with 1.13.0 it always crashes.
And how can I reduce the grad norms?

In this image, both logged training runs crashed at 16M.

Yup, that would be the norm of the gradients on all of the params in your network.

You can enable gradient clipping, which will probably help by letting you tune your LR in a less brittle space.

Although there is the question of why your gradient norms are increasing the way they are... It's unclear why that's the case, but there's a good chance that a lower LR combined with grad clipping will remove those NaNs.
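
As a rough sketch of what that combination could look like in the config dict from the reproduction script: `lr` and `grad_clip` are, to my knowledge, existing IMPALA config keys in Ray 1.x, but please verify them against your installed version. The learning rate is the one reported above as stable; the clip value is an arbitrary starting point, not a recommendation. `with_overrides` is a hypothetical helper written only for this example.

```python
# Hypothetical stability overrides for the IMPALA config.
# "lr" matches the value reported as stable in this thread;
# "grad_clip" is an assumed, illustrative value to tune from.
stability_overrides = {
    "lr": 1e-5,
    "grad_clip": 10.0,
}


def with_overrides(base_cfg, overrides):
    """Return a shallow copy of base_cfg with overrides applied."""
    merged = dict(base_cfg)
    merged.update(overrides)
    return merged


# Usage with the script above (not executed here):
# cfg = with_overrides(impala.DEFAULT_CONFIG, stability_overrides)
```

Copying instead of mutating keeps the default config untouched, which makes it easier to compare runs with and without the overrides.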


Try looking at the entropy loss. It may be too large, resulting in NaN values that propagate to your network weights. This would require proper handling of the entropy coefficient and agent exploration scheme.
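
To illustrate why the entropy term can run away, here is a small sketch. It assumes the policy uses a diagonal Gaussian over the 2-dimensional Box action, which is my understanding of RLlib's default for continuous actions but is not confirmed in this thread. For such a distribution the entropy is the sum over dimensions of log_std + 0.5 * log(2 * pi * e), so it grows linearly and without bound in log_std. An entropy bonus therefore keeps rewarding a larger std unless the coefficient (the `entropy_coeff` config key, as far as I know) is small enough.

```python
import math


def diag_gaussian_entropy(log_stds):
    """Entropy in nats of a Gaussian with independent components.

    Each dimension contributes log_std + 0.5 * log(2 * pi * e).
    """
    per_dim_const = 0.5 * math.log(2.0 * math.pi * math.e)
    return sum(per_dim_const + log_std for log_std in log_stds)


# Entropy keeps growing as the policy's log_std outputs grow.
for log_std in (0.0, 2.0, 10.0):
    entropy = diag_gaussian_entropy([log_std, log_std])
    print(f"log_std={log_std:5.1f} -> entropy={entropy:8.3f} nats")
```

If the logged entropy climbs steadily before the crash, that would be consistent with this picture; if it stays flat, the cause is more likely elsewhere.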
