Error while running RLlib training with Tune

Hi

I am getting this error while running RLlib training on a Unity3D environment, after 4,000 to 10,000 steps:

ValueError: The parameter loc has invalid values
In tower 0 on device cpu

Hi geekyneuro,

I cannot find the phrase “The parameter [x] has invalid values” anywhere in the Ray library.
Could you post a reproduction script or give more details?

Cheers 🙂

Here is the complete error:

cat /data/afandang/exp/ray_3/ray_3/PPO_arora-v0_6ceb3_00000_0_2021-08-17_16-26-38/error.txt
Failure # 1 (occurred at 2021-08-17_17-23-46)
Traceback (most recent call last):
  File "/opt/conda/envs/navsim/lib/python3.8/site-packages/ray/tune/trial_runner.py", line 739, in _process_trial
    results = self.trial_executor.fetch_result(trial)
  File "/opt/conda/envs/navsim/lib/python3.8/site-packages/ray/tune/ray_trial_executor.py", line 729, in fetch_result
    result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
  File "/opt/conda/envs/navsim/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 82, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/envs/navsim/lib/python3.8/site-packages/ray/worker.py", line 1564, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::PPO.train() (pid=147444, ip=192.168.1.10)
  File "python/ray/_raylet.pyx", line 534, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 484, in ray._raylet.execute_task.function_executor
  File "/opt/conda/envs/navsim/lib/python3.8/site-packages/ray/_private/function_manager.py", line 563, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
  File "/opt/conda/envs/navsim/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 643, in train
    raise e
  File "/opt/conda/envs/navsim/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 629, in train
    result = Trainable.train(self)
  File "/opt/conda/envs/navsim/lib/python3.8/site-packages/ray/tune/trainable.py", line 237, in train
    result = self.step()
  File "/opt/conda/envs/navsim/lib/python3.8/site-packages/ray/rllib/agents/trainer_template.py", line 170, in step
    res = next(self.train_exec_impl)
  File "/opt/conda/envs/navsim/lib/python3.8/site-packages/ray/util/iter.py", line 756, in __next__
    return next(self.built_iterator)
  File "/opt/conda/envs/navsim/lib/python3.8/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/opt/conda/envs/navsim/lib/python3.8/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/opt/conda/envs/navsim/lib/python3.8/site-packages/ray/util/iter.py", line 843, in apply_filter
    for item in it:
  File "/opt/conda/envs/navsim/lib/python3.8/site-packages/ray/util/iter.py", line 843, in apply_filter
    for item in it:
  File "/opt/conda/envs/navsim/lib/python3.8/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/opt/conda/envs/navsim/lib/python3.8/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/opt/conda/envs/navsim/lib/python3.8/site-packages/ray/util/iter.py", line 791, in apply_foreach
    result = fn(item)
  File "/opt/conda/envs/navsim/lib/python3.8/site-packages/ray/rllib/execution/train_ops.py", line 65, in __call__
    info = do_minibatch_sgd(
  File "/opt/conda/envs/navsim/lib/python3.8/site-packages/ray/rllib/utils/sgd.py", line 119, in do_minibatch_sgd
    batch_fetches = (local_worker.learn_on_batch(
  File "/opt/conda/envs/navsim/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 935, in learn_on_batch
    info_out[pid] = policy.learn_on_batch(batch)
  File "/opt/conda/envs/navsim/lib/python3.8/site-packages/ray/rllib/utils/threading.py", line 21, in wrapper
    return func(self, *a, **k)
  File "/opt/conda/envs/navsim/lib/python3.8/site-packages/ray/rllib/policy/torch_policy.py", line 468, in learn_on_batch
    grads, fetches = self.compute_gradients(postprocessed_batch)
  File "/opt/conda/envs/navsim/lib/python3.8/site-packages/ray/rllib/policy/policy_template.py", line 335, in compute_gradients
    return parent_cls.compute_gradients(self, batch)
  File "/opt/conda/envs/navsim/lib/python3.8/site-packages/ray/rllib/utils/threading.py", line 21, in wrapper
    return func(self, *a, **k)
  File "/opt/conda/envs/navsim/lib/python3.8/site-packages/ray/rllib/policy/torch_policy.py", line 519, in compute_gradients
    tower_outputs = self._multi_gpu_parallel_grad_calc(batches)
  File "/opt/conda/envs/navsim/lib/python3.8/site-packages/ray/rllib/policy/torch_policy.py", line 867, in _multi_gpu_parallel_grad_calc
    raise last_result
ValueError: The parameter loc has invalid values
In tower 0 on device cpu

The code used is:

        import ray
        from ray import tune
        import ray.rllib.agents.ppo as ppo

        # ObjDict, run_config, env_config, run_base_folder and navsim_envs
        # come from the surrounding navsim code (not shown in this post).
        config = ObjDict(ppo.DEFAULT_CONFIG.copy())
        # Override only keys that already exist in PPO's default config.
        for arg in run_config:
            if arg in config:
                config[arg] = run_config[arg]
        config["env_config"] = env_config
        config["ignore_worker_failures"] = True
        # TODO: Override ray's conf with some defaults from navsim

        # Restart Ray in local mode (all tasks run in this single process).
        ray.shutdown()
        ray.init(ignore_reinit_error=True, local_mode=True)
        navsim_envs.env.AroraGymEnv.register_with_ray()

        result = tune.run(
            ppo.PPOTrainer,
            config=config,
            name=run_config.run_id,
            resume=run_config.resume,
            local_dir=str(run_base_folder),
            stop={"episodes_total": run_config.total_episodes},
            checkpoint_freq=run_config.checkpoint_interval,
            checkpoint_at_end=True,
        )
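A side note on the override loop in the code above: it copies a key from run_config only if that key already exists in ppo.DEFAULT_CONFIG. Tune-side settings such as run_id are filtered out as intended, but a misspelled RLlib key is dropped just as silently, so a stability setting you think you changed may never reach the trainer. Below is a minimal plain-Python sketch of the same pattern that also reports the skipped keys. merge_known_keys is a hypothetical helper (not part of Ray), and the default values and override keys are made up for illustration.

```python
def merge_known_keys(defaults, overrides):
    """Copy defaults, then apply only overrides whose keys already exist.

    Hypothetical helper mirroring the loop in the post, except that it
    also returns the keys that were skipped instead of dropping them silently.
    """
    config = dict(defaults)  # shallow copy, like DEFAULT_CONFIG.copy()
    skipped = []
    for key, value in overrides.items():
        if key in config:
            config[key] = value
        else:
            skipped.append(key)
    return config, skipped


# Made-up stand-ins for ppo.DEFAULT_CONFIG and run_config.
defaults = {"lr": 5e-5, "num_workers": 2, "model": {"fcnet_hiddens": [256, 256]}}
overrides = {"lr": 1e-5, "learning_rate": 1e-4, "run_id": "ppo_arora"}

config, skipped = merge_known_keys(defaults, overrides)
print(config["lr"])  # 1e-05
print(skipped)       # ['learning_rate', 'run_id']
```

Printing or logging the skipped list once at startup makes a typo such as learning_rate (instead of lr) visible immediately.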

That error comes from PyTorch's distribution classes; I would assume from Normal. When you pass loc and scale to its constructor, they must satisfy the constraints defined by arg_constraints = {'loc': constraints.real, 'scale': constraints.positive}.
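The constraint check can be reproduced directly with PyTorch. This sketch assumes PyTorch is installed and uses only torch.distributions.Normal and torch.distributions.constraints. The exact error text differs between PyTorch versions (older ones print "The parameter loc has invalid values", as in the traceback above; newer ones print a longer "Expected parameter loc ..." message), so the sketch relies only on the exception type.

```python
import torch
from torch.distributions import Normal, constraints

# What Normal requires of its arguments: loc must be "real", scale "positive".
print(Normal.arg_constraints)

# constraints.real rejects only NaN; +/-inf lie on the extended real line.
values = torch.tensor([1.0, float("inf"), float("nan")])
print(constraints.real.check(values))

# A NaN mean fails argument validation with a ValueError that mentions loc.
try:
    Normal(torch.tensor([float("nan")]), torch.tensor([1.0]), validate_args=True)
except ValueError as err:
    print(type(err).__name__, err)

# An inf mean passes construction, but NaN appears one step later,
# e.g. inside log_prob, where (value - loc) becomes inf - inf.
dist = Normal(torch.tensor([float("inf")]), torch.tensor([1.0]), validate_args=True)
print(dist.log_prob(torch.tensor([float("inf")])))  # tensor([nan])
```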

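This means that loc contains NaN values. Strictly, constraints.real covers the extended real line, so an inf on its own passes the check, but very large or inf values turn into NaN after ordinary arithmetic such as inf - inf or 0 * inf. That can happen because of instability during training, or simply because your model outputs very large values from the logits branch (if you are using something similar to the built-in FCNet).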
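To find out which part of the model output goes bad first, one option is to inspect it before the action distribution is built. Below is a minimal sketch, assuming the layout RLlib uses for torch DiagGaussian action distributions: the last dimension of the model output holds the mean and log_std concatenated, so it splits into two equal halves. find_nonfinite is a hypothetical debugging helper, not an RLlib function.

```python
import torch


def find_nonfinite(dist_inputs: torch.Tensor) -> dict:
    """Count non-finite entries in a diagonal-Gaussian model output.

    Hypothetical helper. Assumes the last dimension is [mean, log_std]
    concatenated, as for DiagGaussian action distributions.
    """
    mean, log_std = torch.chunk(dist_inputs, 2, dim=-1)
    # exp() overflows float32 to inf once log_std exceeds about 88.7.
    std = torch.exp(log_std)
    return {
        name: int((~torch.isfinite(t)).sum())
        for name, t in (("mean", mean), ("log_std", log_std), ("std", std))
    }


# Example: batch of 2, action dim 2, so the model output width is 4.
out = torch.tensor([[0.1, -0.2, 0.0, 0.5],
                    [float("nan"), 0.3, 100.0, 0.0]])
print(find_nonfinite(out))  # {'mean': 1, 'log_std': 0, 'std': 1}
```

In a custom TorchModelV2, a check like this could be run on the output at the end of forward() and turned into an exception when any count is non-zero. torch.autograd.set_detect_anomaly(True) can additionally report the backward operation that first produced NaN, at a considerable speed cost. If the values grow steadily over training, lowering lr or setting grad_clip in the PPO config are commonly tried first.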