ValueError in simple Tuner/PyTorch prototype

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Hi, I’m an RLlib newbie and have gotten some tutorial examples to run. Now I’m trying to set up a simple structure to begin Tune-ing the HPs. With the simple program below, it runs fine for about 12 iterations, then throws “ValueError: Expected parameter loc (Tensor of shape (128, 1)) of distribution Normal(loc: torch.Size([128, 1]), scale: torch.Size([128, 1])) to satisfy the constraint Real(), but found invalid values”. I have played with commenting out various combinations of config params (to fall back to the defaults). It seems that using all of the ones shown here causes the problem, but I can’t pin it down to any particular one. This leads me to believe the params themselves are okay and something else is going on under the hood.

import ray
from ray import air, tune
import ray.rllib.algorithms.ppo as ppo

ray.init()

algo = "PPO"

run_config = ppo.DEFAULT_CONFIG.copy()
run_config["env"]                               = "MountainCarContinuous-v0"
run_config["framework"]                         = "torch"
run_config["num_gpus"]                          = 0 #for the local worker
run_config["num_cpus_per_worker"]               = 1 #also applies to the local worker
run_config["num_gpus_per_worker"]               = 0
run_config["num_workers"]                       = 2 #num remote workers (remember that there is a local worker also)
run_config["num_envs_per_worker"]               = 1
run_config["rollout_fragment_length"]           = 200 #timesteps
run_config["gamma"]                             = 0.99
run_config["lr"]                                = 0.01 
                                                    #tune.choice([0.01, 0.001, 0.0001])
run_config["train_batch_size"]                  = 4000 #tune.choice([400, 1000, 4000])
run_config["evaluation_interval"]               = None
run_config["evaluation_duration"]               = 10
run_config["evaluation_duration_unit"]          = "episodes"
run_config["evaluation_parallel_to_training"]   = False
run_config["log_level"]                         = "INFO"
run_config["seed"]                              = 555 #None, 8 sometimes causes fault
# Add dict here for lots of model HPs

print("\n///// Run configs are:\n")
for key, value in run_config.items():
    print("{}:  {}".format(key, value))

tune_config = tune.TuneConfig(
                metric  = "episode_reward_mean",
                mode    = "max"
              )
#stop criteria? (see the aside after this script)

tuner = tune.Tuner(algo, param_space=run_config, tune_config=tune_config)
print("\n///// Tuner created.\n")

tuner.fit()
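
(Aside on the “#stop criteria?” comment: in the Ray 2.0 API, stopping conditions can be passed to the Tuner through air.RunConfig, which would also put the otherwise-unused air import to work. A minimal sketch with illustrative values, not part of the original run:)

# Hypothetical stop criteria: end each trial after 50 training iterations
# or once the mean episode reward reaches 90, whichever comes first.
stop_criteria = {
    "training_iteration": 50,
    "episode_reward_mean": 90.0,
}

tuner = tune.Tuner(
    algo,
    param_space=run_config,
    tune_config=tune_config,
    run_config=air.RunConfig(stop=stop_criteria),
)
tuner.fit()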

Here is a snippet of the very long crash output:

+------------------------------------------+----------+-------------------+--------+------------------+-------+----------+------------------------+----------------------+----------------------+
| Trial name                               | status   | loc               |   iter |   total time (s) |    ts |   reward |   num_recreated_wor... |   episode_reward_max |   episode_reward_min |
|------------------------------------------+----------+-------------------+--------+------------------+-------+----------+------------------------+----------------------+----------------------|
| PPO_MountainCarContinuous-v0_96ab6_00000 | RUNNING  | 10.0.0.180:174850 |     11 |          48.9855 | 44000 |  11.7808 |                      0 |              80.3617 |             -94.3231 |
+------------------------------------------+----------+-------------------+--------+------------------+-------+----------+------------------------+----------------------+----------------------+


2022-09-29 21:10:53,948	ERROR trial_runner.py:980 -- Trial PPO_MountainCarContinuous-v0_96ab6_00000: Error processing event.
ray.exceptions.RayTaskError(ValueError): ray::PPO.train() (pid=174850, ip=10.0.0.180, repr=PPO)
  File "/home/starkj/miniconda3/envs/ray_tutorial/lib/python3.9/site-packages/ray/rllib/algorithms/ppo/ppo_torch_policy.py", line 87, in loss
    curr_action_dist = dist_class(logits, model)
  File "/home/starkj/miniconda3/envs/ray_tutorial/lib/python3.9/site-packages/ray/rllib/models/torch/torch_action_dist.py", line 239, in __init__
    self.dist = torch.distributions.normal.Normal(mean, torch.exp(log_std))
  File "/home/starkj/miniconda3/envs/ray_tutorial/lib/python3.9/site-packages/torch/distributions/normal.py", line 54, in __init__
    super(Normal, self).__init__(batch_shape, validate_args=validate_args)
  File "/home/starkj/miniconda3/envs/ray_tutorial/lib/python3.9/site-packages/torch/distributions/distribution.py", line 55, in __init__
    raise ValueError(
ValueError: Expected parameter loc (Tensor of shape (128, 1)) of distribution Normal(loc: torch.Size([128, 1]), scale: torch.Size([128, 1])) to satisfy the constraint Real(), but found invalid values:

Any help or suggestions would be most appreciated.

Your learning rate is quite high. Try either lowering it or setting an lr schedule to anneal it over time.

You could also try setting grad_clip to a lower value.
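
Something like this (a sketch against the same dict-style config used above; lr_schedule is a list of [timestep, value] pairs that RLlib interpolates between, and the numbers here are just illustrative):

# Anneal the learning rate from 0.001 at t=0 down to 0.00001 at 1M timesteps.
run_config["lr_schedule"] = [[0, 0.001], [1_000_000, 0.00001]]

# Clip the global gradient norm so one bad batch can't blow up the weights.
run_config["grad_clip"] = 40.0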

Here are some common ranges for PPO hyperparameters.

Yes, I realize the values aren’t necessarily realistic, but I would hope that that isn’t the cause of a fatal error in the Ray code. Please understand that this is a toy problem intended simply to help me understand the mechanics of getting Tuner to work. Having said that, I tried running with lr = 0.001, and behold, it runs okay! Thanks for the catch @mannyv. I’m still concerned that Ray/RLlib doesn’t handle such a situation more gracefully.

@starkj
You can easily blow up the NN with a large learning rate.
Some of the weights and outputs will become NaN, and you will get this invalid-parameter error, since Torch can’t construct a distribution from NaN inputs.
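
The failure is easy to reproduce in isolation, independent of RLlib (a minimal sketch):

import torch

# A NaN where the policy network's mean output should be -- this is what
# ends up in `loc` once the weights have diverged.
loc = torch.tensor([[float("nan")]])
scale = torch.tensor([[1.0]])

# Raises the same "Expected parameter loc ... to satisfy the constraint
# Real(), but found invalid values" ValueError, because the distribution
# validates its arguments by default.
dist = torch.distributions.normal.Normal(loc, scale)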
Hyperparameters are usually quite important, not just for performance, but for whether the whole thing will work at all.
I also noticed that you put the entire config in param_space. While that works, it may not be the most efficient approach, since usually we only search over a few parameters.
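
A sketch of that idea, fixing everything except the two parameters under search (the commented-out tune.choice values in your script suggest this was the plan):

# Keep the rest of run_config fixed; embed search spaces only for the
# parameters being tuned. Tune then samples just these two dimensions.
run_config["lr"] = tune.choice([0.01, 0.001, 0.0001])
run_config["train_batch_size"] = tune.choice([400, 1000, 4000])

tuner = tune.Tuner(algo, param_space=run_config, tune_config=tune_config)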
If you need a good example of how to use RL with Ray AIR, you can take a look at this notebook: ray/rl_online_example.ipynb at master · ray-project/ray · GitHub

So this particular problem seems to be more a case of my lack of care bumping into a limitation in Torch. Fair enough. Thanks @gjoliver for the insights on better use of HPs!