How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
Hi, I’m an RLlib newbie and have gotten some tutorial examples to run. I’m now trying to set up a simple structure to begin tuning the hyperparameters (HPs) with Tune. The simple program below runs okay for up to 12 iterations, then throws “ValueError: Expected parameter loc (Tensor of shape (128, 1)) of distribution Normal(loc: torch.Size([128, 1]), scale: torch.Size([128, 1])) to satisfy the constraint Real(), but found invalid values”. I have tried commenting out various combinations of config params (so that the defaults are used). It seems that using all of the params shown here causes the problem, but I can’t pin it down to any particular one. This leads me to believe the params themselves are okay but something else is going on under the hood.
import ray
from ray import air, tune
import ray.rllib.algorithms.ppo as ppo
algo = "PPO"
run_config = ppo.DEFAULT_CONFIG.copy()
run_config["env"] = "MountainCarContinuous-v0"
run_config["framework"] = "torch"
run_config["num_gpus"] = 0 #for the local worker
run_config["num_cpus_per_worker"] = 1 #also applies to the local worker
run_config["num_gpus_per_worker"] = 0
run_config["num_workers"] = 2 #num remote workers (remember that there is a local worker also)
run_config["num_envs_per_worker"] = 1
run_config["rollout_fragment_length"] = 200 #timesteps
run_config["gamma"] = 0.99
run_config["lr"] = 0.01 #tune.choice([0.01, 0.001, 0.0001])
run_config["train_batch_size"] = 4000 #tune.choice([400, 1000, 4000])
run_config["evaluation_interval"] = None
run_config["evaluation_duration"] = 10
run_config["evaluation_duration_unit"] = "episodes"
run_config["evaluation_parallel_to_training"] = False
run_config["log_level"] = "INFO"
run_config["seed"] = 555 #None, 8 sometimes causes fault
# Add dict here for lots of model HPs
print("\n///// Run configs are:\n")
for item in run_config:
    print("{}: {}".format(item, run_config[item]))
tune_config = tune.TuneConfig(
    metric = "episode_reward_mean",
    mode = "max"
    #stop criteria?
)
tuner = tune.Tuner(algo, param_space=run_config, tune_config=tune_config)
print("\n///// Tuner created.\n")
tuner.fit() #starts the trial(s); this call is implied by the crash output below
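The trial-and-error over config params described above could be made systematic by reverting exactly one override to its default per run. The following is only a minimal sketch of that idea in plain Python: the helper `one_at_a_time_variants` and the placeholder default values are hypothetical and are not part of Ray or RLlib.

```python
def one_at_a_time_variants(defaults, overrides):
    """Yield (skipped_key, config) pairs.

    Each config starts from `defaults` and applies every override except
    `skipped_key`, so exactly one parameter falls back to its default.
    """
    for skipped_key in overrides:
        config = dict(defaults)  # copy so the caller's dict is never mutated
        for key, value in overrides.items():
            if key != skipped_key:
                config[key] = value
        yield skipped_key, config


# Placeholder defaults for illustration only; NOT RLlib's actual defaults.
example_defaults = {"lr": 0.001, "num_workers": 0, "seed": None}
# Overrides taken from the script above.
example_overrides = {"lr": 0.01, "num_workers": 2, "seed": 555}

for skipped_key, config in one_at_a_time_variants(example_defaults, example_overrides):
    print(f"default used for {skipped_key!r}: {config}")
```

Each resulting config could then be run separately. If the error only disappears when one particular key is reverted, that key becomes the main suspect, although interactions between several params would not be caught this way.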
Here is a snippet of the very long crash output:
| Trial name | status | loc | iter | total time (s) | ts | reward | num_recreated_wor... | episode_reward_max | episode_reward_min |
| PPO_MountainCarContinuous-v0_96ab6_00000 | RUNNING | | 11 | 48.9855 | 44000 | 11.7808 | 0 | 80.3617 | -94.3231 |
2022-09-29 21:10:53,948 ERROR -- Trial PPO_MountainCarContinuous-v0_96ab6_00000: Error processing event.
ray.exceptions.RayTaskError(ValueError): ray::PPO.train() (pid=174850, ip=, repr=PPO)
  File "/home/starkj/miniconda3/envs/ray_tutorial/lib/python3.9/site-packages/ray/rllib/algorithms/ppo/", line 87, in loss
    curr_action_dist = dist_class(logits, model)
  File "/home/starkj/miniconda3/envs/ray_tutorial/lib/python3.9/site-packages/ray/rllib/models/torch/", line 239, in __init__
    self.dist = torch.distributions.normal.Normal(mean, torch.exp(log_std))
  File "/home/starkj/miniconda3/envs/ray_tutorial/lib/python3.9/site-packages/torch/distributions/", line 54, in __init__
    super(Normal, self).__init__(batch_shape, validate_args=validate_args)
  File "/home/starkj/miniconda3/envs/ray_tutorial/lib/python3.9/site-packages/torch/distributions/", line 55, in __init__
    raise ValueError(
ValueError: Expected parameter loc (Tensor of shape (128, 1)) of distribution Normal(loc: torch.Size([128, 1]), scale: torch.Size([128, 1])) to satisfy the constraint Real(), but found invalid values:
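The `Real()` constraint in this message is violated when the `loc` tensor (the mean of the action distribution) contains NaN or infinite values. As a framework-free illustration of how such values can appear after a number of apparently healthy updates, the sketch below runs plain gradient descent on the toy objective f(x) = x^4. The objective, the step sizes, and the helpers `gradient_descent` and `first_non_finite` are assumptions made for illustration; this is not RLlib code, and it does not establish that the learning rate is the cause of the crash above.

```python
import math


def gradient_descent(x0, lr, steps):
    """Run plain gradient descent on f(x) = x**4 and return all iterates."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        # Use repeated multiplication: it overflows to inf, whereas
        # float ** int raises OverflowError in Python.
        grad = 4.0 * x * x * x
        x = x - lr * grad
        xs.append(x)
    return xs


def first_non_finite(values):
    """Return the index of the first NaN/inf value, or None if all are finite."""
    for index, value in enumerate(values):
        if not math.isfinite(value):
            return index
    return None


stable = gradient_descent(x0=2.0, lr=0.01, steps=20)    # small step size
diverged = gradient_descent(x0=2.0, lr=0.5, steps=20)   # step size too large

print("small step, first non-finite index:", first_non_finite(stable))
print("large step, first non-finite index:", first_non_finite(diverged))
print("large step, final value:", diverged[-1])
```

In the diverging run the first few iterates are still finite, which mirrors a training job that runs for several iterations before failing. A similar finiteness check on logged losses or gradients could help locate the first bad update.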
Any help or suggestions would be most appreciated.