ValueError: Expected parameter logits in Categorical

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Hi,

I’m trying to train a QMIX algorithm using a custom environment. When I start the training everything goes well, but at a certain point this error appears:

2023-08-25 13:12:19,048 ERROR tune_controller.py:911 -- Trial task failed for trial QMIX_EPEnv_a2374_00000
Traceback (most recent call last):
  File "C:\Users\grhen\AppData\Roaming\Python\Python39\site-packages\ray\air\execution\_internal\event_manager.py", line 110, in resolve_future
    result = ray.get(future)
  File "C:\Users\grhen\AppData\Roaming\Python\Python39\site-packages\ray\_private\auto_init_hook.py", line 24, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "C:\Users\grhen\AppData\Roaming\Python\Python39\site-packages\ray\_private\client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\grhen\AppData\Roaming\Python\Python39\site-packages\ray\_private\worker.py", line 2493, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::QMix.train() (pid=24624, ip=127.0.0.1, actor_id=7061c53d6556519d949f97a901000000, repr=QMix)
  File "python\ray\_raylet.pyx", line 1424, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 1364, in ray._raylet.execute_task.function_executor
  File "C:\Users\grhen\AppData\Roaming\Python\Python39\site-packages\ray\_private\function_manager.py", line 726, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
  File "C:\Users\grhen\AppData\Roaming\Python\Python39\site-packages\ray\util\tracing\tracing_helper.py", line 464, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\grhen\AppData\Roaming\Python\Python39\site-packages\ray\tune\trainable\trainable.py", line 375, in train 
    raise skipped from exception_cause(skipped)
  File "C:\Users\grhen\AppData\Roaming\Python\Python39\site-packages\ray\tune\trainable\trainable.py", line 372, in train 
    result = self.step()
  File "C:\Users\grhen\AppData\Roaming\Python\Python39\site-packages\ray\util\tracing\tracing_helper.py", line 464, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\grhen\AppData\Roaming\Python\Python39\site-packages\ray\rllib\algorithms\algorithm.py", line 851, in step
    results, train_iter_ctx = self._run_one_training_iteration()
  File "C:\Users\grhen\AppData\Roaming\Python\Python39\site-packages\ray\util\tracing\tracing_helper.py", line 464, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\grhen\AppData\Roaming\Python\Python39\site-packages\ray\rllib\algorithms\algorithm.py", line 2835, in _run_one_training_iteration
    results = self.training_step()
  File "C:\Users\grhen\AppData\Roaming\Python\Python39\site-packages\ray\util\tracing\tracing_helper.py", line 464, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\grhen\AppData\Roaming\Python\Python39\site-packages\ray\rllib\algorithms\qmix\qmix.py", line 275, in training_step
    new_sample_batches = synchronous_parallel_sample(
  File "C:\Users\grhen\AppData\Roaming\Python\Python39\site-packages\ray\rllib\execution\rollout_ops.py", line 82, in synchronous_parallel_sample
    sample_batches = [worker_set.local_worker().sample()]
  File "C:\Users\grhen\AppData\Roaming\Python\Python39\site-packages\ray\rllib\evaluation\rollout_worker.py", line 696, in sample
    batches = [self.input_reader.next()]
  File "C:\Users\grhen\AppData\Roaming\Python\Python39\site-packages\ray\rllib\evaluation\sampler.py", line 92, in next   
    batches = [self.get_data()]
  File "C:\Users\grhen\AppData\Roaming\Python\Python39\site-packages\ray\rllib\evaluation\sampler.py", line 277, in get_data
    item = next(self._env_runner)
  File "C:\Users\grhen\AppData\Roaming\Python\Python39\site-packages\ray\rllib\evaluation\env_runner_v2.py", line 344, in run
    outputs = self.step()
  File "C:\Users\grhen\AppData\Roaming\Python\Python39\site-packages\ray\rllib\evaluation\env_runner_v2.py", line 382, in step
    eval_results = self._do_policy_eval(to_eval=to_eval)
  File "C:\Users\grhen\AppData\Roaming\Python\Python39\site-packages\ray\rllib\evaluation\env_runner_v2.py", line 1081, in _do_policy_eval
    eval_results[policy_id] = policy.compute_actions_from_input_dict(
  File "C:\Users\grhen\AppData\Roaming\Python\Python39\site-packages\ray\rllib\algorithms\qmix\qmix_policy.py", line 319, in compute_actions_from_input_dict
    action_distribution=TorchCategorical(masked_q_values_folded),
  File "C:\Users\grhen\AppData\Roaming\Python\Python39\site-packages\ray\rllib\models\torch\torch_action_dist.py", line 91, in __init__
    self.dist = torch.distributions.categorical.Categorical(logits=self.inputs)
  File "C:\Users\grhen\anaconda3\envs\ep_rllib261\lib\site-packages\torch\distributions\categorical.py", line 66, in __init__
    super(Categorical, self).__init__(batch_shape, validate_args=validate_args)
  File "C:\Users\grhen\anaconda3\envs\ep_rllib261\lib\site-packages\torch\distributions\distribution.py", line 56, in __init__
    raise ValueError(
ValueError: Expected parameter logits (Tensor of shape (4, 2)) of distribution Categorical(logits: torch.Size([4, 2])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan],
        [nan, nan],
        [nan, nan],
        [nan, nan]])

This does not always occur at the same point, but I cannot finish the training because of this error. I tried both simple training and Tune training, and I get the same error with both.

Does someone have an idea of how to fix it? Thanks :slight_smile:


A warning that always appears during the run, close to the error, is the following:

WARNING syncer.py:586 -- Last sync command failed: Sync process failed: GetFileInfo() yielded path 'C:/Users/grhen/ray_results/QMIX/QMIX_EPEnv_13613_00000_0_2023-08-25_11-31-27', which is outside base dir 'C:\Users\grhen\ray_results\QMIX'

Hi again, with more details about the error. When I try with IMPALA everything goes well, but when I try with QMIX the error persists. The complete error log is below.

Failure # 1 (occurred at 2023-08-30_16-52-34)
ray::QMix.train() (pid=2920, ip=127.0.0.1, actor_id=b144b72d567387bd71c77da501000000, repr=QMix)
  File "python\ray\_raylet.pyx", line 1613, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 1553, in ray._raylet.execute_task.function_executor
  File "C:\Users\grhen\anaconda3\envs\ep_rllib261\lib\site-packages\ray\_private\function_manager.py", line 726, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
  File "C:\Users\grhen\anaconda3\envs\ep_rllib261\lib\site-packages\ray\util\tracing\tracing_helper.py", line 470, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\grhen\anaconda3\envs\ep_rllib261\lib\site-packages\ray\tune\trainable\trainable.py", line 400, in train
    raise skipped from exception_cause(skipped)
  File "C:\Users\grhen\anaconda3\envs\ep_rllib261\lib\site-packages\ray\tune\trainable\trainable.py", line 397, in train
    result = self.step()
  File "C:\Users\grhen\anaconda3\envs\ep_rllib261\lib\site-packages\ray\util\tracing\tracing_helper.py", line 470, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\grhen\anaconda3\envs\ep_rllib261\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 853, in step
    results, train_iter_ctx = self._run_one_training_iteration()
  File "C:\Users\grhen\anaconda3\envs\ep_rllib261\lib\site-packages\ray\util\tracing\tracing_helper.py", line 470, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\grhen\anaconda3\envs\ep_rllib261\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 2836, in _run_one_training_iteration
    results = self.training_step()
  File "C:\Users\grhen\anaconda3\envs\ep_rllib261\lib\site-packages\ray\util\tracing\tracing_helper.py", line 470, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\grhen\anaconda3\envs\ep_rllib261\lib\site-packages\ray\rllib\algorithms\qmix\qmix.py", line 275, in training_step
    new_sample_batches = synchronous_parallel_sample(
  File "C:\Users\grhen\anaconda3\envs\ep_rllib261\lib\site-packages\ray\rllib\execution\rollout_ops.py", line 82, in synchronous_parallel_sample
    sample_batches = [worker_set.local_worker().sample()]
  File "C:\Users\grhen\anaconda3\envs\ep_rllib261\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 696, in sample
    batches = [self.input_reader.next()]
  File "C:\Users\grhen\anaconda3\envs\ep_rllib261\lib\site-packages\ray\rllib\evaluation\sampler.py", line 92, in next
    batches = [self.get_data()]
  File "C:\Users\grhen\anaconda3\envs\ep_rllib261\lib\site-packages\ray\rllib\evaluation\sampler.py", line 277, in get_data
    item = next(self._env_runner)
  File "C:\Users\grhen\anaconda3\envs\ep_rllib261\lib\site-packages\ray\rllib\evaluation\env_runner_v2.py", line 344, in run
    outputs = self.step()
  File "C:\Users\grhen\anaconda3\envs\ep_rllib261\lib\site-packages\ray\rllib\evaluation\env_runner_v2.py", line 382, in step
    eval_results = self._do_policy_eval(to_eval=to_eval)
  File "C:\Users\grhen\anaconda3\envs\ep_rllib261\lib\site-packages\ray\rllib\evaluation\env_runner_v2.py", line 1082, in _do_policy_eval
    eval_results[policy_id] = policy.compute_actions_from_input_dict(
  File "C:\Users\grhen\anaconda3\envs\ep_rllib261\lib\site-packages\ray\rllib\algorithms\qmix\qmix_policy.py", line 319, in compute_actions_from_input_dict
    action_distribution=TorchCategorical(masked_q_values_folded),
  File "C:\Users\grhen\anaconda3\envs\ep_rllib261\lib\site-packages\ray\rllib\models\torch\torch_action_dist.py", line 73, in __init__
    self.dist = torch.distributions.categorical.Categorical(logits=self.inputs)
  File "C:\Users\grhen\anaconda3\envs\ep_rllib261\lib\site-packages\torch\distributions\categorical.py", line 66, in __init__
    super(Categorical, self).__init__(batch_shape, validate_args=validate_args)
  File "C:\Users\grhen\anaconda3\envs\ep_rllib261\lib\site-packages\torch\distributions\distribution.py", line 56, in __init__
    raise ValueError(
ValueError: Expected parameter logits (Tensor of shape (4, 2)) of distribution Categorical(logits: torch.Size([4, 2])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan],
        [nan, nan],
        [nan, nan],
        [nan, nan]])

Is it something in my code, or an error in the QMIX implementation?
Thanks again :slight_smile:

@hermmanhender I am getting the same error with PPO using a custom environment. At first I thought that my environment was producing NaNs in the observations or reward, so I added wrappers that override NaNs with real values (which turned out to be unnecessary). I still have not resolved the issue. Have you figured out a solution?

Failure # 1 (occurred at 2024-01-10_13-33-20)
ray::PPO.train() (pid=139178, ip=10.128.8.12, actor_id=00492a709e8aec376cdf9f9c01000000, repr=PPO)
  File "/home/dylanrpenn/.conda/envs/punch/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 389, in train
    raise skipped from exception_cause(skipped)
  File "/home/dylanrpenn/.conda/envs/punch/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 386, in train
    result = self.step()
  File "/home/dylanrpenn/.conda/envs/punch/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 803, in step
    results, train_iter_ctx = self._run_one_training_iteration()
  File "/home/dylanrpenn/.conda/envs/punch/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 2853, in _run_one_training_iteration
    results = self.training_step()
  File "/home/dylanrpenn/.conda/envs/punch/lib/python3.10/site-packages/ray/rllib/algorithms/ppo/ppo.py", line 403, in training_step
    train_batch = synchronous_parallel_sample(
  File "/home/dylanrpenn/.conda/envs/punch/lib/python3.10/site-packages/ray/rllib/execution/rollout_ops.py", line 85, in synchronous_parallel_sample
    sample_batches = worker_set.foreach_worker(
  File "/home/dylanrpenn/.conda/envs/punch/lib/python3.10/site-packages/ray/rllib/evaluation/worker_set.py", line 722, in foreach_worker
    handle_remote_call_result_errors(remote_results, self._ignore_worker_failures)
  File "/home/dylanrpenn/.conda/envs/punch/lib/python3.10/site-packages/ray/rllib/evaluation/worker_set.py", line 75, in handle_remote_call_result_errors
    raise r.get()
ray.exceptions.RayTaskError(ValueError): ray::RolloutWorker.apply() (pid=140005, ip=10.128.8.12, actor_id=095d49ad9a15390d379caf0701000000, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x2abfb4b53880>)
  File "/home/dylanrpenn/.conda/envs/punch/lib/python3.10/site-packages/ray/rllib/utils/actor_manager.py", line 185, in apply
    raise e
  File "/home/dylanrpenn/.conda/envs/punch/lib/python3.10/site-packages/ray/rllib/utils/actor_manager.py", line 176, in apply
    return func(self, *args, **kwargs)
  File "/home/dylanrpenn/.conda/envs/punch/lib/python3.10/site-packages/ray/rllib/execution/rollout_ops.py", line 86, in <lambda>
    lambda w: w.sample(), local_worker=False, healthy_only=True
  File "/home/dylanrpenn/.conda/envs/punch/lib/python3.10/site-packages/ray/rllib/evaluation/rollout_worker.py", line 915, in sample
    batches = [self.input_reader.next()]
  File "/home/dylanrpenn/.conda/envs/punch/lib/python3.10/site-packages/ray/rllib/evaluation/sampler.py", line 92, in next
    batches = [self.get_data()]
  File "/home/dylanrpenn/.conda/envs/punch/lib/python3.10/site-packages/ray/rllib/evaluation/sampler.py", line 277, in get_data
    item = next(self._env_runner)
  File "/home/dylanrpenn/.conda/envs/punch/lib/python3.10/site-packages/ray/rllib/evaluation/env_runner_v2.py", line 323, in run
    outputs = self.step()
  File "/home/dylanrpenn/.conda/envs/punch/lib/python3.10/site-packages/ray/rllib/evaluation/env_runner_v2.py", line 361, in step
    eval_results = self._do_policy_eval(to_eval=to_eval)
  File "/home/dylanrpenn/.conda/envs/punch/lib/python3.10/site-packages/ray/rllib/evaluation/env_runner_v2.py", line 1053, in _do_policy_eval
    eval_results[policy_id] = policy.compute_actions_from_input_dict(
  File "/home/dylanrpenn/.conda/envs/punch/lib/python3.10/site-packages/ray/rllib/policy/torch_policy_v2.py", line 526, in compute_actions_from_input_dict
    return self._compute_action_helper(
  File "/home/dylanrpenn/.conda/envs/punch/lib/python3.10/site-packages/ray/rllib/utils/threading.py", line 24, in wrapper
    return func(self, *a, **k)
  File "/home/dylanrpenn/.conda/envs/punch/lib/python3.10/site-packages/ray/rllib/policy/torch_policy_v2.py", line 1171, in _compute_action_helper
    action_dist = dist_class(dist_inputs, self.model)
  File "/home/dylanrpenn/.conda/envs/punch/lib/python3.10/site-packages/ray/rllib/models/torch/torch_action_dist.py", line 114, in __init__
    self.cats = [
  File "/home/dylanrpenn/.conda/envs/punch/lib/python3.10/site-packages/ray/rllib/models/torch/torch_action_dist.py", line 115, in <listcomp>
    torch.distributions.categorical.Categorical(logits=input_)
  File "/home/dylanrpenn/.conda/envs/punch/lib/python3.10/site-packages/torch/distributions/categorical.py", line 66, in __init__
    super(Categorical, self).__init__(batch_shape, validate_args=validate_args)
  File "/home/dylanrpenn/.conda/envs/punch/lib/python3.10/site-packages/torch/distributions/distribution.py", line 56, in __init__
    raise ValueError(
ValueError: Expected parameter logits (Tensor of shape (1, 3)) of distribution Categorical(logits: torch.Size([1, 3])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values:
tensor([[nan, nan, nan]])
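For reference, the NaN-overriding wrapper I mentioned above was along these lines (a minimal sketch assuming a Gymnasium-style Box observation; the env id in the usage comment is just a placeholder, and as I said, it turned out to be unnecessary in my case):

import numpy as np
import gymnasium as gym


class NanToNumObservation(gym.ObservationWrapper):
    """Replace NaN/inf entries in the observation with finite values."""

    def observation(self, observation):
        # NaN -> 0.0, +/-inf -> large finite values
        return np.nan_to_num(observation, nan=0.0, posinf=1e6, neginf=-1e6)


# Hypothetical usage; "MyCustomEnv-v0" is only a placeholder id.
# env = NanToNumObservation(gym.make("MyCustomEnv-v0"))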

Hi @dylan906! I spent several hours debugging my code and found that a library I was using sometimes gave me NaN values. When I changed this library the error disappeared.

Maybe you need to debug deeply into every element of your observation space. Also, I don’t think the problem is in the reward function.
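For example, a rough helper like this can walk a dict observation and report which entries contain NaN or inf values (illustrative only, adapt it to your observation structure):

import numpy as np


def find_non_finite(obs, prefix=""):
    """Return the keys/paths of observation entries that contain NaN or inf."""
    bad = []
    if isinstance(obs, dict):
        for key, value in obs.items():
            bad += find_non_finite(value, prefix=f"{prefix}{key}.")
    else:
        arr = np.asarray(obs, dtype=np.float64)
        if not np.all(np.isfinite(arr)):
            bad.append(prefix.rstrip(".") or "<root>")
    return bad


# Example: obs, info = env.reset(); print(find_non_finite(obs))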

I’m not using QMIX anymore because its implementation had a lot of problems and was only experimental in RLlib (I think it has now been moved out to rllib_contrib), so it didn’t receive maintenance. Is it working well for you now? I’m interested in implementing it in my project :slight_smile:

@hermmanhender thanks for the feedback. I’ve been digging more into my observation space and agree with you that the error is somewhere in there (as opposed to within PPO).

One subtlety I have discovered is that even if an environment does not produce NaN values during regular operation, the training algorithm does a bunch of initialization operations that require pinging the environment for observations. In some cases (I’m not entirely sure how it happens) the environment runs through different logic at initialization than during training to generate observations. That is when (during training initialization) my observations have NaN values in them. So I’m looking deeply into my environment’s initialization code. I may still end up with problems during regular training, but I haven’t gotten far enough to know yet.
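One way to catch this early (a sketch assuming a Gymnasium-style env with a numeric observation, not my actual code) is a wrapper that raises as soon as reset() or step() returns a non-finite observation, so the failure points at the environment call instead of at the Categorical constructor:

import numpy as np
import gymnasium as gym


def _assert_finite(obs, source):
    arr = np.asarray(obs, dtype=np.float64)
    if not np.all(np.isfinite(arr)):
        raise ValueError(f"Non-finite observation returned by {source}: {obs}")


class AssertFiniteObservation(gym.Wrapper):
    """Fail loudly the moment the env returns a NaN/inf observation."""

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        _assert_finite(obs, "reset()")
        return obs, info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        _assert_finite(obs, "step()")
        return obs, reward, terminated, truncated, info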

I have not used QMix, so unfortunately I can’t offer you any advice on that subject.

You’re welcome. I used threading to synchronize the execution of my custom environment with the standard environment model based on the reset() and step() methods, like those in Gymnasium.

One thing you can try is to add a wait on a threading event before ending the reset() method, so that it waits for an observation to be available from the environment. That worked for me.
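Roughly, the pattern I mean looks like this (a minimal sketch with placeholder names; in my real environment the background thread drives an external simulation):

import threading
import queue
import numpy as np


class SynchronizedEnv:
    """Skeleton of an env whose observations are produced by another thread."""

    def __init__(self):
        self.obs_ready = threading.Event()
        self.obs_queue = queue.Queue(maxsize=1)

    def _run_simulation(self):
        # Placeholder for the external simulation: it computes an observation,
        # puts it on the queue, and signals that it is available.
        obs = np.zeros(4, dtype=np.float32)
        self.obs_queue.put(obs)
        self.obs_ready.set()

    def reset(self, *, seed=None, options=None):
        self.obs_ready.clear()
        threading.Thread(target=self._run_simulation, daemon=True).start()
        # Block until the first observation actually exists, so the policy
        # never receives uninitialized or NaN values from reset().
        if not self.obs_ready.wait(timeout=60.0):
            raise TimeoutError("Simulation did not produce an initial observation.")
        return self.obs_queue.get(), {}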

I don’t know what your case is specifically, but maybe that can help you solve it.