NaN or Inf found in input tensor

Raphael_Meier · October 4, 2023, 6:46pm

Dear all,

I am fairly new to Ray but have come pretty far with my project: a hierarchical custom environment for portfolio management, consisting of a Meta-Agent/controller and 14 individual agents, each trading one asset, all wrapped in a MultiAgentEnv.

However, during the Tune process I get a lot of errors saying “NaN or Inf found in input tensor.” I have added logging/print statements throughout the code to try to pinpoint the issue.
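A more systematic alternative to scattered print statements is to check every value the environment emits before RLlib sees it, so the first NaN/Inf is caught at its source rather than surfacing later as an infinite loss. The following is a minimal sketch, assuming a gymnasium-style multi-agent `step()` that returns `(obs, rewards, terminateds, truncateds, infos)` dicts keyed by agent id. `find_non_finite` and `FiniteCheckWrapper` are hypothetical names, not part of Ray, and the wrapper is duck-typed rather than a real `MultiAgentEnv` subclass; to use it with RLlib you would call the same checks inside your own environment's `reset()`/`step()`.

```python
import math
from typing import Any, List


def find_non_finite(value: Any, path: str = "") -> List[str]:
    """Return the paths of all NaN/Inf numbers in a nested dict/list structure."""
    bad: List[str] = []
    if isinstance(value, dict):
        for key, item in value.items():
            bad.extend(find_non_finite(item, f"{path}[{key!r}]"))
    elif isinstance(value, (list, tuple)):
        for i, item in enumerate(value):
            bad.extend(find_non_finite(item, f"{path}[{i}]"))
    elif hasattr(value, "tolist") and not isinstance(value, (int, float)):
        # NumPy arrays and scalars convert to plain Python lists/floats.
        bad.extend(find_non_finite(value.tolist(), path))
    elif isinstance(value, float) and not math.isfinite(value):
        bad.append(path or "<root>")
    return bad


class FiniteCheckWrapper:
    """Hypothetical duck-typed wrapper that raises on the first NaN/Inf."""

    def __init__(self, env: Any) -> None:
        self.env = env

    def _check(self, where: str, **named: Any) -> None:
        for name, value in named.items():
            bad = find_non_finite(value)
            if bad:
                raise ValueError(f"{where}: non-finite {name} at {bad}")

    def reset(self, *args: Any, **kwargs: Any):
        obs, infos = self.env.reset(*args, **kwargs)
        self._check("reset", obs=obs)
        return obs, infos

    def step(self, action_dict: Any):
        # Check incoming actions too: a NaN action points at the policy side.
        self._check("step", actions=action_dict)
        obs, rewards, terminateds, truncateds, infos = self.env.step(action_dict)
        self._check("step", obs=obs, rewards=rewards)
        return obs, rewards, terminateds, truncateds, infos
```

If this check never fires but the loss still becomes infinite, the problem more likely originates inside the policy (for example in its action distribution) than in the data the environment produces.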

Below is a snippet of the console output from running the Tune process in DEBUG mode. It appears that “policy_loss” in particular frequently becomes inf or -inf.

There are no obvious places in the code where any number could become that large.
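Note that an infinite loss does not require any large number in the environment. Assuming the reported `policy_loss` is the usual A2C policy-gradient term (a sum over the sampled batch of log-probability times advantage, which is roughly how RLlib's A2C/A3C loss computes it, with the entropy bonus reported separately), a single sampled action whose probability underflows to zero is enough:

```latex
L_{\pi} = -\sum_{t} \log \pi_\theta(a_t \mid s_t)\,\hat{A}_t ,
\qquad
\pi_\theta(a_k \mid s_k) \to 0
\;\Rightarrow\;
-\log \pi_\theta(a_k \mid s_k)\,\hat{A}_k \to
\begin{cases}
+\infty, & \hat{A}_k > 0 \\
-\infty, & \hat{A}_k < 0
\end{cases}
```

If the batch contains both a $+\infty$ and a $-\infty$ term, or an infinite log-probability multiplied by a zero advantage, the floating-point result is NaN instead. The sign of the infinity follows the sign of that sample's advantage, which would be consistent with `inf` for one policy and `-inf` for the other in the log, but this is a hypothesis to verify rather than a diagnosis.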

Do you have any idea how I could approach this to pinpoint the issue?

Thanks a lot for your help, highly appreciated!

Console output snippet (the learner stats were printed by the A2C worker, pid=37596):

'UNH.US': {
    'custom_metrics': {},
    'diff_num_grad_updates_vs_sampler_policy': 5003,
    'grad_gnorm': 0.0,
    'learner_stats': {
        'cur_lr': 9.725530253490433e-05,
        'entropy_coeff': 0.0545186772942543,
        'policy_entropy': 2876.6592,
        'policy_loss': inf,
        'var_gnorm': 24.995335,
        'vf_loss': 7.32482e-14},
    'num_agent_steps_trained': 32,
    'num_grad_updates_lifetime': 5004,
    'vf_explained_var': 0.85291904},
'controller_policy': {
    'custom_metrics': {},
    'diff_num_grad_updates_vs_sampler_policy': 5003,
    'grad_gnorm': 0.0,
    'learner_stats': {
        'cur_lr': 9.725530253490433e-05,
        'entropy_coeff': 0.0545186772942543,
        'policy_entropy': 37431.78,
        'policy_loss': -inf,
        'var_gnorm': 32.27999,
        'vf_loss': 0.13675079},
    'num_agent_steps_trained': 32,
    'num_grad_updates_lifetime': 5004,
    'vf_explained_var': 3.08156e-05}}

(A2C pid=37596) 2023-10-04 10:09:25,675 WARNING deprecation.py:50 -- DeprecationWarning: ray.rllib.execution.train_ops.train_one_step has been deprecated. This will raise an error in the future!
(RolloutWorker pid=37616) 2023-10-04 10:09:26,136 DEBUG json_writer.py:81 -- Wrote 108373 bytes to <_io.TextIOWrapper
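To see which policy goes bad first, and in which iteration, it can help to scan the per-policy stats programmatically instead of reading the console. This is a minimal sketch, assuming a dict shaped like the output above (policy id mapped to a dict containing `learner_stats`). `non_finite_learner_stats` is a hypothetical helper, and where this dict sits in the training result (in the old RLlib API stack typically under `result["info"]["learner"]`) depends on the Ray version.

```python
import math
import numbers
from typing import Any, Dict, List


def non_finite_learner_stats(learner_info: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each policy id to the learner_stats keys whose values are NaN/Inf."""
    report: Dict[str, List[str]] = {}
    for policy_id, info in learner_info.items():
        stats = info.get("learner_stats", {}) if isinstance(info, dict) else {}
        bad = [
            key
            for key, value in stats.items()
            # numbers.Real also covers NumPy float scalars; other types are skipped.
            if isinstance(value, numbers.Real) and not math.isfinite(value)
        ]
        if bad:
            report[policy_id] = bad
    return report


# Values copied from the snippet above (abbreviated).
learner_info = {
    "UNH.US": {"learner_stats": {"policy_loss": float("inf"), "vf_loss": 7.32482e-14}},
    "controller_policy": {"learner_stats": {"policy_loss": float("-inf"), "vf_loss": 0.13675079}},
}
print(non_finite_learner_stats(learner_info))
# {'UNH.US': ['policy_loss'], 'controller_policy': ['policy_loss']}
```

Logging this report once per iteration shows whether the controller or one of the asset agents is the first to diverge.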

PhilippWillms · June 15, 2024, 7:57pm

Any update on this topic?

PhilippWillms · July 15, 2024, 9:19pm

@Raphael_Meier: I had a PPO run today where “NaN or Inf found in input tensor” was reported at basically every iteration. What helped me was using complete_episodes (batch_mode="complete_episodes"), as my environment has a fixed episode length; truncated episodes do not contribute to the learning experience. Could this be an option in your case?
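For reference, complete-episode sampling is controlled by the `batch_mode` setting of the algorithm config. A minimal sketch for PPO, assuming a recent Ray version where the setting lives on `.env_runners()`; older versions (such as those current in 2023) accept the same `batch_mode` argument on `.rollouts()` instead. The environment name is a hypothetical placeholder.

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    # Hypothetical name of a registered custom environment.
    .environment("my_portfolio_env")
    # Collect only whole episodes, never truncated fragments.
    # On older Ray versions use .rollouts(batch_mode="complete_episodes").
    .env_runners(batch_mode="complete_episodes")
)
```

With `"complete_episodes"`, samples are only returned once episodes end, so train batches can become large if episodes are long.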

ZanhaPeng · April 20, 2025, 8:26am

I have encountered the same problem, but switching to ‘complete_episodes’ did not solve it for me. What else can I try to solve this problem?
