Dear all,
I am fairly new to Ray but have come pretty far with my project, a hierarchical custom env for portfolio management. It consists of a meta-agent/controller and 14 individual agents, each trading one asset, all wrapped in a MultiAgentEnv.
However, during the Tune process I get a lot of errors saying “NaN or Inf found in input tensor.” I have added logging/print statements throughout the code to try to pinpoint the issue.
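A minimal sketch of the kind of check such logging can perform, assuming rewards and observations arrive as per-agent dicts of numbers, lists, or NumPy arrays of one or more dimensions. The helper name `find_non_finite` and the sample values are hypothetical and not part of Ray or of my actual code:

```python
import math
from collections.abc import Iterable, Mapping


def find_non_finite(value, path=""):
    """Return a list of (path, number) pairs for every NaN/Inf found.

    Walks dicts, lists, tuples and other iterables recursively, so it can
    be applied to a per-agent reward dict or a nested observation.
    """
    bad = []
    if isinstance(value, Mapping):
        for key, item in value.items():
            bad.extend(find_non_finite(item, f"{path}[{key!r}]"))
    elif isinstance(value, (str, bytes)):
        pass  # text is not numeric data; skip it
    elif isinstance(value, Iterable):
        for index, item in enumerate(value):
            bad.extend(find_non_finite(item, f"{path}[{index}]"))
    elif hasattr(value, "__float__"):
        number = float(value)
        if not math.isfinite(number):
            bad.append((path, number))
    return bad


# Made-up sample data, only to show the shape of the result.
rewards = {"UNH.US": 0.01, "controller": float("inf")}
obs = {"UNH.US": [0.5, float("nan"), 1.2]}

print(find_non_finite(rewards, "reward"))
print(find_non_finite(obs, "obs"))
```

Calling something like this on the dicts the env returns from `step()`, and logging whenever the result is non-empty, would at least show whether a non-finite value enters through the environment or only appears later in the loss computation.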
Below is a snippet from the console output when I run the Tune process in DEBUG mode. It appears that “policy_loss” in particular frequently comes out as inf or -inf.
There are no obvious places in the code where any number could become that large.
Do you have any idea how I could approach this to pinpoint the issue?
Thanks a lot for your help, highly appreciated!
console print snippet:
'UNH.US': { 'custom_metrics': {},
(A2C pid=37596) 'diff_num_grad_updates_vs_sampler_policy': 5003,
(A2C pid=37596) 'grad_gnorm': 0.0,
(A2C pid=37596) 'learner_stats': { 'cur_lr': 9.725530253490433e-05,
(A2C pid=37596) 'entropy_coeff': 0.0545186772942543,
(A2C pid=37596) 'policy_entropy': 2876.6592,
(A2C pid=37596) 'policy_loss': inf,
(A2C pid=37596) 'var_gnorm': 24.995335,
(A2C pid=37596) 'vf_loss': 7.32482e-14},
(A2C pid=37596) 'num_agent_steps_trained': 32,
(A2C pid=37596) 'num_grad_updates_lifetime': 5004,
(A2C pid=37596) 'vf_explained_var': 0.85291904},
(A2C pid=37596) 'controller_policy': { 'custom_metrics': {},
(A2C pid=37596) 'diff_num_grad_updates_vs_sampler_policy': 5003,
(A2C pid=37596) 'grad_gnorm': 0.0,
(A2C pid=37596) 'learner_stats': { 'cur_lr': 9.725530253490433e-05,
(A2C pid=37596) 'entropy_coeff': 0.0545186772942543,
(A2C pid=37596) 'policy_entropy': 37431.78,
(A2C pid=37596) 'policy_loss': -inf,
(A2C pid=37596) 'var_gnorm': 32.27999,
(A2C pid=37596) 'vf_loss': 0.13675079},
(A2C pid=37596) 'num_agent_steps_trained': 32,
(A2C pid=37596) 'num_grad_updates_lifetime': 5004,
(A2C pid=37596) 'vf_explained_var': 3.08156e-05}}
(A2C pid=37596)
(A2C pid=37596) 2023-10-04 10:09:25,675 WARNING deprecation.py:50 -- DeprecationWarning: ray.rllib.execution.train_ops.train_one_step has been deprecated. This will raise an error in the future!
(RolloutWorker pid=37616) 2023-10-04 10:09:26,136 DEBUG json_writer.py:81 -- Wrote 108373 bytes to <_io.TextIOWrapper