Warnings during MARWIL training

How severe does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

I’m trying to run MARWIL training (offline RL) on a dataset I collected using an expert agent and the JsonWriter class, in a similar way to this example: ray/saving_experiences.py at master · ray-project/ray · GitHub
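The collection step looks roughly like this (a condensed sketch modeled on that example, not my exact code; the env, the output path, and the constant action are placeholders for my real environment and expert):

```python
import gym

from ray.rllib.evaluation.sample_batch_builder import SampleBatchBuilder
from ray.rllib.offline.json_writer import JsonWriter

env = gym.make("CartPole-v0")           # placeholder env, not my real one
writer = JsonWriter("/tmp/expert-out")  # hypothetical output directory
batch_builder = SampleBatchBuilder()

for eps_id in range(10):
    obs = env.reset()
    done, t = False, 0
    while not done:
        action = 0  # stand-in for the deterministic expert's action
        new_obs, reward, done, info = env.step(action)
        batch_builder.add_values(
            t=t,
            eps_id=eps_id,
            agent_index=0,
            obs=obs,
            actions=action,
            action_prob=1.0,  # deterministic expert -> probability 1
            action_logp=0.0,  # log(1.0)
            rewards=reward,
            dones=done,
            infos=info,
            new_obs=new_obs,
        )
        obs, t = new_obs, t + 1
    writer.write(batch_builder.build_and_reset())  # write one episode batch
```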

Since I’m using a deterministic expert, the logged parameters should be action_prob=1.0 (action_logp=0.0). I then train with the MARWIL algorithm, using beta=1.0 and input_evaluation=['is', 'wis'].
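The training setup is roughly the following (again a sketch; the env name and input path are placeholders for my actual config):

```python
import ray
from ray.rllib.agents.marwil import MARWILTrainer

ray.init()

config = {
    "env": "CartPole-v0",               # placeholder; supplies obs/action spaces
    "input": "/tmp/expert-out",         # directory written by JsonWriter above
    "beta": 1.0,                        # advantage weighting (0.0 = plain behavior cloning)
    "input_evaluation": ["is", "wis"],  # off-policy estimators that emit the warnings below
}

trainer = MARWILTrainer(config=config)
for _ in range(5):
    print(trainer.train())
```

While this trains, I get the following warnings: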

(MARWILTrainer pid=168137) /home/<user>/.local/lib/python3.8/site-packages/ray/rllib/offline/is_estimator.py:25: RuntimeWarning: overflow encountered in double_scalars
(MARWILTrainer pid=168137) p.append(pt_prev * new_prob[t] / old_prob[t])
(MARWILTrainer pid=168137) /home/<user>/.local/lib/python3.8/site-packages/ray/rllib/offline/is_estimator.py:31: RuntimeWarning: invalid value encountered in double_scalars
(MARWILTrainer pid=168137) V_step_IS += p[t] * rewards[t] * self.gamma ** t
(MARWILTrainer pid=168137) /home/<user>/.local/lib/python3.8/site-packages/ray/rllib/offline/wis_estimator.py:31: RuntimeWarning: overflow encountered in double_scalars
(MARWILTrainer pid=168137) p.append(pt_prev * new_prob[t] / old_prob[t])
(MARWILTrainer pid=168137) /home/<user>/.local/lib/python3.8/site-packages/ray/rllib/offline/wis_estimator.py:45: RuntimeWarning: invalid value encountered in double_scalars
(MARWILTrainer pid=168137) V_step_WIS += p[t] / w_t * rewards[t] * self.gamma ** t

From what I understand, MARWIL uses this probability p[t] along with the agent's rewards to weight the data and improve over Behaviour Cloning, and the 'is'/'wis' input evaluation reports value estimates based on the same ratios.
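For concreteness, here is a minimal numeric sketch (with made-up probabilities and rewards) of what the estimator lines in the warnings compute; the point is that p[t] is a cumulative product of per-step ratios, which can grow or shrink exponentially over long episodes:

```python
import numpy as np

# Made-up per-step probabilities of the logged actions for one episode.
# old_prob is what the dataset recorded (action_prob); new_prob is what
# the policy currently being trained assigns to the same actions.
old_prob = np.array([1.0, 1.0, 1.0])  # deterministic expert: action_prob = 1.0
new_prob = np.array([0.3, 0.6, 0.9])  # current policy's probabilities
rewards = np.array([1.0, 1.0, 1.0])
gamma = 0.99

# Cumulative importance ratio p[t] = prod_{k<=t} new_prob[k] / old_prob[k].
# Over hundreds of steps this product can overflow, or collapse to 0/NaN,
# which seems to match the RuntimeWarnings above.
p, pt_prev = [], 1.0
for t in range(len(rewards)):
    pt_prev = pt_prev * new_prob[t] / old_prob[t]
    p.append(pt_prev)

# Per-episode importance-sampling estimate of the learned policy's value.
V_step_IS = sum(p[t] * rewards[t] * gamma**t for t in range(len(rewards)))
print(p, V_step_IS)
```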

Do these warnings mean that the agent can't learn effectively under my config/conditions/dataset? (Maybe it reverts back to Behaviour Cloning?)
Are the warnings due to action_prob being equal to 1, or to other aspects of my config/training parameters?
How can I fix them?

In case it's relevant: there shouldn't be any problems with my rewards.


Hi there! 👋

Welcome to the RLlib community! Would you like to ask your question in RLlib Office Hours? ✍️ Just add the Discuss link to your question to this doc: RLlib Office Hours - Google Docs

Thanks! Hope to see you there!