Training mean reward vs. evaluation mean reward

mannyv · November 15, 2022, 9:58pm

If you train with a stochastic policy, then you would expect your best performance when you also run inference and evaluate with a stochastic policy. You should keep explore=True.
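To illustrate the difference, here is a minimal sketch in plain Python (not RLlib code). The `select_action` helper and the probabilities are made up for illustration; the assumption is that explore=True samples from the policy's action distribution while explore=False takes the most likely action.

```python
import random


def select_action(probs, explore, rng):
    """Pick an action index from a categorical distribution.

    explore=True  -> sample from the distribution (stochastic policy)
    explore=False -> take the most likely action (deterministic policy)
    """
    if explore:
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical action probabilities produced by a trained policy.
probs = [0.55, 0.45]
rng = random.Random(0)

stochastic = [select_action(probs, True, rng) for _ in range(10_000)]
greedy = [select_action(probs, False, rng) for _ in range(10_000)]

# Fraction of steps on which action 1 was chosen under each mode.
frac_stochastic = stochastic.count(1) / len(stochastic)
frac_greedy = greedy.count(1) / len(greedy)
print(f"action 1 frequency, explore=True:  {frac_stochastic:.2f}")
print(f"action 1 frequency, explore=False: {frac_greedy:.2f}")
```

With greedy selection, action 1 is never taken even though the policy picked it almost half the time during training, so the evaluation rollouts can visit quite different states than the training rollouts did.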

I am not sure if you have any preprocessors, but I think I remember @arturn saying that preprocessors are applied with compute_single_action but not with compute_actions.
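If that recollection is right, a mismatch like this could explain a reward gap. The sketch below is plain Python, not RLlib: `normalize`, `policy`, and the statistics are hypothetical, and it only shows what can happen when a policy trained on normalized observations is fed raw ones at evaluation time.

```python
def normalize(obs, mean, std):
    """Standardize each observation component (a mean/std style filter)."""
    return [(o - m) / s for o, m, s in zip(obs, mean, std)]


def policy(obs):
    """Hypothetical linear policy that was trained on normalized inputs."""
    score = 1.0 * obs[0] - 1.0 * obs[1]
    return 1 if score > 0 else 0


# Made-up running statistics collected during training.
mean = [100.0, 0.0]
std = [10.0, 1.0]

raw_obs = [95.0, 0.2]

# Same observation, with and without the preprocessing step.
action_with = policy(normalize(raw_obs, mean, std))
action_without = policy(raw_obs)
print("with preprocessing:   ", action_with)
print("without preprocessing:", action_without)
```

The two calls disagree on the same observation, so it is worth checking that your evaluation loop applies the same preprocessing as training did.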
