Exploration in PPO and policy gradient algorithms

saeid93 · November 18, 2021, 4:12pm

How can I tune the exploration-exploitation behavior of PPO and other policy gradient algorithms? Which parameters should I use to fine-tune it?

arturn · November 21, 2021, 9:41pm

Give this a read!
The parameters you are looking for are entropy_coeff and entropy_coeff_schedule.
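As a rough illustration of how these two settings might fit together: entropy_coeff weights the entropy bonus in the loss (a larger value pushes the policy to stay more random, so it explores more), and entropy_coeff_schedule changes that weight over training. The sketch below assumes the schedule is a list of [timestep, value] pairs that are linearly interpolated; please check that format against the docs for your RLlib version. The helper entropy_coeff_at is hypothetical, written only for this example, and is not RLlib's own code. Clamping before the first and after the last breakpoint is a simplification made here.

```python
from bisect import bisect_right

# Hypothetical config fragment. The two keys are the ones named above;
# the [[timestep, value], ...] schedule format is an assumption.
config = {
    "entropy_coeff": 0.01,
    "entropy_coeff_schedule": [[0, 0.01], [1_000_000, 0.0]],
}


def entropy_coeff_at(schedule, timestep):
    """Linearly interpolate a [[timestep, value], ...] schedule.

    Illustrative helper only (not part of RLlib). Values are clamped
    to the first/last entry outside the schedule's range.
    """
    steps = [t for t, _ in schedule]
    if timestep <= steps[0]:
        return schedule[0][1]
    if timestep >= steps[-1]:
        return schedule[-1][1]
    # Index of the first breakpoint strictly greater than `timestep`.
    i = bisect_right(steps, timestep)
    (t0, v0), (t1, v1) = schedule[i - 1], schedule[i]
    frac = (timestep - t0) / (t1 - t0)
    return v0 + frac * (v1 - v0)


if __name__ == "__main__":
    for t in (0, 250_000, 500_000, 1_000_000):
        print(t, entropy_coeff_at(config["entropy_coeff_schedule"], t))
```

With a schedule like this, the entropy weight starts high (more exploration) and decays to zero (more exploitation) by the last breakpoint. A constant entropy_coeff without a schedule keeps the same exploration pressure throughout training.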

Cheers
