Exploration in PPO and policy gradient algorithms

How can I tune the exploration-exploitation behavior of PPO and other policy gradient algorithms? Which parameters should I use to fine-tune it?

Hi @saeid93 ,

Give this a read!
The parameters you are looking for are entropy_coeff and entropy_coeff_schedule.
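To make the idea concrete, here is a rough sketch of what those two settings do conceptually. It is plain Python, not the library's actual code: the helper names (entropy_coeff_at, categorical_entropy, loss_with_entropy_bonus) are made up for illustration, and I am assuming the schedule is a list of (timestep, coefficient) points that get linearly interpolated, so please check the docs for the exact format. A larger coefficient rewards high-entropy (more random) policies and so pushes toward exploration; decaying it over time shifts toward exploitation.

```python
import math


def entropy_coeff_at(schedule, timestep):
    """Linearly interpolate the entropy coefficient at a given timestep.

    `schedule` is assumed to be a list of (timestep, coeff) pairs with
    strictly increasing timesteps. Values are clamped outside the range.
    """
    if timestep <= schedule[0][0]:
        return schedule[0][1]
    for (t0, c0), (t1, c1) in zip(schedule, schedule[1:]):
        if t0 <= timestep <= t1:
            frac = (timestep - t0) / (t1 - t0)
            return c0 + frac * (c1 - c0)
    return schedule[-1][1]


def categorical_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def loss_with_entropy_bonus(policy_loss, probs, entropy_coeff):
    """Subtract the weighted entropy so that minimizing the loss
    also favors higher-entropy (more exploratory) policies."""
    return policy_loss - entropy_coeff * categorical_entropy(probs)


# Hypothetical schedule: start at 0.01 and decay to 0 over 1M timesteps.
schedule = [(0, 0.01), (1_000_000, 0.0)]

for t in (0, 500_000, 2_000_000):
    coeff = entropy_coeff_at(schedule, t)
    loss = loss_with_entropy_bonus(1.0, [0.5, 0.5], coeff)
    print(t, coeff, loss)
```

With a constant entropy_coeff you get a fixed exploration pressure; with a schedule like the one above, the bonus fades out as training progresses.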

Cheers
