saeid93
How can I tune the exploration-exploitation behavior of PPO and other policy-gradient algorithms? Which parameters should I use to fine-tune it?
arturn
Hi @saeid93,
Give this a read!
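The parameters you are looking for are entropy_coeff, which weights the entropy bonus in the loss (a larger value pushes the policy toward more random, exploratory actions), and entropy_coeff_schedule, which changes that weight over the course of training.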
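To make the effect of these two parameters concrete, here is a minimal, framework-free sketch (plain Python, not RLlib's actual code). It assumes a discrete action space, a PPO-style combined loss where the entropy term is subtracted (so higher entropy lowers the loss and encourages exploration), and a schedule given as a list of [timestep, value] pairs that is linearly interpolated. The helper names and the vf_coeff weight are hypothetical. In RLlib you would pass the values through the PPO config, e.g. config.training(entropy_coeff=0.01, entropy_coeff_schedule=[[0, 0.01], [1_000_000, 0.0]]); check the exact schedule format for your installed version, since newer releases may fold the schedule into entropy_coeff itself.

```python
import math


def categorical_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution.

    A uniform distribution has maximal entropy (most exploratory);
    a peaked one has low entropy (mostly exploiting).
    """
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def piecewise_linear(schedule, t):
    """Linearly interpolate a list of [timestep, value] pairs at timestep t.

    Assumption: outside the listed range this sketch holds the nearest
    endpoint value (e.g. the final value after the schedule ends).
    """
    if t <= schedule[0][0]:
        return schedule[0][1]
    for (t0, v0), (t1, v1) in zip(schedule, schedule[1:]):
        if t0 <= t < t1:
            alpha = (t - t0) / (t1 - t0)
            return v0 + alpha * (v1 - v0)
    return schedule[-1][1]


def total_loss(policy_loss, value_loss, entropy, entropy_coeff, vf_coeff=1.0):
    """PPO-style combined loss (vf_coeff is an illustrative weight).

    Subtracting entropy_coeff * entropy rewards higher-entropy policies,
    so a larger entropy_coeff means more exploration.
    """
    return policy_loss + vf_coeff * value_loss - entropy_coeff * entropy


if __name__ == "__main__":
    # Hypothetical schedule: decay the entropy bonus from 0.01 to 0 over 1M steps.
    schedule = [[0, 0.01], [1_000_000, 0.0]]
    uniform = [0.25, 0.25, 0.25, 0.25]
    peaked = [0.97, 0.01, 0.01, 0.01]
    for step in (0, 250_000, 500_000, 1_000_000, 2_000_000):
        coeff = piecewise_linear(schedule, step)
        print(f"step={step:>9}  entropy_coeff={coeff:.4f}  "
              f"bonus(uniform)={coeff * categorical_entropy(uniform):.5f}  "
              f"bonus(peaked)={coeff * categorical_entropy(peaked):.5f}")
```

Decaying the coefficient toward zero like this is a common pattern: the agent explores heavily early on, and the bonus fades so the policy can settle on exploiting what it has learned.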
Cheers