I am using RLlib to design an adaptive controller, and I have what may be a dumb question. As far as I understand, PPO implements either a clipped surrogate objective or an adaptive KL-penalty coefficient. However, when I configure my RLlib agent, I have to provide hyperparameters for both methods. I was told that RLlib agents trade off between these two approaches. Is that true? If yes, why, and where can I find information/documentation about this?
The original PPO paper does not make the choice between the clipped surrogate objective and the adaptive KL penalty exclusive, and RLlib indeed uses both. I am not aware of an article that describes this explicitly, but you can see it for yourself in the code!
You will find the clipped surrogate objective in lines 88ff. and the complete expression of the loss in lines 111ff.
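To make that concrete, here is a minimal sketch (not RLlib's actual code) of how the two terms combine into a single loss, in the spirit of the PPO paper. It leaves out the value-function and entropy terms that RLlib's full loss also includes, and the function name and signature are my own:

```python
import torch

def combined_ppo_loss(logp, logp_old, advantages, kl, kl_coeff, clip_param=0.3):
    """Illustrative only: clipped surrogate objective plus a KL penalty."""
    # Probability ratio pi_new(a|s) / pi_old(a|s).
    ratio = torch.exp(logp - logp_old)
    # Clipped surrogate objective (the "clip" variant of PPO).
    surrogate = torch.min(
        advantages * ratio,
        advantages * torch.clamp(ratio, 1.0 - clip_param, 1.0 + clip_param),
    )
    # Combined loss: maximize the surrogate while penalizing the KL divergence
    # from the old policy, weighted by the (adaptively updated) kl_coeff.
    return -torch.mean(surrogate) + kl_coeff * torch.mean(kl)
```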
If you want PPO to leave out the KL term, have a look at the PPO execution plan. You can copy it and simply leave out the KL_Update call in line 294!
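If you would rather not touch the execution plan at all, a lighter-weight option (worth double-checking against your RLlib version) is to set the initial KL coefficient to zero in the config; since the adaptive update only scales the coefficient up or down, it then stays at zero and the KL term never contributes. Roughly:

```python
from ray.rllib.agents.ppo import PPOTrainer  # newer Ray versions: ray.rllib.algorithms.ppo

config = {
    "env": "CartPole-v1",  # any registered env; just an example
    "clip_param": 0.2,     # clipped surrogate objective stays active
    "kl_coeff": 0.0,       # initial KL coefficient; the adaptive update only
                           # scales it, so it remains 0 -> no KL penalty
}

trainer = PPOTrainer(config=config)
result = trainer.train()
```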