Does KL loss make sense when using action masking in PPO?

Hi, I’m training a custom model with a discrete action space using PPO. In my understanding, the RLlib implementation uses both the KL penalty and clipping. I apply action masking as shown in the action masking example, and it seems to work in my environment. However, in TensorBoard I saw that the KL became infinite, and so did the total loss. I suppose this is due to action masking, since it changes the distribution severely.
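
To illustrate what I mean by the KL becoming infinite, here's a toy numpy sketch (not RLlib code, just the usual trick of giving invalid actions a huge negative logit): once one distribution puts ~zero probability on an action the other still covers, the log-ratio for that action is infinite and the KL blows up.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def kl(p, q):
    # KL(p || q) = sum_i p_i * log(p_i / q_i)
    return np.sum(p * np.log(p / q))

logits = np.array([1.0, 0.5, 0.2])
mask = np.array([True, True, False])          # action 2 is invalid
masked_logits = np.where(mask, logits, -1e9)  # mask by large negative logit

p = softmax(logits)           # distribution before masking
p_r = softmax(masked_logits)  # re-normalized distribution after masking

# p still puts probability mass on action 2, but p_r assigns it ~0,
# so the p * log(p / p_r) term for that action is infinite.
print(kl(p, p_r))  # -> inf (with a divide-by-zero warning)
```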

So my question is: should we rely only on the clip range (i.e., set kl_coeff=0.0) when applying action masking?
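
For reference, the knobs I mean are kl_coeff / kl_target vs. clip_param. A sketch in the classic config-dict style (exact keys/API may differ by RLlib version; the env and model names here are placeholders):

```python
config = {
    "env": "MyMaskedEnv",                            # placeholder env name
    "model": {"custom_model": "action_mask_model"},  # placeholder registered model
    # Disable the KL penalty and rely on PPO's ratio clipping only:
    "kl_coeff": 0.0,
    "kl_target": 0.01,  # has no effect once kl_coeff is 0
    "clip_param": 0.2,
}
```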

About the KL explosion with action masking: as I understand it, the action is sampled from the re-normalized distribution after masking (say, p_r). But I can't confirm whether the policy gradient is computed from p_r or from the distribution before masking, p.
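
If the model follows the pattern from the action masking example (adding the clamped log of the mask to the logits before the action distribution is built), then sampling, the log-probs used in the PPO ratio, and the entropy should all come from p_r, since the distribution never sees the unmasked logits. A toy sketch of that pattern (identifiers are illustrative, not RLlib internals):

```python
import torch
from torch.distributions import Categorical

logits = torch.tensor([1.0, 0.5, 0.2])
action_mask = torch.tensor([1.0, 1.0, 0.0])   # action 2 is invalid

# The usual masking trick: invalid actions get a huge negative logit.
inf_mask = torch.clamp(torch.log(action_mask), min=-1e9)
masked_logits = logits + inf_mask

dist = Categorical(logits=masked_logits)  # this distribution is p_r
action = dist.sample()                    # never picks the masked action
logp = dist.log_prob(action)              # what the PG update would use
```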


Came across this and I'm interested to know the answer too.