How to do early stopping in case of large KL divergence with PPO

The early-stopping optimization measures the mean KL divergence between the pre-update policy and the current policy during PPO training, and stops the policy updates for the current epoch if the mean KL divergence exceeds a preset threshold.

Below is how Stable Baselines 3 does this.
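For reference, here is a minimal self-contained sketch of the check, modeled on the logic in Stable Baselines 3's `PPO.train()`: SB3 estimates an approximate KL divergence from the log-probability ratio between the old and updated policy, and breaks out of the update loop once it exceeds `1.5 * target_kl`. The helper names below (`approx_kl_from_logratio`, `run_epochs`) are my own, not SB3's API; only `target_kl` and the `1.5` multiplier come from SB3.

```python
import numpy as np

def approx_kl_from_logratio(log_ratio: np.ndarray) -> float:
    # Low-variance KL estimator used by SB3: mean((r - 1) - log r),
    # where r = exp(new_log_prob - old_log_prob).
    return float(np.mean(np.exp(log_ratio) - 1.0 - log_ratio))

def run_epochs(log_ratios_per_epoch, target_kl=0.03):
    """Run PPO update epochs, stopping early when the approximate KL
    between the old and current policy grows too large (as SB3 does)."""
    completed = 0
    for log_ratio in log_ratios_per_epoch:
        approx_kl = approx_kl_from_logratio(np.asarray(log_ratio))
        if target_kl is not None and approx_kl > 1.5 * target_kl:
            break  # skip the remaining updates for this iteration
        # ... perform the gradient step for this epoch here ...
        completed += 1
    return completed
```

In SB3 this behavior is enabled simply by passing the `target_kl` argument to the `PPO` constructor; the question is whether RLlib exposes an equivalently simple hook.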

Is there an elegant way to do this with RLlib?

Thanks in advance!