How to do early stopping in case of large KL divergence with PPO

Shanchao_Yang · August 26, 2021, 5:02am

The early stopping optimization measures the mean KL divergence between the old policy (the one that collected the rollouts) and the current policy during PPO's update phase, and stops the remaining policy updates once the mean KL divergence exceeds a preset threshold (1.5 * target_kl in the snippet below).
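A minimal sketch of the measurement and the check in plain Python follows. It assumes the log-probabilities of the sampled actions under the old and the current policy are available as lists. The estimator mean((r - 1) - log r), with r = pi_new / pi_old, is one common way to approximate KL(old || new) from samples drawn by the old policy; it is an assumption here, not necessarily what any given library uses. The names `approx_kl` and `should_stop` are hypothetical, not Stable-Baselines3 or RLlib API.

```python
import math
from typing import Optional, Sequence


def approx_kl(old_log_probs: Sequence[float], new_log_probs: Sequence[float]) -> float:
    """Estimate mean KL(old || new) from log-probs of actions sampled by the old policy.

    Per sample: (r - 1) - log r with r = pi_new / pi_old. Each term is >= 0 and
    its expectation under the old policy equals KL(old || new).
    """
    total = 0.0
    for old_lp, new_lp in zip(old_log_probs, new_log_probs):
        log_ratio = new_lp - old_lp
        total += math.exp(log_ratio) - 1.0 - log_ratio
    return total / len(old_log_probs)


def should_stop(kl: float, target_kl: Optional[float], tolerance: float = 1.5) -> bool:
    """Same test as the excerpt: stop once kl exceeds tolerance * target_kl."""
    return target_kl is not None and kl > tolerance * target_kl


# Usage: two sampled actions whose probabilities moved noticeably.
old = [math.log(0.5), math.log(0.2)]
new = [math.log(0.6), math.log(0.1)]
print(should_stop(approx_kl(old, new), target_kl=0.01))  # True
```

The tolerance factor of 1.5 simply mirrors the snippet; with target_kl=None the check is disabled, matching the `is not None` guard there.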

The snippet below shows how Stable-Baselines3 does this:

DLR-RM/stable-baselines3/blob/3efab0d267e74cb03264411d4500ddde0c163404/stable_baselines3/ppo/ppo.py#L253-L257

    if self.target_kl is not None and approx_kl_div > 1.5 * self.target_kl:
        continue_training = False
        if self.verbose >= 1:
            print(f"Early stopping at step {epoch} due to reaching max kl: {approx_kl_div:.2f}")
        break
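The excerpt sits inside nested loops over epochs and minibatches: `break` only leaves the minibatch loop, and the `continue_training` flag is presumably what ends the epoch loop as well. A rough sketch of that control flow is below. `run_update_phase`, `compute_kl` and `apply_update` are hypothetical names standing in for the training loop, the forward pass plus KL estimate, and the optimizer step; doing the check before the optimizer step (so the minibatch that crossed the threshold is not applied) is an assumption of this sketch.

```python
from typing import Callable, List, Optional, Tuple


def run_update_phase(
    n_epochs: int,
    n_minibatches: int,
    compute_kl: Callable[[int, int], float],
    apply_update: Callable[[int, int], None],
    target_kl: Optional[float] = None,
) -> List[Tuple[int, int]]:
    """Minibatch update loop with KL-based early stopping.

    Returns the (epoch, minibatch) pairs whose update was actually applied.
    """
    applied: List[Tuple[int, int]] = []
    continue_training = True
    for epoch in range(n_epochs):
        for batch_idx in range(n_minibatches):
            kl = compute_kl(epoch, batch_idx)
            # Check before the optimizer step, so the offending update is skipped.
            if target_kl is not None and kl > 1.5 * target_kl:
                continue_training = False
                break  # exits the minibatch loop only
            apply_update(epoch, batch_idx)
            applied.append((epoch, batch_idx))
        if not continue_training:
            break  # also exits the epoch loop: no more updates on this rollout
    return applied


# Usage: KL grows with every update; the threshold is 1.5 * 0.01 = 0.015.
kls = iter([0.002, 0.005, 0.009, 0.013, 0.017, 0.020])
steps = run_update_phase(3, 2, lambda e, b: next(kls), lambda e, b: None, target_kl=0.01)
print(steps)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Without the flag, the third epoch would start again after the inner `break` and keep updating a policy that has already drifted too far.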

Is there an elegant way to do this with RLlib?

Thanks in advance!
