Which optimizer does the PPO algorithm use?

cool-RR · January 7, 2023, 8:38pm

Hi,

I’ve spent 20 minutes digging into RLlib’s source code trying to answer the question: “Which optimizer does the PPO algorithm use by default?” Is it Adam, RMSProp, or anything else?

Besides the answer itself, I’d be happy if you could tell me where in the code I should have looked to find it out by myself.

Thanks for your help,
Ram Rachum.

mannyv · January 7, 2023, 9:17pm

Adam is the default.

Here are the pointers for Torch. TF is similar.

https://github.com/ray-project/ray/blob/0c8b59d2d90df0cfe0f17d9feb7ef9b3e5fe53f2/rllib/algorithms/ppo/ppo_torch_policy.py#L36-L41


      
    class PPOTorchPolicy(
        ValueNetworkMixin,
        LearningRateSchedule,
        EntropyCoeffSchedule,
        KLCoeffMixin,
        TorchPolicyV2,
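
As for where to look: one general way to find which base class supplies a method is to walk the class's method resolution order. The sketch below uses hypothetical stand-in classes (the `...Sketch` names and the `defining_class` helper are not part of RLlib) so that it runs without Ray installed. It only illustrates the lookup technique and assumes nothing about RLlib's internals beyond the class list in the excerpt above.

```python
# Hypothetical stand-ins that mirror the shape of the inheritance list above.
# They are NOT the real RLlib classes.
class LearningRateScheduleSketch:
    """Stand-in mixin that does not define `optimizer`."""


class TorchPolicyV2Sketch:
    """Stand-in base class that defines `optimizer`."""

    def optimizer(self):
        return "Adam"


class PPOTorchPolicySketch(LearningRateScheduleSketch, TorchPolicyV2Sketch):
    """Stand-in policy that inherits `optimizer` without overriding it."""


def defining_class(cls, method_name):
    """Return the first class in cls's MRO whose own namespace has method_name."""
    for base in cls.__mro__:
        if method_name in vars(base):
            return base
    return None


owner = defining_class(PPOTorchPolicySketch, "optimizer")
print(owner.__name__)
```

With RLlib installed, the same helper could presumably be pointed at the real policy class, and `inspect.getsourcefile(owner)` would then give the file to open.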

https://github.com/ray-project/ray/blob/0c8b59d2d90df0cfe0f17d9feb7ef9b3e5fe53f2/rllib/policy/torch_policy_v2.py#L440


      
    def optimizer(
        self,
    ) -> Union[List["torch.optim.Optimizer"], "torch.optim.Optimizer"]:
        """Custom the local PyTorch optimizer(s) to use.

        Returns:
            The local PyTorch optimizer(s) to use for this Policy.
        """
        if hasattr(self, "config"):
            optimizers = [
                torch.optim.Adam(self.model.parameters(), lr=self.config["lr"])
            ]
        else:
            optimizers = [torch.optim.Adam(self.model.parameters())]
        if getattr(self, "exploration", None):
            optimizers = self.exploration.get_exploration_optimizer(optimizers)
        return optimizers

    def _init_model_and_dist_class(self):
        if is_overridden(self.make_model) and is_overridden(
            self.make_model_and_action_dist
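
Since the optimizer is created inside an overridable `optimizer()` method, swapping Adam for something else should come down to overriding that method in a subclass. Below is a minimal sketch of that pattern in plain PyTorch. `SketchPolicy` and `RMSPropSketchPolicy` are hypothetical stand-ins, not RLlib classes, and the exploration branch from the excerpt is left out for brevity.

```python
import torch
from torch import nn


class SketchPolicy:
    """Hypothetical stand-in for a policy; NOT RLlib's TorchPolicyV2."""

    def __init__(self, model, config=None):
        self.model = model
        # Only set `config` when one is given, so hasattr() below can be False.
        if config is not None:
            self.config = config

    def optimizer(self):
        # Same branching as the excerpt: Adam, with lr taken from config if present.
        if hasattr(self, "config"):
            return [torch.optim.Adam(self.model.parameters(), lr=self.config["lr"])]
        return [torch.optim.Adam(self.model.parameters())]


class RMSPropSketchPolicy(SketchPolicy):
    """Overrides optimizer() to use RMSprop instead of the default Adam."""

    def optimizer(self):
        return [torch.optim.RMSprop(self.model.parameters(), lr=self.config["lr"])]


model = nn.Linear(4, 2)
default_opt = SketchPolicy(model, {"lr": 5e-5}).optimizer()[0]
custom_opt = RMSPropSketchPolicy(model, {"lr": 5e-5}).optimizer()[0]
print(type(default_opt).__name__, default_opt.param_groups[0]["lr"])
print(type(custom_opt).__name__, custom_opt.param_groups[0]["lr"])
```

Whether a real RLlib policy subclass needs extra wiring to be picked up by the algorithm is not covered here. The sketch only shows the override itself.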

cool-RR · January 7, 2023, 9:30pm

Thank you for the quick answer Manny!
