How severely does this issue affect your experience of using Ray?
- Medium: It contributes significant difficulty to completing my task, but I can work around it.
Hi, and thanks again for this amazing project!
I'm not sure whether this is a bug or whether I'm misunderstanding the docs. When running PPO, I'm seeing the inner training loop (where the loss is computed) being called an unexpected number of times.
Expected behavior
The expected behavior, which I obtain when I set `num_sgd_iter=1`, is that the loss will be computed `train_batch_size / sgd_minibatch_size * num_sgd_iter = 6` times with my hyper-parameters:
config["train_batch_size"] = 60_000
config["sgd_minibatch_size"] = 10_000
config["num_sgd_iter"] = 1
Unexpected behavior
However, if I set a different number of SGD iterations, for example:

```python
config["train_batch_size"] = 60_000
config["sgd_minibatch_size"] = 10_000
config["num_sgd_iter"] = 10
```
I observe that the inner loop is called 50 times instead of the expected 60. The same pattern shows up with `num_sgd_iter=30`: I get 150 calls to the loss function instead of 180.
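To make the discrepancy concrete, here is the comparison I am making (the observed counts come from my own counter described below, not from any RLlib API):

```python
train_batch_size = 60_000
sgd_minibatch_size = 10_000

# num_sgd_iter -> number of loss calls observed via my counter
observed = {10: 50, 30: 150}
for num_sgd_iter, got in observed.items():
    expected = (train_batch_size // sgd_minibatch_size) * num_sgd_iter
    print(f"num_sgd_iter={num_sgd_iter}: observed {got}, expected {expected}")
# num_sgd_iter=10: observed 50, expected 60
# num_sgd_iter=30: observed 150, expected 180
```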
A simple way to check this, which is what I'm doing, is to increment a counter in the `loss` method of `PPOTorchPolicy`.
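In case it helps reproduce the count, this is roughly how I instrument it (a minimal sketch, assuming a Ray version where `PPOTorchPolicy` is a class exposing a `loss(self, model, dist_class, train_batch)` method; in older releases the loss is a standalone function and the counter would go there instead, and the import path may differ between Ray 1.x and 2.x):

```python
from ray.rllib.agents.ppo.ppo_torch_policy import PPOTorchPolicy


class CountingPPOTorchPolicy(PPOTorchPolicy):
    """PPOTorchPolicy that counts how many times its loss is evaluated."""

    loss_calls = 0  # class-level counter, incremented on every loss evaluation

    def loss(self, model, dist_class, train_batch):
        type(self).loss_calls += 1
        return super().loss(model, dist_class, train_batch)
```

Reading the counter after a single training iteration gives the counts reported above.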
Please let me know if my understanding of the docs is incorrect or if something is indeed not working as expected.
Thanks in advance!