Meaning of timers in RLlib PPO

Hi, I have trained a model using PPO, and when I look at the per-iteration results (the ones saved in ~/ray_results that you can visualize with TensorBoard), there are some metrics I don’t know exactly what they refer to. Is there any place where these metric values are explained? Specifically, I’m interested in knowing what each timer represents:

  • timers/sample_time_ms
  • timers/sample_throughput
  • timers/load_time_ms
  • timers/load_throughput
  • timers/learn_time_ms
  • timers/learn_throughput
  • timers/update_time_ms

Thanks in advance


Hi @javigm98 ,

I do not think the metrics are documented, but I had the same question, so hopefully I can help find the answer.

load_time_ms, learn_time_ms and update_time_ms are recorded in the __call__(...) method of ray.rllib.execution.train_ops.TrainTFMultiGPU, using the timer objects defined there.

If I remember correctly, learn_time_ms is the time to compute the gradients and perform one gradient-descent step; load_time_ms is the time to load the sample batch onto the device that will compute the gradients (the GPU(s)); update_time_ms is the time to send the new network weights to each worker before starting the next iteration.
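To make the split between the three phases concrete, here is a hypothetical sketch of how such timers could wrap one training iteration. The `timed` helper and `training_iteration` function are illustrative names of my own, not RLlib's actual code or API:

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock time per phase, in milliseconds
# (hypothetical sketch, not RLlib's actual implementation).
timers = {}

@contextmanager
def timed(name):
    start = time.perf_counter()
    yield
    timers[name] = timers.get(name, 0.0) + (time.perf_counter() - start) * 1000

def training_iteration(batch):
    with timed("load_time_ms"):      # copy the sample batch onto the GPU(s)
        device_batch = list(batch)
    with timed("learn_time_ms"):     # compute gradients + one SGD step
        time.sleep(0.001)            # stand-in for the actual update
    with timed("update_time_ms"):    # broadcast new weights to the workers
        pass

training_iteration(batch=[0] * 128)
print(sorted(timers))  # ['learn_time_ms', 'load_time_ms', 'update_time_ms']
```

The point is just that each metric measures a distinct, non-overlapping phase of the iteration, so they can be compared directly to find the bottleneck.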

I don’t remember where sample_time_ms is recorded, but it is basically the time it takes the workers to collect enough samples for one iteration.

The _throughput metrics are computed by the timers themselves (see ray.utils.timer._Timer.mean_throughput) as the number of steps loaded or trained per second during the respective operation.
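A minimal sketch of that throughput calculation, loosely modeled on the behavior described above (class and method names here are hypothetical, not Ray's actual class):

```python
import time

# Throughput-tracking timer sketch: records wall-clock time spent inside
# the `with` block and the number of units (env steps) processed in it.
class ThroughputTimer:
    def __init__(self):
        self._total_s = 0.0
        self._units = 0

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, *exc):
        self._total_s += time.perf_counter() - self._start

    def push_units_processed(self, n):
        # n = number of env steps loaded/trained during the timed block
        self._units += n

    @property
    def mean_throughput(self):
        # steps per second, averaged over all timed blocks
        return self._units / self._total_s if self._total_s else 0.0

t = ThroughputTimer()
with t:
    time.sleep(0.01)          # stand-in for one learn/load operation
t.push_units_processed(4000)  # e.g. a train batch of 4000 timesteps
print(0 < t.mean_throughput <= 400_000)  # at most 4000 steps / 0.01 s
```

So learn_throughput, for example, is just the train-batch size divided by the time the learn phase took.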


Hi @thomaslecat and thank you so much for your answer. It really cleared things up for me!!


Kind of an iffy question, but what are some good values (in terms of orders of magnitude) for these timers? For example, for a fully connected [64, 64] NN, my learn_time_ms is something like 2000 ms, which feels slow to me. I wasn’t using a GPU, but the network is fairly small, so I’m not sure if my intuition is valid…