Meaning of timers in RLlib PPO

Hi, I have trained a model using PPO, and when I look at the per-iteration results (the ones saved in ~/ray_results that you can visually represent with TensorBoard) there are some metrics I don’t know exactly what they refer to. Is there any place where these metrics are explained? Specifically, I’m interested in knowing what each timer represents:

  • timers/sample_time_ms
  • timers/sample_throughput
  • timers/load_time_ms
  • timers/load_throughput
  • timers/learn_time_ms
  • timers/learn_throughput
  • timers/update_time_ms

Thanks in advance


Hi @javigm98 ,

I do not think the metrics are documented, but I had the same question, so hopefully I can help find the answer.

load_time_ms, learn_time_ms and update_time_ms are recorded in the __call__(...) method of ray.rllib.execution.train_ops.TrainTFMultiGPU, using the timer objects created there.

If I remember correctly, learn_time_ms is the time to compute the gradients and perform one gradient descent step; load_time_ms is the time to load the sample batch onto the device that will compute the gradients (the GPU(s)); update_time_ms is the time to send the new network weights to each worker before starting the next iteration.

I don’t remember where sample_time_ms is recorded, but it is basically the time for the workers to collect enough samples for one iteration.

The _throughput metrics are computed by the timers (see ray.utils.timer._Timer.mean_throughput) as the number of steps loaded or trained per second during the respective operations.
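To make the time/throughput relationship concrete, here is a minimal sketch of a timer in the same spirit as RLlib's internal `_Timer`: time a block of work, record how many steps it covered, and derive steps-per-second. The class and method names here are illustrative, not the actual RLlib API.

```python
import time


class SimpleTimer:
    """Minimal sketch of a timer similar in spirit to RLlib's _Timer.
    Names here are illustrative, not the real RLlib API."""

    def __init__(self):
        self._total_time = 0.0
        self._units_processed = 0

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, *exc):
        self._total_time += time.perf_counter() - self._start

    def push_units_processed(self, n):
        # Record how many steps were handled during the timed block.
        self._units_processed += n

    @property
    def mean_throughput(self):
        # Units (e.g. timesteps) processed per second of timed work.
        return self._units_processed / self._total_time


timer = SimpleTimer()
with timer:
    time.sleep(0.01)  # stand-in for loading or training on a batch
    timer.push_units_processed(128)

# Throughput is at most 128 / 0.01 = 12800 steps/s, since the
# sleep lasts at least 10 ms.
print(f"{timer.mean_throughput:.0f} steps/s")
```

This is why each `*_time_ms` metric has a matching `*_throughput`: both come from the same timed block, one reporting the duration, the other the steps per second.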


Hi @thomaslecat and thank you so much for your answer. It was really clarifying to me!!


Kind of an iffy question, but what are some good values (in terms of orders of magnitude) for these timers? For example, for a fully connected [64,64] nn, my learn time ms is something like 2000 ms, which feels slow to me. I wasn’t using a gpu, but the network is kind of small, so I’m not sure if my intuition is valid…

Just putting this here because I answered something similar outside the forums.

All metrics are averaged per iteration of an algorithm. That is, they are averaged per reporting of metrics and generally encompass many SGD steps, sampled batches etc.

/timers/learn_throughput:
Number of timesteps we have learned on per second, averaged over the iteration.

/timers/learn_time_ms:
Number of ms it took on average to learn on a single batch. In the case of multiple SGD steps (see num_sgd_iter), this spans all SGD steps over a single batch we learn on. In the case of a multi-agent batch, this also spans all SGD steps that might be taken sequentially for the individual policies.

/timers/load_throughput:
Number of timesteps loaded per second, averaged over the iteration. Loading means we might decompress a batch, pad it and put it onto the device of the policy that is going to train on it. This is independent of other preprocessing steps that might occur as part of the algorithm and therefore excludes per-algorithm logic.

/timers/load_time_ms:
Number of ms it took on average for a batch to be loaded. See above for what loading encompasses.

/timers/training_iteration_time_ms:
The number of ms it took for the iteration of the algorithm to be completed. This may include things like sampling, SGD steps, or updates of sampling policies.

/timers/update_time_ms:
The number of ms it took to update the sampling policies of all distributed workers after training the policy/policies.
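The definitions above can be sketched in a few lines: `learn_time_ms` times one whole learning pass over a batch (spanning all `num_sgd_iter` SGD steps), and the reported value is the mean across all batches in the iteration. The function and loop below are hypothetical stand-ins, not RLlib code.

```python
import time


def learn_on_batch(batch, num_sgd_iter):
    """Stand-in for a policy update: one timed call spans ALL SGD
    steps over this single batch (cf. num_sgd_iter in PPO)."""
    for _ in range(num_sgd_iter):
        time.sleep(0.001)  # placeholder for one gradient step


# One timing per batch; the metric is the per-batch mean over the iteration.
batch_times_ms = []
for batch in range(4):  # e.g. 4 minibatches in this iteration
    start = time.perf_counter()
    learn_on_batch(batch, num_sgd_iter=10)
    batch_times_ms.append((time.perf_counter() - start) * 1000.0)

learn_time_ms = sum(batch_times_ms) / len(batch_times_ms)
# Each batch sleeps 10 x 1 ms, so the mean is at least ~10 ms here.
print(f"learn_time_ms = {learn_time_ms:.1f}")
```

In a multi-agent setting the timed region would additionally cover the sequential SGD passes for each policy, which is why `learn_time_ms` can grow with the number of trained policies.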


I was also searching for the meaning of some of the logged timing metrics and found this thread. Some of the metrics have been explained already (thanks @thomaslecat and @arturn). Others I had to look up in the source code of _PerfStats:

# Raw observation preprocessing.
"mean_raw_obs_processing_ms": self.raw_obs_processing_time * factor,
# Computing actions through policy.
"mean_inference_ms": self.inference_time * factor,
# Processing actions (to be sent to env, e.g. clipping).
"mean_action_processing_ms": self.action_processing_time * factor,
# Waiting for environment (during poll).
"mean_env_wait_ms": self.env_wait_time * factor,
# Environment rendering (False by default).
"mean_env_render_ms": self.env_render_time * factor,

The comments are not super explanatory, but maybe still helpful for some. @arturn Do you know if mean_env_wait_ms includes the environment stepping? Or what happens during the poll? Is there any metric that measures the environment’s speed?

Yes, mean_env_wait_ms is basically a timer for env.send_actions + env.poll; the former actually calls env.step, while the latter just fetches the return values. See ray/rllib/env/vector_env.py at master · ray-project/ray · GitHub for more information.
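So `mean_env_wait_ms` does reflect the environment's speed: the timed region covers both `send_actions` (which does the actual `env.step` work) and `poll` (which only fetches the results). A toy sketch of that split, with a hypothetical env class (not RLlib's actual `VectorEnv`):

```python
import time


class ToyVectorEnv:
    """Hypothetical env mirroring the send_actions/poll split."""

    def __init__(self):
        self._pending = None

    def send_actions(self, actions):
        # The actual env.step work happens here.
        time.sleep(0.002)  # placeholder for stepping the env
        self._pending = [a * 2 for a in actions]  # fake observations

    def poll(self):
        # Just fetches results already computed by send_actions.
        return self._pending


env = ToyVectorEnv()
start = time.perf_counter()
env.send_actions([1, 2, 3])
obs = env.poll()
env_wait_ms = (time.perf_counter() - start) * 1000.0
# env_wait_ms covers both calls, so it includes the env stepping itself.
print(obs)  # [2, 4, 6]
```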