I realized that `learners/__all_modules__/num_env_steps_trained` has a strangely large value, multiple times higher than `learners/__all_modules__/num_module_steps_trained`.
For example:
My sample and train batch size are both 2048; I train for 20 epochs with a minibatch size of 128. The resulting `learners` log is:
```
__all_modules__: {
  num_module_steps_trained: 40960,
  num_env_steps_trained: 655360
}
```
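For reference, this is roughly my setup (a minimal sketch; the parameter names `train_batch_size`, `num_epochs`, and `minibatch_size` are my assumption for a recent Ray release with the new API stack, older versions used `num_sgd_iter` and `sgd_minibatch_size` instead):

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Hypothetical reconstruction of the config described above.
config = (
    PPOConfig()
    .environment("CartPole-v1")
    .training(
        train_batch_size=2048,  # full batch collected per training iteration
        num_epochs=20,          # epochs over that batch per iteration
        minibatch_size=128,     # SGD minibatch size -> 2048 / 128 = 16 minibatches per epoch
    )
)
algo = config.build()
result = algo.train()
# Result layout assumed from the log shown above.
print(result["learners"]["__all_modules__"])
```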
`num_module_steps_trained` makes sense to me; it is:

num_module_steps_trained = 2048 samples * 20 epochs = 128 minibatch_size * (2048 / 128 minibatches) * 20 epochs = 40960
However, `num_env_steps_trained` makes no sense to me: it is 16 times higher. It appears to be calculated as:

num_env_steps_trained = 2048 samples * 20 epochs * (2048 / 128 minibatches) = 2048 * 320 updates total = 655360
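The arithmetic below just makes the two counters and the 16x factor explicit:

```python
train_batch_size = 2048
num_epochs = 20
minibatch_size = 128

minibatches_per_epoch = train_batch_size // minibatch_size      # 16
total_minibatch_updates = minibatches_per_epoch * num_epochs    # 320

# Expected: each minibatch update trains 128 module steps.
num_module_steps_trained = minibatch_size * total_minibatch_updates
print(num_module_steps_trained)  # 40960 == 2048 * 20

# Observed: each minibatch update seems to count the FULL batch size
# (2048) as env steps, not the minibatch size.
num_env_steps_trained = train_batch_size * total_minibatch_updates
print(num_env_steps_trained)     # 655360 == 16 * 40960
```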
I assume this is a bug in the logging: `batch.env_steps()`, despite `batch` being a minibatch of size 128, returns the full batch size of 2048. So 2048 is logged for each of the 320 minibatch updates and summed up to 655360.
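A minimal sketch of what I suspect is happening (this is my hypothesis, not RLlib's actual code; the `Batch` class and `minibatches` helper are hypothetical stand-ins):

```python
# Hypothetical illustration of the suspected accounting bug, NOT RLlib source.
class Batch:
    def __init__(self, data, env_step_count):
        self.data = data
        self._env_step_count = env_step_count

    def env_steps(self):
        # Reports the env step count this batch was constructed with.
        return self._env_step_count

    def minibatches(self, size):
        for i in range(0, len(self.data), size):
            # Bug hypothesis: each minibatch slice inherits the PARENT
            # batch's env step count instead of its own length.
            yield Batch(self.data[i:i + size], self._env_step_count)


full_batch = Batch(list(range(2048)), env_step_count=2048)

num_env_steps_trained = 0
for epoch in range(20):
    for mb in full_batch.minibatches(128):
        # Each of the 320 updates logs 2048 instead of 128.
        num_env_steps_trained += mb.env_steps()

print(num_env_steps_trained)  # 655360
```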
Shouldn't this be logged differently (and perhaps elsewhere), or is there another way to interpret this number?