Hello everyone, as the title suggests, I'm trying to understand how these two parameters work for off-policy algorithms such as QMIX. I have read a few posts and the docs, but I still have difficulty fully understanding how they are used.
From my experience running QMIX on my custom Gym environment yesterday, I think `buffer_size` is the number of "iterations" collected by the workers that will be stored in the buffer. For example, if I have 15 workers and they collectively sample 15 episodes, this counts as 1 iteration, and all of them are stored in the buffer (given that I have set the batch mode to `complete_episodes`). I believe this because I saw my RAM utilization percentage flatten at the iteration number equal to my `buffer_size`.
Now comes `train_batch_size`, which I believe is the number of steps from each worker's episode that will be used for training. For example, if the mean episode length is 2300, I set `train_batch_size` to 1000, and I have 15 workers each working for 1 iteration at a time, then the actual training batch will be the concatenation of the workers' batches, with a total length of 15 * 1000 steps.
But what if I set the mode to `truncate_episodes` and set the fragment length to 500? Will it automatically downsize `train_batch_size` to match the fragment length?
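For reference, here is roughly the setup I'm describing, written as an old-style RLlib config dict. This is just a sketch with illustrative values (the `buffer_size` number is hypothetical, and the comments state my current reading of the parameters, which is exactly what I'm asking to confirm):

```python
# Sketch of the QMIX setup described above; values are illustrative,
# and the comments reflect my (possibly wrong) understanding.
config = {
    "num_workers": 15,                  # 15 rollout workers
    "batch_mode": "complete_episodes",  # store whole episodes in the buffer
    "buffer_size": 50,                  # my reading: capacity in "iterations" (hypothetical value)
    "train_batch_size": 1000,           # my reading: steps per worker used for training
}

# My current understanding of the effective training batch size:
effective_batch = config["num_workers"] * config["train_batch_size"]
print(effective_batch)  # 15000 steps if each worker contributes 1000 steps
```

So under this reading, one training step would consume a concatenated batch of 15,000 steps; please correct me if the units or the multiplication are wrong.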
Thanks in advance.