I apologize if the title isn't the most accurate. I am currently training an agent whose episodes can take anywhere from 1000-3000 timesteps. I am using the policy client/server setup with a PPO trainer, with
"batch_mode": "truncate_episodes" and
"train_batch_size": 4000. Going with a higher train_batch_size leads to an OOM crash on the server.
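For reference, a minimal sketch of the config described above (the surrounding fields such as "num_workers" are illustrative assumptions, not my exact setup):

```python
# Hypothetical PPO trainer config matching the settings mentioned above.
config = {
    "batch_mode": "truncate_episodes",   # sample batches may cut episodes mid-way
    "train_batch_size": 4000,            # larger values OOM on my server
    "num_workers": 0,                    # assumption: rollouts come from external clients
}

assert config["train_batch_size"] == 4000
```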
My worry/concern: having 5 agents play the game means a complete training cycle can pass without any episodes being completed. The server will train, and once the 5 agents finish their games (played with the previous model), a new training cycle will begin, but it will be based on the results of the previous iteration, which could cause weird results.
- Is my concern valid?
- How would batch_mode: truncate_episodes vs. complete_episodes affect training?
- If the client is on iteration 1 but the server has already gone through an iteration and is on iteration 2: when the client finishes its episode and pushes it, does the server discard that episode since it is based on the old policy? Does this change if the client is halfway through an episode when it receives the new iteration's weights?