Help needed with understanding and using attention model

Hi!

I am relatively new to the reinforcement learning space. So far I have tried simple neural networks and LSTMs for training an agent (PPO), and now I want to try an attention-based agent. So I have a couple of “simple” questions (partly because I also only recently learned about transformers and don’t grasp everything yet).

When defining the model config there are the parameters
“attention_memory_inference” and “attention_memory_training”.
What do these parameters do and how do they affect learning? What does “attention_memory_inference” mean in this context?

Also, if I have trained an agent and want to use it, what is the input for this model? Is it just a time series of observations (to capture how the state changes over time), as with an LSTM model?

attention_memory_inference is the number of timesteps to concatenate (along the time axis) and feed into the next transformer unit as inference input. The first transformer unit of your policy will receive this number of past observations (plus the current one) instead.

attention_memory_training is the number of timesteps to concatenate (along the time axis) and feed into the next transformer unit as training input (in addition to the actual input sequence of len=max_seq_len). The first transformer unit will receive this number of past observations (plus the input sequence) instead.
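
For reference, here is a minimal sketch of where these parameters sit in the model config. The numbers are placeholders rather than tuned values, the env is chosen purely for illustration, and the exact way you launch training may differ between RLlib versions:

```python
from ray import tune

config = {
    "env": "CartPole-v1",   # any episodic env, used here only for illustration
    "framework": "torch",
    "model": {
        # Wrap the default model with the attention (GTrXL) net.
        "use_attention": True,
        # Number of transformer units to stack in the policy.
        "attention_num_transformer_units": 1,
        # Output dimension of each transformer unit.
        "attention_dim": 64,
        # Past timesteps kept as memory per unit when computing actions.
        "attention_memory_inference": 50,
        # Past timesteps kept as memory per unit during training
        # (on top of the actual training sequences of length max_seq_len).
        "attention_memory_training": 50,
        # Length of the input sequences fed to the model during training.
        "max_seq_len": 20,
    },
}

tune.run("PPO", config=config, stop={"training_iteration": 10})
```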

So essentially, tweaking these parameters changes the number of past timesteps that get concatenated and fed through the attention units of your policy. They are exposed as separate settings probably because attention over long memories is slow, and you don’t need to rely on as much memory during inference as you do during training.
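
To your second question: at inference time you do not feed in a whole time series yourself. You pass only the current observation plus the recurrent “state” (the attention memory, one tensor per transformer unit), and RLlib returns the updated memory for the next step, much like an LSTM’s hidden state. A rough sketch continuing from the config above (classic gym API, placeholder checkpoint path; import paths vary between RLlib versions):

```python
import gym
import numpy as np

from ray.rllib.algorithms.ppo import PPO  # older RLlib: from ray.rllib.agents.ppo import PPOTrainer

# Build the algorithm from the config sketched above; normally you would
# train it first or restore a checkpoint here (path is hypothetical).
algo = PPO(config=config)
# algo.restore("/path/to/checkpoint")

num_units = config["model"]["attention_num_transformer_units"]
memory_inference = config["model"]["attention_memory_inference"]
attention_dim = config["model"]["attention_dim"]

env = gym.make("CartPole-v1")
obs = env.reset()  # classic gym API; gymnasium returns (obs, info) instead

# Initial (empty) attention memory: one [memory_inference, attention_dim]
# array per transformer unit.
state = [
    np.zeros([memory_inference, attention_dim], np.float32)
    for _ in range(num_units)
]

done = False
while not done:
    # Only the current obs plus the memory go in; the model attends over
    # the stored past timesteps internally.
    action, state_out, _ = algo.compute_single_action(obs, state=state)
    obs, reward, done, info = env.step(action)
    # Slide the memory window: drop the oldest timestep, append the newest.
    state = [
        np.concatenate([state[i], [state_out[i]]], axis=0)[1:]
        for i in range(num_units)
    ]
```

The sliding-window update follows RLlib’s attention-net example: each transformer unit always sees its last attention_memory_inference timesteps of memory when an action is computed.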