What does the PPO attention layer do?

I am experimenting with the PPO attention layer, and I am trying to understand what exactly it is supposed to do. I assumed it simply adds a transformer-style attention block so the network can better handle inputs where the position of the inputs may be relevant. However, when I enabled it I saw no change in performance, except that the models with attention were taking longer to train.
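To make my assumption concrete, here is roughly what I pictured the layer doing: self-attention over the pieces of the observation, pooled into a single feature vector that the usual PPO policy/value MLP consumes. This is just a minimal PyTorch sketch of my mental model, not the library's actual code, and all the class and parameter names are made up:

```python
import torch
import torch.nn as nn


class AttentionObsEncoder(nn.Module):
    """Toy encoder: self-attention over observation tokens, then mean-pooling
    into one feature vector for the PPO policy/value heads (my assumption,
    not the real implementation)."""

    def __init__(self, obs_dim: int, embed_dim: int = 64, num_heads: int = 4):
        super().__init__()
        self.proj = nn.Linear(obs_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, obs_tokens: torch.Tensor) -> torch.Tensor:
        # obs_tokens: (batch, num_tokens, obs_dim)
        x = self.proj(obs_tokens)
        attn_out, _ = self.attn(x, x, x)   # self-attention across the tokens
        x = self.norm(x + attn_out)        # residual connection + layer norm
        return x.mean(dim=1)               # pooled feature fed to the PPO MLP


# Example: batch of 8 observations, each split into 5 tokens of size 12
encoder = AttentionObsEncoder(obs_dim=12)
features = encoder(torch.randn(8, 5, 12))
print(features.shape)  # torch.Size([8, 64])
```

If that is more or less what the flag enables, I would have expected at least some difference on tasks where the ordering of observation components matters, which is why the identical results surprised me.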

The documentation does not really explain what happens when attention is enabled. Is my understanding of what the layer is supposed to do wrong?