Jump-Start Reinforcement Learning

It looks like the jump-start has helped: the win rate is higher and its variance appears much lower. Any idea why the performance degrades over training in both cases?

I feel like a broken record because I point a lot of people in this direction, but there can be some issues training PPO with continuous activations. If you have not seen it, you might want to give this post a read to see if you are experiencing this issue.