Repeating cycles in PPO algorithm

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Dear all, I hope you are doing well. I am working on a research project using the PPO algorithm, but there is something weird with the reward curve: it shows cycles every 600k steps, with the reward going up and then back down.