How severe does this issue affect your experience of using Ray?
Medium: It contributes to significant difficulty to complete my task, but I can work around it.
I am using APPO with a continuous action space, and it works OK as long as I zero out the entropy_coeff. However, the agent's mean episode reward still crashes often. Making the learning rate very small helps, but it slows down learning. I noticed that these drops in performance coincide with an abrupt change in var_IS. What is var_IS, and how can I control it? See the plots below: the yellow experiment uses a lower learning rate of 1e-5, while the other two use 5e-5.
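For reference, a minimal sketch of the kind of config I'm describing (the environment name and all other values besides entropy_coeff and lr are placeholders, not my exact setup):

```python
# Hypothetical APPO config sketch (dict-style RLlib config).
# "Pendulum-v1" is a placeholder continuous-action environment.
config = {
    "env": "Pendulum-v1",   # placeholder env; my real env differs
    "entropy_coeff": 0.0,   # zeroed out; nonzero values make training unstable for me
    "lr": 5e-5,             # dropping this to 1e-5 reduces crashes but learns slowly
    "vtrace": True,         # APPO default; var_IS seems related to the V-trace IS ratios
}
```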