What is var_IS in APPO?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

I am using APPO with a continuous action space, and it works OK as long as I set entropy_coeff to zero. However, the agent's mean episode reward often crashes. Making the learning rate very small is an option, but it slows down learning. I noticed that the drop in performance coincides with an abrupt change in var_IS. What is var_IS, and how can I control it? See plots below. The yellow experiment uses a lower learning rate of 1e-5, while the other two use a learning rate of 5e-5.

Hi @aljubrmj,

It is the variance of the importance sampling ratio used by vtrace.
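To make that concrete, here is a rough sketch of what "variance of the importance sampling ratio" means. It is not RLlib's actual code: the function names and the sample log-probabilities are made up for illustration, and RLlib may clip or reduce the ratio differently before reporting var_IS.

```python
import math
from statistics import pvariance


def importance_ratios(target_logp, behaviour_logp):
    """Per-sample ratio pi_target(a|s) / pi_behaviour(a|s), from log-probs."""
    return [math.exp(t - b) for t, b in zip(target_logp, behaviour_logp)]


def var_is(target_logp, behaviour_logp):
    """Population variance of the importance sampling ratios (hypothetical helper)."""
    return pvariance(importance_ratios(target_logp, behaviour_logp))


# Made-up log-probabilities of the sampled actions under the behaviour policy.
behaviour = [-1.0, -1.2, -0.8, -1.1]

# If the learner policy equals the behaviour policy, every ratio is 1
# and the variance is zero.
on_policy = var_is(behaviour, behaviour)

# If the learner policy has drifted away, the ratios spread out
# and the variance grows.
drifted = var_is([-0.2, -2.5, -0.1, -3.0], behaviour)

print(on_policy, drifted)
```

So a sudden jump in var_IS suggests the policy being trained has moved far from the policy that collected the samples.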

Thank you! How do you suggest I tune the APPO parameters to keep the importance sampling variance from spiking?

You could try running without vtrace.
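As a sketch, assuming the dict-style APPO config where `vtrace` is a boolean key, the override could look like the following. The `lr` and `entropy_coeff` values are just the ones from your post, and the name `appo_overrides` is made up; how you pass the dict to the trainer depends on your Ray version, so I have left that part out.

```python
# Hypothetical config overrides for an APPO run (dict-style RLlib config assumed).
appo_overrides = {
    "vtrace": False,       # turn off the V-trace off-policy correction
    "lr": 5e-5,            # learning rate from the runs that crashed
    "entropy_coeff": 0.0,  # entropy bonus zeroed out, as in the original setup
}

print(appo_overrides)
```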

I will experiment with it. Thanks a lot!