What is var_IS in APPO?

aljubrmj · February 16, 2023, 3:30pm

How severely does this issue affect your experience of using Ray?

Medium: It adds significant difficulty to completing my task, but I can work around it.

I am using APPO with a continuous action space, and it works OK as long as I zero out the entropy_coeff. However, the agent's mean episode reward often crashes. Making the learning rate very small is an option, but it slows down learning. I noticed that these drops in performance coincide with an abrupt change in var_IS. What is var_IS, and how can I control it? See plots below. The yellow experiment uses a lower learning rate of 1e-5, while the other two use a learning rate of 5e-5.

mannyv · February 16, 2023, 3:50pm

Hi @aljubrmj,

It is the variance of the importance sampling ratio (`is_ratio`) used by V-trace.
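To make the definition concrete, here is a minimal pure-Python sketch of the quantity. It assumes, following the excerpt, that the ratio is exp(behaviour log-prob minus old-policy log-prob) clamped to [0, 2], and that var_IS is the variance of those ratios over a batch. The function names, the sample log-probs, and the use of the population variance (dividing by N) are illustrative assumptions, not RLlib's code; `torch.var`, for example, defaults to the unbiased estimator.

```python
import math
from typing import List, Sequence


def clamped_is_ratios(
    behaviour_logp: Sequence[float],
    old_policy_logp: Sequence[float],
    lo: float = 0.0,
    hi: float = 2.0,
) -> List[float]:
    """Per-sample ratio exp(behaviour_logp - old_policy_logp), clamped to [lo, hi]."""
    return [
        min(max(math.exp(b - o), lo), hi)
        for b, o in zip(behaviour_logp, old_policy_logp)
    ]


def population_variance(xs: Sequence[float]) -> float:
    """Population variance (divides by N, not N - 1)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


# Hypothetical log-probs of four sampled actions under each policy.
behaviour = [-1.0, -0.5, -2.0, -0.1]
old_policy = [-1.0, -1.5, -0.5, -0.1]

ratios = clamped_is_ratios(behaviour, old_policy)  # second ratio is e^1, clamped to 2.0
var_is = population_variance(ratios)
print(ratios, var_is)
```

Because the clamp confines every ratio to [0, 2], this population variance can never exceed (2 - 0)^2 / 4 = 1 (Popoviciu's inequality). A value jumping toward that bound suggests that many sampled actions have ratios pushed toward the clamp limits, i.e. the behaviour and old policies disagree on them.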

Source: ray-project/ray/blob/e682939239bb6d2bc6105da5a8236ae0f30afded/rllib/algorithms/appo/appo_torch_policy.py#L236 (github.com)

```python
# Log-probs of the sampled actions under the current policy.
actions_logp = _make_time_major(
    action_dist.logp(actions), drop_last=drop_last
)
# Log-probs under the distribution recorded when the samples were collected.
prev_actions_logp = _make_time_major(
    prev_action_dist.logp(actions), drop_last=drop_last
)
# Log-probs under the old policy.
old_policy_actions_logp = _make_time_major(
    old_policy_action_dist.logp(actions), drop_last=drop_last
)
# Importance sampling ratio, clamped to [0, 2]; its variance is reported as var_IS.
is_ratio = torch.clamp(
    torch.exp(prev_actions_logp - old_policy_actions_logp), 0.0, 2.0
)
logp_ratio = is_ratio * torch.exp(actions_logp - prev_actions_logp)
self._is_ratio = is_ratio

advantages = vtrace_returns.pg_advantages.to(logp_ratio.device)
surrogate_loss = torch.min(
    advantages * logp_ratio,
    advantages
    * torch.clamp(
    # ... (excerpt truncated in the original link preview)
```

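In the excerpt, `logp_ratio` multiplies the clamped `is_ratio` by exp(current log-prob minus behaviour log-prob). When the clamp is inactive, the behaviour terms cancel and `logp_ratio` equals exp(current minus old-policy log-prob); when the clamp is active, the product is cut down. The scalar sketch below checks both cases. It is a plain-Python illustration of those two lines with hypothetical inputs, not RLlib code.

```python
import math


def logp_ratio(curr_logp: float, behaviour_logp: float, old_policy_logp: float) -> float:
    """Scalar mirror of the excerpt:
    clamp(exp(behaviour - old_policy), 0, 2) * exp(curr - behaviour)."""
    is_ratio = min(max(math.exp(behaviour_logp - old_policy_logp), 0.0), 2.0)
    return is_ratio * math.exp(curr_logp - behaviour_logp)


# Clamp inactive: exp(-1.0 - (-1.2)) = exp(0.2) < 2, so the behaviour terms cancel.
r_inactive = logp_ratio(curr_logp=-0.9, behaviour_logp=-1.0, old_policy_logp=-1.2)

# Clamp active: exp(-0.2 - (-3.0)) = exp(2.8) > 2, so is_ratio is capped at 2.
r_active = logp_ratio(curr_logp=-0.9, behaviour_logp=-0.2, old_policy_logp=-3.0)

print(r_inactive, math.exp(-0.9 - (-1.2)))  # equal up to float rounding
print(r_active, math.exp(-0.9 - (-3.0)))    # clamped value is much smaller
```

The clamp therefore limits how much any single stale sample can contribute to the surrogate loss, and it is the same clamped `is_ratio` whose variance is reported as var_IS.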
aljubrmj · February 16, 2023, 6:44pm

Thank you! How would you suggest tuning the APPO parameters to keep the importance sampling variance from spiking like this?

mannyv · February 16, 2023, 7:42pm

You could try running without vtrace.

aljubrmj · February 17, 2023, 5:29am

I will experiment with it. Thanks a lot!
