Multi-head model functionality

  • High: It blocks me from completing my task.

Hi all, I’m trying to reproduce part of the following paper:

The functionality I need currently is to be able to train a common network with multiple ‘heads’, with each head trained on a different task.

I can implement the multiple tasks using the TaskSettableEnv, but beyond that point I’m not sure how to proceed. Two questions:

  1. Is it possible to store experiences from different tasks separately in a replay buffer (say, the DQN replay buffer)?
  2. If 1 is possible, would I need to write a custom loss function to train each head on its own task’s data?
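
To make question 1 concrete, here is a minimal sketch of what "storing experiences separately per task" could look like. It is plain Python and does not use any RLlib class; `PerTaskReplayBuffer`, its method names, and the idea of keying by a task ID are all assumptions made for illustration, not part of RLlib's replay buffer API.

```python
import random
from collections import defaultdict, deque


class PerTaskReplayBuffer:
    """Keeps one bounded FIFO buffer per task ID (hypothetical helper, not an RLlib class)."""

    def __init__(self, capacity_per_task=10_000, seed=None):
        self._capacity = capacity_per_task
        # One deque per task; each deque evicts its oldest item when full.
        self._buffers = defaultdict(lambda: deque(maxlen=self._capacity))
        self._rng = random.Random(seed)

    def add(self, task_id, transition):
        # Route the transition to the buffer that belongs to its task.
        self._buffers[task_id].append(transition)

    def sample(self, task_id, batch_size):
        # Uniform sampling with replacement from a single task's buffer.
        buf = self._buffers.get(task_id)
        if not buf:
            raise ValueError(f"no transitions stored for task {task_id!r}")
        return [self._rng.choice(buf) for _ in range(batch_size)]

    def size(self, task_id):
        # Number of transitions currently stored for the given task.
        return len(self._buffers.get(task_id, ()))
```

With a structure like this, each training step could draw a batch for one task and update only the head that corresponds to it, while the shared network receives gradients from every task.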

This question was answered during the May 24 office hours. RLlib office hours: May 24 - YouTube

The suggestion was to use the IMPALA algorithm instead of PPO, since it has a replay buffer.

@renos So, if I understand correctly, you essentially want to add auxiliary losses to your policy loss to jointly train your visual representation with the policy-specific parameters?

RLlib provides a custom_loss() hook that allows such use cases. You can take a look at rllib/examples/ to see how it’s used in practice. Ideally, you want to use loss_inputs to get access to the train_batch sampled from the replay buffer. Within that function you can compute a reward-prediction loss or any other auxiliary loss that would improve the representation.
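
As a rough illustration of the kind of logic that could sit inside such a hook, the sketch below adds a per-head auxiliary loss to an existing policy loss. It uses plain Python floats and lists rather than tensors so it runs without RLlib or a deep learning framework. The function name `multi_head_loss`, the `aux_coeff` weight, and the assumption that each sample carries a task ID selecting its head are all hypothetical; none of them come from RLlib.

```python
def multi_head_loss(policy_loss, head_predictions, task_ids, targets, aux_coeff=0.5):
    """Adds a per-head squared-error auxiliary loss to an existing policy loss.

    head_predictions[i][k] is head k's prediction for sample i (assumed layout).
    task_ids[i] selects which head is trained on sample i.
    targets[i] is the regression target for sample i, e.g. the observed reward.
    """
    if not (len(head_predictions) == len(task_ids) == len(targets)):
        raise ValueError("all inputs must have one entry per sample")
    if not task_ids:
        # Nothing to add for an empty batch.
        return policy_loss

    # Only the head matching each sample's task contributes to the loss,
    # so the other heads receive no training signal from that sample.
    squared_errors = [
        (preds[task] - target) ** 2
        for preds, task, target in zip(head_predictions, task_ids, targets)
    ]
    aux_loss = sum(squared_errors) / len(squared_errors)

    # Weighted sum: the shared representation is trained by both terms.
    return policy_loss + aux_coeff * aux_loss
```

In an actual model the same selection would be done with tensor indexing on the batch, so that gradients flow only through the chosen head and the shared layers.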
