RL2: Concatenating episodes with done flag

Issue severity

  • High: It blocks me from completing my task.

Hi all,

I’m trying to implement the RL2 algorithm ([1611.02779] RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning) in RLlib. I’m new to RLlib, so I’m trying to work out which classes I should customise.

In particular, RL2 concatenates a sequence of episodes (typically 2) into a ‘trial’. It uses a recurrent architecture that takes the previous observations, actions, rewards and ‘done’ flags as input, and the RNN state is not reset between episodes within a trial. I would like to implement this behaviour with a model parameter that sets the number of episodes per trial.
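To make the intended behaviour concrete, here is a rough sketch in plain Python of what I mean by a trial. It deliberately uses no RLlib classes: `ToyEnv`, `TrialWrapper`, `episodes_per_trial` and `run_trial` are hypothetical names made up for illustration, and the environment is assumed to follow the older Gym-style interface where `step` returns `(obs, reward, done, info)`. The wrapper presents several inner episodes as one long episode and appends the previous action, reward and inner ‘done’ flag to each observation.

```python
class ToyEnv:
    """Hypothetical stand-in environment with fixed-length episodes."""

    def __init__(self, episode_len=3):
        self.episode_len = episode_len
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        reward = float(action)
        done = self.t >= self.episode_len
        return float(self.t), reward, done, {}


class TrialWrapper:
    """Presents `episodes_per_trial` consecutive episodes as one long episode."""

    def __init__(self, env, episodes_per_trial=2):
        self.env = env
        self.episodes_per_trial = episodes_per_trial
        self.episodes_done = 0

    def _augment(self, obs, prev_action, prev_reward, prev_done):
        # RL2-style input: observation plus previous action, reward and done flag.
        return (obs, prev_action, prev_reward, float(prev_done))

    def reset(self):
        self.episodes_done = 0
        return self._augment(self.env.reset(), 0, 0.0, False)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        trial_done = False
        if done:
            self.episodes_done += 1
            trial_done = self.episodes_done >= self.episodes_per_trial
            if not trial_done:
                # Start the next inner episode without ending the trial.
                obs = self.env.reset()
        # The inner `done` goes into the observation; only `trial_done`
        # is reported to the caller as the end of the (outer) episode.
        return self._augment(obs, action, reward, done), reward, trial_done, info


def run_trial(env):
    """Roll out one trial with a constant action and collect the inner done flags."""
    env.reset()
    done_flags = []
    trial_done = False
    while not trial_done:
        obs, _, trial_done, _ = env.step(1)
        done_flags.append(obs[3])
    return done_flags


flags = run_trial(TrialWrapper(ToyEnv(episode_len=3), episodes_per_trial=2))
```

With this kind of wrapping, the caller only sees an episode end once per trial, so anything that resets recurrent state at episode boundaries would, presumably, carry the state across the inner episodes. Whether this is the right place to do it in RLlib is exactly what I’m unsure about.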

I am not sure how to implement this behaviour in RLlib. For the episode concatenation, I think I might need to customise either the sample collector class or the rollout worker. Is this correct, and is there a good example of how this can be done (i.e. overriding the sample collector, not necessarily for my particular case)? I haven’t found anything that manipulates these classes directly.

EDIT: I found that dones are automatically included in the sample batch, so I’ve resolved the second question.

Your assistance is greatly appreciated!!