RL2: Concatenating episodes with done flag

grant_ray · October 21, 2023, 12:50am

Issue severity

High: It blocks me from completing my task.

Hi all,

I’m trying to implement the RL2 algorithm (RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning, arXiv:1611.02779) in RLlib. I’m new to RLlib, so I’m trying to work out which classes I should customise.

In particular, RL2 concatenates sequences of episodes into a ‘trial’. RL2 uses a recurrent architecture and takes previous observations, actions, rewards and ‘done’ flags as input. A trial involves a sequence of episodes (typically 2), and the RNN state is not reset during a trial. I would like to implement this behaviour with a model parameter that lets me determine the number of episodes per trial.

I am not sure how to implement this behaviour. For the episode concatenation, I think I might want to edit either the sample collector class or the rollout worker. Is this correct, and is there a good example of how this can be done (i.e. overriding the sample collector, not necessarily for my particular case)? I haven’t found anything that manipulates these classes directly.
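For reference, one approach that might avoid customising the sample collector at all is to do the concatenation in an environment wrapper, so that a whole trial looks like a single episode from the outside. The sketch below is only an illustration under stated assumptions: it follows the Gymnasium-style `reset`/`step` signatures but imports nothing from Gymnasium or RLlib, and the names `TrialWrapper`, `FixedLengthEnv` and `episodes_per_trial` are hypothetical. A framework that resets recurrent state only at episode boundaries would then carry the RNN state across the whole trial.

```python
from typing import Any, List


class TrialWrapper:
    """Hypothetical wrapper: presents `episodes_per_trial` inner episodes
    as one long outer episode and appends (prev_action, prev_reward,
    prev_done) to every observation, as RL2 expects."""

    def __init__(self, env: Any, episodes_per_trial: int = 2):
        if episodes_per_trial < 1:
            raise ValueError("episodes_per_trial must be >= 1")
        self.env = env
        self.episodes_per_trial = episodes_per_trial
        self._episodes_finished = 0

    @staticmethod
    def _augment(obs, action, reward, done) -> List[float]:
        # Assumes a flat observation and a scalar (discrete) action.
        return [float(x) for x in obs] + [float(action), float(reward), float(done)]

    def reset(self, *, seed=None, options=None):
        self._episodes_finished = 0
        obs, info = self.env.reset(seed=seed, options=options)
        # No previous step exists yet, so the extra inputs are zeros.
        return self._augment(obs, 0, 0.0, False), info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        inner_done = terminated or truncated
        if inner_done:
            self._episodes_finished += 1
            if self._episodes_finished < self.episodes_per_trial:
                # Inner episode ended but the trial continues: silently
                # reset the inner env and hide the boundary from the caller.
                obs, info = self.env.reset()
                terminated = truncated = False
        # The inner 'done' flag is still exposed to the policy as an input.
        return self._augment(obs, action, reward, inner_done), reward, terminated, truncated, info


class FixedLengthEnv:
    """Toy environment (illustration only): every episode lasts `length` steps."""

    def __init__(self, length: int = 3):
        self.length = length
        self._t = 0

    def reset(self, *, seed=None, options=None):
        self._t = 0
        return [0.0], {}

    def step(self, action):
        self._t += 1
        return [float(self._t)], 1.0, self._t >= self.length, False, {}


if __name__ == "__main__":
    env = TrialWrapper(FixedLengthEnv(length=3), episodes_per_trial=2)
    obs, _ = env.reset()
    steps, done = 0, False
    while not done:
        obs, reward, terminated, truncated, _ = env.step(1)
        done = terminated or truncated
        steps += 1
    print(steps)
```

With two three-step inner episodes per trial, the outer episode only ends after the second inner episode finishes. To use something like this with a real library, the wrapper would also need to declare an observation space that accounts for the three extra entries.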

EDIT: I found that dones are automatically included in the sample batch, so I’ve resolved the second question.

Your assistance is greatly appreciated!!
Thanks,
Grant
