Save experience from custom policy

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty in completing my task, but I can work around it.

I am creating a custom policy and I want to use it to generate experiences that I can save and replay later.

The "Converting external experiences to batch format" example is quite limited and outdated: I am using a multi-agent environment with an RNN, so I also need to save the state information, and the example does not include any postprocessing.
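
To make the requirement concrete, here is a rough sketch of the kind of per-agent recording I have in mind. It is plain NumPy and does not call any RLlib API: `ToyEnv`, `toy_policy` and `record_episode` are hypothetical names invented for illustration, and the `state_in_0` / `state_out_0` column names are only meant to mirror what I understand to be the RLlib convention for recurrent state. It does no postprocessing, which is exactly the part I am missing.

```python
from collections import defaultdict

import numpy as np


class ToyEnv:
    """Hypothetical two-agent environment that ends after `horizon` steps."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return {
            "a0": np.zeros(2, dtype=np.float32),
            "a1": np.ones(2, dtype=np.float32),
        }

    def step(self, actions):
        self.t += 1
        obs = {aid: np.full(2, self.t, dtype=np.float32) for aid in actions}
        rewards = {aid: float(a) for aid, a in actions.items()}
        done = self.t >= self.horizon  # all agents finish together
        return obs, rewards, done


def toy_policy(obs, state):
    """Hypothetical recurrent policy: returns (action, next_state)."""
    next_state = 0.5 * state + obs.mean()
    return int(next_state.sum() > 1.0), next_state


def record_episode(env, policy, state_size=4, max_steps=100):
    """Roll out one episode and return {agent_id: {column: np.ndarray}}."""
    rows = defaultdict(lambda: defaultdict(list))
    obs = env.reset()
    states = {aid: np.zeros(state_size, dtype=np.float32) for aid in obs}
    for _ in range(max_steps):
        actions, next_states = {}, {}
        for aid in obs:
            actions[aid], next_states[aid] = policy(obs[aid], states[aid])
        next_obs, rewards, done = env.step(actions)
        for aid in obs:
            row = rows[aid]
            row["obs"].append(obs[aid])
            row["actions"].append(actions[aid])
            row["rewards"].append(rewards[aid])
            row["dones"].append(done)
            row["state_in_0"].append(states[aid])        # state fed to the policy
            row["state_out_0"].append(next_states[aid])  # state it returned
        obs, states = next_obs, next_states
        if done:
            break
    return {
        aid: {key: np.asarray(values) for key, values in cols.items()}
        for aid, cols in rows.items()
    }


batches = record_episode(ToyEnv(horizon=3), toy_policy)
print({key: value.shape for key, value in batches["a0"].items()})
```

Each agent ends up with one dict of equally long columns, and the state written at step t+1 as `state_in_0` is the `state_out_0` of step t, which is the consistency I would expect a recurrent model to need at training time.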

Even after adapting the proposed script to my case, the batches produced this way give only inf as the imitation loss, whereas batches saved as the output of a training run give a meaningful loss when I use them later for supervised training. I suspect there is a format problem.
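
A minimal sketch of how the two kinds of batch could be compared to narrow this down, assuming both can be loaded as plain dicts of NumPy arrays (that loading step is not shown, and `diff_batches` is a hypothetical helper, not part of Ray):

```python
import numpy as np


def diff_batches(reference, candidate):
    """List differences between two dict-of-array batches, column by column."""
    problems = []
    for key in sorted(set(reference) | set(candidate)):
        if key not in candidate:
            problems.append(f"{key}: missing from candidate")
            continue
        if key not in reference:
            problems.append(f"{key}: extra column in candidate")
            continue
        ref = np.asarray(reference[key])
        cand = np.asarray(candidate[key])
        if ref.dtype != cand.dtype:
            problems.append(f"{key}: dtype {cand.dtype} != {ref.dtype}")
        # Compare everything except the leading (time/batch) dimension.
        if ref.shape[1:] != cand.shape[1:]:
            problems.append(f"{key}: per-row shape {cand.shape[1:]} != {ref.shape[1:]}")
        if np.issubdtype(cand.dtype, np.floating) and not np.isfinite(cand).all():
            problems.append(f"{key}: contains NaN or inf")
    return problems


# Toy data standing in for a batch saved during training (reference)
# and a batch built by hand from the custom policy (candidate).
reference = {
    "obs": np.zeros((5, 2), dtype=np.float32),
    "actions": np.zeros(5, dtype=np.int64),
    "state_in_0": np.zeros((5, 4), dtype=np.float32),
}
candidate = {
    "obs": np.array([[0.0, np.inf]] * 5),
    "actions": np.zeros(5, dtype=np.int64),
}

problems = diff_batches(reference, candidate)
for line in problems:
    print(line)
```

A missing column, a dtype mismatch, or a non-finite value in the hand-built batch would each be a plausible source of an infinite loss, so this is where I would look first.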

Is there an automatic way to save some episodes generated by a custom policy?