Help writing custom torch policies for interactive RL algorithms

How severe does this issue affect your experience of using Ray?

  • High: It blocks me to complete my task.

Hi, I’m trying to create custom policies for interactive reinforcement learning algorithms. These algorithms incorporate human feedback into the RL formulation (i.e., the state tuple includes human feedback). In order to implement them, I have a few questions about how to implement my own RLlib policies.

  • Currently the documentation seems to have contradictory statements about how to implement your own policies. Here a warning says to use subclassing, while here it shows the following statement.


    What is the best way to create your own policy: using the helper function or subclassing? If subclassing is preferred, is there any documentation showing how to do this properly? (A rough sketch of what I have in mind for the subclassing route is included after this list.)

  • Is it possible to add additional information to the batch returned by methods like sample() from the RolloutWorker? As I am working with interactive RL algorithms, human feedback needs to be added to each state tuple, and I would like the output writers to automatically save this information as well. (A sketch of what I am hoping for is also included below.)
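
For reference, this is a minimal sketch of what I have in mind for the subclassing route. It subclasses the generic `Policy` base class (I assume a torch-specific base such as `TorchPolicy`/`TorchPolicyV2` would look similar); the `HumanFeedbackPolicy` name and the feedback handling are just placeholders, not working code.

```python
from ray.rllib.policy.policy import Policy


class HumanFeedbackPolicy(Policy):
    """Sketch of a custom policy subclass (name and feedback logic are placeholders)."""

    def __init__(self, observation_space, action_space, config):
        super().__init__(observation_space, action_space, config)
        # Torch model / optimizer setup for the interactive-RL algorithm
        # would go here.

    def compute_actions(
        self,
        obs_batch,
        state_batches=None,
        prev_action_batch=None,
        prev_reward_batch=None,
        info_batch=None,
        episodes=None,
        **kwargs,
    ):
        # Placeholder: sample random actions. A real implementation would run
        # the torch model and factor the stored human feedback into the action
        # computation.
        actions = [self.action_space.sample() for _ in obs_batch]
        return actions, [], {}

    def learn_on_batch(self, samples):
        # Placeholder: update the model from the (feedback-augmented) batch
        # and return learner stats.
        return {}

    def get_weights(self):
        return {}

    def set_weights(self, weights):
        pass
```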
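
And this is roughly how I imagined attaching the human feedback to each sample batch, via the on_postprocess_trajectory callback. The "human_feedback" column and the HumanFeedbackCallbacks class are hypothetical, and depending on the Ray version the import may be ray.rllib.agents.callbacks instead. Is something like this the intended way, so that the output writers pick the extra column up automatically?

```python
import numpy as np

from ray.rllib.algorithms.callbacks import DefaultCallbacks


class HumanFeedbackCallbacks(DefaultCallbacks):
    """Sketch: attach a custom column to every postprocessed sample batch."""

    def on_postprocess_trajectory(
        self,
        *,
        worker,
        episode,
        agent_id,
        policy_id,
        policies,
        postprocessed_batch,
        original_batches,
        **kwargs,
    ):
        # "human_feedback" is a hypothetical column, filled here with a
        # placeholder value per timestep. Whatever is stored in the batch
        # should then be written out alongside obs/actions/rewards by the
        # configured output writers.
        postprocessed_batch["human_feedback"] = np.zeros(
            postprocessed_batch.count, dtype=np.float32
        )


# Hypothetical usage: pass the callbacks class and an output directory in the
# algorithm config, e.g.
# config = {"callbacks": HumanFeedbackCallbacks, "output": "/tmp/feedback-out"}
```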