Help writing custom torch policies for interactive RL algorithms

How severe does this issue affect your experience of using Ray?

  • High: It blocks me to complete my task.

Hi, I’m trying to create custom policies for interactive reinforcement learning algorithms. These algorithms incorporate human feedback into the RL formulation (i.e., the state tuple includes human feedback). In order to implement them, I have a few questions about how to implement my own RLlib policies.

  • Currently the documentation seems to have contradictory statements about how to implement your own policies. Here a warning says to use subclassing, while here it shows the following statement.


    What is the best way to create your own policy: using the helper function or subclassing? If subclassing is preferred, is there any documentation showing how to do this properly? (A rough sketch of what I have in mind for the subclassing route is included after this list.)

  • Is it possible to add additional information to the batch returned by methods like sample() from the RolloutWorker? As I am working with interactive RL algorithms, human feedback needs to be added to each state tuple, and I would like the output writers to automatically save this information as well. (A sketch of what I am hoping for is also included below.)
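
For reference, this is a minimal sketch of what I have in mind for the subclassing route. It subclasses the generic `Policy` base class (I assume a torch-specific base such as `TorchPolicy`/`TorchPolicyV2` would look similar); the `HumanFeedbackPolicy` name and the feedback handling are just placeholders, not working code.

```python
from ray.rllib.policy.policy import Policy


class HumanFeedbackPolicy(Policy):
    """Sketch of a custom policy subclass (name and feedback logic are placeholders)."""

    def __init__(self, observation_space, action_space, config):
        super().__init__(observation_space, action_space, config)
        # Torch model / optimizer setup for the interactive-RL algorithm
        # would go here.

    def compute_actions(
        self,
        obs_batch,
        state_batches=None,
        prev_action_batch=None,
        prev_reward_batch=None,
        info_batch=None,
        episodes=None,
        **kwargs,
    ):
        # Placeholder: sample random actions. A real implementation would run
        # the torch model and factor the stored human feedback into the action
        # computation.
        actions = [self.action_space.sample() for _ in obs_batch]
        return actions, [], {}

    def learn_on_batch(self, samples):
        # Placeholder: update the model from the (feedback-augmented) batch
        # and return learner stats.
        return {}

    def get_weights(self):
        return {}

    def set_weights(self, weights):
        pass
```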
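
And this is roughly how I imagined attaching the human feedback to each sample batch, via the on_postprocess_trajectory callback. The "human_feedback" column and the HumanFeedbackCallbacks class are hypothetical, and depending on the Ray version the import may be ray.rllib.agents.callbacks instead. Is something like this the intended way, so that the output writers pick the extra column up automatically?

```python
import numpy as np

from ray.rllib.algorithms.callbacks import DefaultCallbacks


class HumanFeedbackCallbacks(DefaultCallbacks):
    """Sketch: attach a custom column to every postprocessed sample batch."""

    def on_postprocess_trajectory(
        self,
        *,
        worker,
        episode,
        agent_id,
        policy_id,
        policies,
        postprocessed_batch,
        original_batches,
        **kwargs,
    ):
        # "human_feedback" is a hypothetical column, filled here with a
        # placeholder value per timestep. Whatever is stored in the batch
        # should then be written out alongside obs/actions/rewards by the
        # configured output writers.
        postprocessed_batch["human_feedback"] = np.zeros(
            postprocessed_batch.count, dtype=np.float32
        )


# Hypothetical usage: pass the callbacks class and an output directory in the
# algorithm config, e.g.
# config = {"callbacks": HumanFeedbackCallbacks, "output": "/tmp/feedback-out"}
```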