How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
Hi everyone,
Thanks in advance for any help you can give me.
I recently started using RLlib but I’m having a few issues with understanding how to properly implement a custom Algorithm, Policy and Model.
Despite having read the relevant docs multiple times, I still find them quite incomplete and superficial for a newcomer.
I’m having two main issues:
- figuring out exactly what I need to implement in terms of subclasses and overridden methods, especially for Models. Which methods are meant to be overridden, and which are meant to use the superclass implementations directly?
- understanding the relationship between the Model and Policy classes. From my understanding, a Policy needs an underlying NN model in order to estimate Q-values (assuming a Q-learning-based algorithm), but it is unclear to me how this relationship should be reflected in my custom code. Why are Model implementations necessary? Couldn't I simply create a TF model directly and use it in my Policy code?
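To make my mental model concrete, here is the kind of separation I imagine between the two classes (a toy, framework-free sketch; the names `QModel` and `QPolicy` are mine, not RLlib's, and the "network" is just random linear weights so the example stays runnable):

```python
import random

class QModel:
    """Toy stand-in for a Model: maps an observation to Q-values.

    In a real setup this would wrap a neural network (e.g. a TF/Keras
    model); here it is a fixed random linear function for illustration.
    """

    def __init__(self, obs_dim, num_actions):
        # Hypothetical fixed random weights instead of a trained network.
        self.weights = [[random.uniform(-1, 1) for _ in range(obs_dim)]
                        for _ in range(num_actions)]

    def q_values(self, obs):
        # One Q-value per action: a dot product per action row.
        return [sum(w * o for w, o in zip(row, obs)) for row in self.weights]

class QPolicy:
    """Toy stand-in for a Policy: owns a Model and turns Q-values into actions."""

    def __init__(self, model):
        # The Policy delegates value estimation to the Model it wraps.
        self.model = model

    def compute_action(self, obs):
        # Greedy action selection over the Model's Q-values.
        qs = self.model.q_values(obs)
        return max(range(len(qs)), key=lambda a: qs[a])

policy = QPolicy(QModel(obs_dim=4, num_actions=2))
action = policy.compute_action([0.1, 0.2, 0.3, 0.4])
print(action)  # either 0 or 1, depending on the random weights
```

Is this roughly the division of labor RLlib intends, with the Model owning value estimation and the Policy owning action selection and losses? And if so, what does the Model abstraction buy me over plugging a TF model straight into the Policy?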
I know my questions might be a bit unclear; if so, I apologize. As you can see, I still have a lot of confusion that I need to clear up in order to use RLlib properly.
I hope someone can help me figure this out!
PS: In case it is relevant, I am currently trying to implement the LRM algorithm proposed by Icarte et al. in their paper "Learning Reward Machines for Partially Observable Reinforcement Learning".