Proper way to implement a custom Algorithm + Policy + Model

Hi everyone,
thanks in advance for any help you can give me.

I recently started using RLlib, but I’m having trouble understanding how to properly implement a custom Algorithm, Policy and Model.
Despite having read the relevant docs multiple times, I still find them to be quite incomplete / superficial for a newcomer.

I’m having two main issues:

  • figuring out exactly what I need to implement in terms of sub-classes and overridden methods, especially for Models. Which methods are meant to be overridden, and which are meant to be used directly from the superclass implementation?
  • understanding the relationship between the Model and Policy classes: from my understanding, a Policy needs an underlying NN model in order to estimate Q-values (assuming a Q-learning-based algorithm), but it is unclear to me how this relationship should be reflected in my custom code. Why are Model implementations necessary? Couldn’t I simply create a TF model directly and use it in my Policy code?

I know my questions might be a bit unclear: if so, I apologize. As you can see, I still have a lot of confusion that I need to clear up in order to use RLlib properly.
I hope someone can help me figure it out!

PS: In case this might be relevant in any way, I am currently trying to implement the LRM algorithm proposed by Icarte et al. in their paper: Learning Reward Machines for Partially Observable Reinforcement Learning

I would recommend reading through the RLlib User Guides in the official documentation. I did the same when I was lost after many failed attempts. That is where I learned the difference between using ray.train() and ray.tune().
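
On the Model vs. Policy question, a rough framework-free sketch might help in the meantime. As far as I understand it, the Model is only the function approximator (observation in, Q-values out), while the Policy owns a model and decides what to do with its outputs (action selection, and in a real setup also the loss). Note that this is NOT RLlib's actual API: LinearQModel, EpsilonGreedyPolicy, forward and compute_action are all hypothetical names made up for illustration, and the linear "network" is just a placeholder for a real TF model.

```python
import random
from typing import List, Sequence


class LinearQModel:
    """Hypothetical stand-in for a 'Model': observation -> one Q-value per action."""

    def __init__(self, obs_dim: int, num_actions: int, seed: int = 0):
        rng = random.Random(seed)
        # One weight row per action; a real model would be a neural network.
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(obs_dim)]
            for _ in range(num_actions)
        ]

    def forward(self, obs: Sequence[float]) -> List[float]:
        # Q(s, a) is the dot product of the observation with action a's weights.
        return [sum(w * x for w, x in zip(row, obs)) for row in self.weights]


class EpsilonGreedyPolicy:
    """Hypothetical stand-in for a 'Policy': wraps a model and picks actions."""

    def __init__(self, model: LinearQModel, epsilon: float = 0.1, seed: int = 0):
        self.model = model
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def compute_action(self, obs: Sequence[float]) -> int:
        q_values = self.model.forward(obs)
        # Explore with probability epsilon, otherwise act greedily on the Q-values.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(q_values))
        return max(range(len(q_values)), key=q_values.__getitem__)


if __name__ == "__main__":
    model = LinearQModel(obs_dim=2, num_actions=3)
    policy = EpsilonGreedyPolicy(model, epsilon=0.1)
    print(policy.compute_action([0.5, -1.0]))
```

The point of the split, at least in this toy version, is that the policy only relies on the model's forward method, so the network can be swapped without touching the action-selection logic. Whether a custom Model class is strictly required in RLlib, or whether a plain TF model can be used inside the Policy, is something I can't say for sure.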

Hi Phillips,

thanks for your answer.
Unfortunately, I have already read the user guides, and my problem stems precisely from the fact that they did not help me clear up the doubts described above!