How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
Hi everyone,
Thanks in advance for any help you can give me.
I recently started using RLlib but I’m having a few issues with understanding how to properly implement a custom Algorithm, Policy and Model.
Despite having read the relevant docs multiple times, I still find them quite incomplete and superficial for a newcomer.
I’m having two main issues:
- figuring out exactly what I need to implement in terms of subclasses and overridden methods, especially for Models. Which methods are meant to be overridden, and which are meant to use the superclass implementations directly?
- understanding the relationship between the Model and Policy classes. From my understanding, a Policy needs an underlying NN model in order to estimate Q-values (assuming a Q-learning-based algorithm), but it is unclear to me how this relationship should be reflected in my custom code. Why are Model implementations necessary? Couldn't I simply create a TF model directly and use it in my Policy code?
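To make my mental model concrete, here is the kind of separation I imagine between the two classes (a toy, framework-free sketch; the names `QModel` and `QPolicy` are mine, not RLlib's, and the "network" is just random linear weights so the example stays runnable):

```python
import random

class QModel:
    """Toy stand-in for a Model: maps an observation to Q-values.

    In a real setup this would wrap a neural network (e.g. a TF/Keras
    model); here it is a fixed random linear function for illustration.
    """

    def __init__(self, obs_dim, num_actions):
        # Hypothetical fixed random weights instead of a trained network.
        self.weights = [[random.uniform(-1, 1) for _ in range(obs_dim)]
                        for _ in range(num_actions)]

    def q_values(self, obs):
        # One Q-value per action: a dot product per action row.
        return [sum(w * o for w, o in zip(row, obs)) for row in self.weights]

class QPolicy:
    """Toy stand-in for a Policy: owns a Model and turns Q-values into actions."""

    def __init__(self, model):
        # The Policy delegates value estimation to the Model it wraps.
        self.model = model

    def compute_action(self, obs):
        # Greedy action selection over the Model's Q-values.
        qs = self.model.q_values(obs)
        return max(range(len(qs)), key=lambda a: qs[a])

policy = QPolicy(QModel(obs_dim=4, num_actions=2))
action = policy.compute_action([0.1, 0.2, 0.3, 0.4])
print(action)  # either 0 or 1, depending on the random weights
```

Is this roughly the division of labor RLlib intends, with the Model owning value estimation and the Policy owning action selection and losses? And if so, what does the Model abstraction buy me over plugging a TF model straight into the Policy?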
I know my questions might be a bit unclear; if so, I apologize. As you can see, I still have a lot of confusion that I need to clear up in order to use RLlib properly.
I hope someone can help me figure this out!
PS: In case it is relevant, I am currently trying to implement the LRM algorithm proposed by Icarte et al. in their paper "Learning Reward Machines for Partially Observable Reinforcement Learning".