Custom RLModule

Ray version: 2.40.0
Python: 3.11
OS: WSL2, Ubuntu 24.04.1

Hello everyone,

I'm trying to implement a custom RLModule to use with PPO. I've managed to put together a class that subclasses TorchRLModule and ValueFunctionAPI, with working setup, _forward_exploration, _forward_inference, _forward_train, and compute_values methods.
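
For reference, the class is structured roughly like this (a simplified sketch, not my exact code; it assumes a flat Box observation space, a Discrete action space, and arbitrary layer sizes, with "hidden_dim" being a made-up config key):

```python
import torch.nn as nn

from ray.rllib.core.columns import Columns
from ray.rllib.core.rl_module.apis import ValueFunctionAPI
from ray.rllib.core.rl_module.torch import TorchRLModule
from ray.rllib.models.torch.torch_distributions import TorchCategorical
from ray.rllib.utils.annotations import override


class MyPPOTorchRLModule(TorchRLModule, ValueFunctionAPI):
    """Sketch: shared MLP encoder with separate policy and value heads."""

    @override(TorchRLModule)
    def setup(self):
        in_size = self.observation_space.shape[0]
        hidden = (self.model_config or {}).get("hidden_dim", 256)  # assumed key
        self._encoder = nn.Sequential(
            nn.Linear(in_size, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self._pi_head = nn.Linear(hidden, self.action_space.n)
        self._vf_head = nn.Linear(hidden, 1)

    def _shared_forward(self, batch):
        # One shared pass producing embeddings and action-distribution logits.
        embeddings = self._encoder(batch[Columns.OBS])
        logits = self._pi_head(embeddings)
        return {Columns.EMBEDDINGS: embeddings, Columns.ACTION_DIST_INPUTS: logits}

    @override(TorchRLModule)
    def _forward_inference(self, batch, **kwargs):
        return self._shared_forward(batch)

    @override(TorchRLModule)
    def _forward_exploration(self, batch, **kwargs):
        return self._shared_forward(batch)

    @override(TorchRLModule)
    def _forward_train(self, batch, **kwargs):
        return self._shared_forward(batch)

    @override(ValueFunctionAPI)
    def compute_values(self, batch, embeddings=None):
        # Reuse the embeddings from the train forward pass if they were provided.
        if embeddings is None:
            embeddings = self._encoder(batch[Columns.OBS])
        return self._vf_head(embeddings).squeeze(-1)

    # Discrete action space -> categorical action distribution for all phases.
    @override(TorchRLModule)
    def get_inference_action_dist_cls(self):
        return TorchCategorical

    @override(TorchRLModule)
    def get_exploration_action_dist_cls(self):
        return TorchCategorical

    @override(TorchRLModule)
    def get_train_action_dist_cls(self):
        return TorchCategorical
```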

My approach is to first replicate the default PPO RLModule and then continue customizing from there; however, the custom and default RLModules seem to learn differently.

What I observe is that the custom RLModule converges faster, but to a much lower reward than the default module. How would I write a custom module that can be plugged into PPOConfig.rl_module() and behaves exactly like the default module?
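
For completeness, I plug the module in roughly like this (again a simplified sketch, not my exact config; the environment is just a placeholder, and the api_stack call may be redundant on 2.40):

```python
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.core.rl_module.rl_module import RLModuleSpec

config = (
    PPOConfig()
    # New API stack (RLModule + Learner); may already be the default in 2.40.
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
    .environment("CartPole-v1")  # placeholder env
    .rl_module(
        rl_module_spec=RLModuleSpec(
            module_class=MyPPOTorchRLModule,  # the class from the sketch above
        ),
    )
)
algo = config.build()
results = algo.train()
```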

Hey @termpu,

I have also been running into this problem, specifically with PPO. Have you been able to track it down? I have created a few examples / PRs on the RLlib GitHub, but I am finding that I cannot get the performance from my custom models that I could on the old stack, and the custom module mostly performs worse than the default RLModule.

Thanks,

Tyler

Hi @tlaurie99!

No luck so far! I tried to implement a custom ActorCriticEncoder and a custom PPOCatalog in addition to the actual RLModule, but I couldn't even get that to train. I didn't put much effort into it; the code is mostly LLM-generated. Another thing I tried was upgrading to Ray 2.44, but that didn't help either. Based on a small number of tests, I got the feeling that the default PPO on 2.44 was doing worse than on 2.40. I don't know if that's actually true, but I reverted to 2.40 afterwards.

My problem is essentially an offline one, but it has online characteristics, so what I'm actually trying to do is add some dropout layers to the PPO network.
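
The dropout itself should be the easy part once the baseline matches; in the encoder of the sketch above it would look something like this (standalone helper, dropout probability arbitrary):

```python
import torch.nn as nn


def build_encoder_with_dropout(in_size: int, hidden: int = 256, p: float = 0.1) -> nn.Module:
    # Same shared encoder as in the sketch above, with dropout between layers.
    # Note: nn.Dropout is only active while the module is in training mode
    # (module.train()), so it's worth checking which mode RLlib uses per pass.
    return nn.Sequential(
        nn.Linear(in_size, hidden), nn.Tanh(), nn.Dropout(p=p),
        nn.Linear(hidden, hidden), nn.Tanh(), nn.Dropout(p=p),
    )
```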

I've decided to put this aside for a while. I'll stick to tuning the default PPO and its config.