Ray version: 2.40.0
Python: 3.11
OS: WSL2, Ubuntu 24.04.1
Hello everyone,
I'm trying to implement a custom RLModule to use with PPO. I managed to put together a class that subclasses TorchRLModule and ValueFunctionAPI, and it has working setup, _forward_exploration, _forward_inference, _forward_train, and compute_values methods.
Now, my approach is to first replicate the default PPO RLModule and then continue customizing from there, but the custom and default RLModules seem to learn differently.
What I observe is that the custom RLModule converges faster, and to a much lower level of reward, than the default module. How would I write a custom module that can be plugged into PPOConfig.rl_module() and that would behave exactly like the default module?
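For concreteness, here is a stripped-down sketch of the kind of module I mean (not my exact code): a shared MLP encoder with separate policy and value heads, assuming a flat observation space and a discrete action space. The "hidden_dim" model-config key is just a placeholder of mine, and the import paths are what I believe is right for Ray 2.40:

```python
import torch

from ray.rllib.core.columns import Columns
from ray.rllib.core.rl_module.apis.value_function_api import ValueFunctionAPI
from ray.rllib.core.rl_module.torch.torch_rl_module import TorchRLModule


class MyPPOTorchRLModule(TorchRLModule, ValueFunctionAPI):
    """Shared MLP encoder with separate policy and value heads."""

    def setup(self):
        # "hidden_dim" is a made-up model_config key, not an RLlib default.
        hidden_dim = (self.model_config or {}).get("hidden_dim", 256)
        obs_dim = self.observation_space.shape[0]
        num_actions = self.action_space.n

        self._encoder = torch.nn.Sequential(
            torch.nn.Linear(obs_dim, hidden_dim),
            torch.nn.Tanh(),
            torch.nn.Linear(hidden_dim, hidden_dim),
            torch.nn.Tanh(),
        )
        self._pi_head = torch.nn.Linear(hidden_dim, num_actions)
        self._vf_head = torch.nn.Linear(hidden_dim, 1)

    def _forward_inference(self, batch, **kwargs):
        return self._common_forward(batch)

    def _forward_exploration(self, batch, **kwargs):
        return self._common_forward(batch)

    def _forward_train(self, batch, **kwargs):
        return self._common_forward(batch)

    def compute_values(self, batch, embeddings=None):
        # The learner may pass pre-computed embeddings; otherwise re-run the encoder.
        if embeddings is None:
            embeddings = self._encoder(batch[Columns.OBS])
        return self._vf_head(embeddings).squeeze(-1)

    def _common_forward(self, batch):
        # Return raw policy logits; PPO builds the action distribution from these.
        embeddings = self._encoder(batch[Columns.OBS])
        return {Columns.ACTION_DIST_INPUTS: self._pi_head(embeddings)}
```

and I plug it into the config roughly like this (again just a sketch, e.g. on CartPole-v1):

```python
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.core.rl_module.rl_module import RLModuleSpec

config = (
    PPOConfig()
    .environment("CartPole-v1")
    .rl_module(rl_module_spec=RLModuleSpec(module_class=MyPPOTorchRLModule))
)
```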
Hey @termpu,
I have also been running into this problem, specifically with PPO. Have you been able to track it down? I have created a few examples / PRs on the RLlib GitHub, but I am finding that I cannot get the same performance from my custom models that I could from the old stack, and the custom module mostly performs worse than the default RLModule.
Thanks,
Tyler
Hi @tlaurie99!
No luck so far! I tried to implement a custom ActorCriticEncoder and a CustomPPOCatalog in addition to the actual RLModule, but I didn't even manage to get that to train. I didn't put much effort into it; the code is mostly LLM-made. Another thing I tried was upgrading to Ray 2.44, but that didn't help either. Based on a small number of tests, I got the feeling that the default PPO on 2.44 was doing worse than on 2.40. I don't know if that is actually true, but I reverted to 2.40 afterwards.
I kind of have an offline problem, but it has online characteristics, so what I'm actually trying to do is add some dropout layers inside the PPO neural network.
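Concretely, the change I'm after would just be inserting torch.nn.Dropout layers into the encoder stack of a module like the sketch above, for example (plain-torch sketch; layer sizes and the dropout probability are arbitrary):

```python
import torch


def mlp_encoder_with_dropout(in_dim: int, hidden_dim: int, p: float = 0.1) -> torch.nn.Sequential:
    # Same two-layer MLP encoder as in the sketch above, with dropout after each activation.
    return torch.nn.Sequential(
        torch.nn.Linear(in_dim, hidden_dim),
        torch.nn.Tanh(),
        torch.nn.Dropout(p=p),
        torch.nn.Linear(hidden_dim, hidden_dim),
        torch.nn.Tanh(),
        torch.nn.Dropout(p=p),
    )
```

One thing I know I'd have to watch out for is that dropout is only active when the module is in train mode, so rollouts and learner passes would effectively see different networks unless the train/eval mode switching is handled deliberately.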
I've decided to put this aside for a while. I'll stick to tuning the default PPO and its config.