Best ways to customize a PPO algorithm variant in Ray 2.8.0

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I browsed several posts about customizing models, but I am a bit confused because the API has changed a lot. In Ray 1.x we needed to define our own policy and trainer, but in the latest documentation it seems everything is wrapped in an Algorithm class.

In my use case, I want to add a 1D conv layer to my PPO model to better deal with the sim2real gap, because my 1D observation (a Lidar scan) can be noisy in the real environment. In the future I might also need to load pre-trained weights for this 1D conv layer only, or switch to a 2D conv with a grey-scale image as input, so I want a customized model. What is the best way to do that in the current version of Ray (say 2.8.0)?

Any guidance or example code would be appreciated!

@zhijunz Great question! And apologies for the many changes we have made lately. We are moving RLlib to a new API stack to provide users with better performance and greater customizability.

To your question: there are mainly four classes you can override in an algorithm to inject custom logic:

  • Catalog (default module building blocks)
  • RLModule (model configuration and/or behavior)
  • Learner (training logic and/or losses)
  • Algorithm (e.g., training-step logic)
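
Roughly, those override points look like this (just a skeleton sketch; the import paths below are my best guess for Ray 2.8.0, and the class bodies are left out):

# Sketch of the main override points in the new API stack (Ray ~2.8.0).
# Double-check the import paths against your installed version.
from ray.rllib.algorithms.ppo.ppo import PPO
from ray.rllib.algorithms.ppo.ppo_catalog import PPOCatalog
from ray.rllib.algorithms.ppo.torch.ppo_torch_learner import PPOTorchLearner
from ray.rllib.algorithms.ppo.torch.ppo_torch_rl_module import PPOTorchRLModule

class MyPPOCatalog(PPOCatalog):
    ...  # override the encoder/head building logic to swap in your own network pieces

class MyPPOTorchRLModule(PPOTorchRLModule):
    ...  # override setup() and/or the forward methods for a custom architecture

class MyPPOTorchLearner(PPOTorchLearner):
    ...  # override the loss computation or other training logic

class MyPPO(PPO):
    ...  # override training_step() for custom training-iteration logic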

In your case you want to change the model used by the PPO agent, so you would need to override PPORLModule and PPOTorchRLModule (or PPOTfRLModule for TensorFlow). Your custom RLModule can then be passed to the AlgorithmConfig.rl_module() method when configuring the PPO algorithm:

from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.algorithms.ppo.ppo_catalog import PPOCatalog
from ray.rllib.core.rl_module.rl_module import SingleAgentRLModuleSpec

config = (
    PPOConfig()
    .rl_module(
        rl_module_spec=SingleAgentRLModuleSpec(
            # MyPPOTorchRLModule is your custom module subclass (see sketch above).
            module_class=MyPPOTorchRLModule,
            model_config_dict={...},  # placeholder: model settings passed to your module
            catalog_class=PPOCatalog,
            load_state_path="path/to/module/checkpoint/",  # used to load weights
        )
    )
)
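
For the Lidar part specifically, the 1D conv stack itself is plain PyTorch and is typically built inside your custom RLModule's setup(). Below is a minimal, hypothetical sketch of such an encoder, including how you could load pre-trained weights into only that sub-module; the layer sizes, num_beams value, and checkpoint file name are assumptions to adapt, and you still have to wire the encoder into the module's forward methods as described in the RLModule docs:

import torch
import torch.nn as nn

class LidarConvEncoder(nn.Module):
    """Hypothetical 1D-conv encoder for a flat Lidar-scan observation."""

    def __init__(self, num_beams: int, out_dim: int = 256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels=1, out_channels=16, kernel_size=5, stride=2),
            nn.ReLU(),
            nn.Conv1d(in_channels=16, out_channels=32, kernel_size=3, stride=2),
            nn.ReLU(),
            nn.Flatten(),
        )
        # Infer the flattened conv output size once with a dummy forward pass.
        with torch.no_grad():
            flat_size = self.conv(torch.zeros(1, 1, num_beams)).shape[-1]
        self.out = nn.Linear(flat_size, out_dim)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: [batch, num_beams] -> add a channel dimension for Conv1d.
        return self.out(self.conv(obs.unsqueeze(1)))

# Loading pre-trained weights into only this sub-module, e.g. inside your
# custom RLModule's setup() (file name is hypothetical):
# encoder = LidarConvEncoder(num_beams=720)
# encoder.load_state_dict(torch.load("lidar_encoder.pt"))

Switching to a 2D conv over grey-scale images later would then mostly mean swapping this encoder (and the corresponding observation handling) while the rest of your custom module stays the same.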