Best ways to customize a PPO algorithm variant in Ray 2.8.0

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

I browsed several posts about customizing models, but I am a bit confused because the API has changed a lot. In Ray 1.x we needed to define our own policy and trainer, but in the latest documentation it seems everything is wrapped in an Algorithm class.

In my use case, I want to add a 1D conv layer to my PPO model to better deal with the sim2real gap, because my 1D observation (a Lidar scan) can be noisy in the real environment. In the future I might also need to load pre-trained weights for this 1D conv layer only, or switch to a 2D conv with a grey-scale image as input, so I want a customized model. What is the best way to do that in the current version of Ray (say 2.8.0)?

Any guidance or example code would be appreciated!

@zhijunz Great question! And apologies for the many changes we have made lately. We are moving RLlib to a new API stack to provide users with better performance and greater customizability.

To your question: there are mainly four classes you can override in an algorithm to inject custom logic:

  • Catalog (default module building blocks)
  • RLModule (model configuration and/or behavior)
  • Learner (training logic and/or losses)
  • Algorithm (e.g., training-step logic)
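
Roughly, those override points look like this (just a skeleton sketch; the import paths below are my best guess for Ray 2.8.0, and the class bodies are left out):

# Sketch of the main override points in the new API stack (Ray ~2.8.0).
# Double-check the import paths against your installed version.
from ray.rllib.algorithms.ppo.ppo import PPO
from ray.rllib.algorithms.ppo.ppo_catalog import PPOCatalog
from ray.rllib.algorithms.ppo.torch.ppo_torch_learner import PPOTorchLearner
from ray.rllib.algorithms.ppo.torch.ppo_torch_rl_module import PPOTorchRLModule

class MyPPOCatalog(PPOCatalog):
    ...  # override the encoder/head building logic to swap in your own network pieces

class MyPPOTorchRLModule(PPOTorchRLModule):
    ...  # override setup() and/or the forward methods for a custom architecture

class MyPPOTorchLearner(PPOTorchLearner):
    ...  # override the loss computation or other training logic

class MyPPO(PPO):
    ...  # override training_step() for custom training-iteration logic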

In your case you want to change the model used by the PPO agent, so you would need to override PPORLModule and PPOTorchRLModule (or PPOTfRLModule for TensorFlow). Your custom RLModule can then be passed to the AlgorithmConfig.rl_module() method when configuring the PPO algorithm:

from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.algorithms.ppo.ppo_catalog import PPOCatalog
from ray.rllib.core.rl_module.rl_module import SingleAgentRLModuleSpec

config = (
    PPOConfig()
    .rl_module(
        rl_module_spec=SingleAgentRLModuleSpec(
            # MyPPOTorchRLModule is your custom module subclass (see sketch above).
            module_class=MyPPOTorchRLModule,
            model_config_dict={...},  # placeholder: model settings passed to your module
            catalog_class=PPOCatalog,
            load_state_path="path/to/module/checkpoint/",  # used to load weights
        )
    )
)
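
For the Lidar part specifically, the 1D conv stack itself is plain PyTorch and is typically built inside your custom RLModule's setup(). Below is a minimal, hypothetical sketch of such an encoder, including how you could load pre-trained weights into only that sub-module; the layer sizes, num_beams value, and checkpoint file name are assumptions to adapt, and you still have to wire the encoder into the module's forward methods as described in the RLModule docs:

import torch
import torch.nn as nn

class LidarConvEncoder(nn.Module):
    """Hypothetical 1D-conv encoder for a flat Lidar-scan observation."""

    def __init__(self, num_beams: int, out_dim: int = 256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels=1, out_channels=16, kernel_size=5, stride=2),
            nn.ReLU(),
            nn.Conv1d(in_channels=16, out_channels=32, kernel_size=3, stride=2),
            nn.ReLU(),
            nn.Flatten(),
        )
        # Infer the flattened conv output size once with a dummy forward pass.
        with torch.no_grad():
            flat_size = self.conv(torch.zeros(1, 1, num_beams)).shape[-1]
        self.out = nn.Linear(flat_size, out_dim)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: [batch, num_beams] -> add a channel dimension for Conv1d.
        return self.out(self.conv(obs.unsqueeze(1)))

# Loading pre-trained weights into only this sub-module, e.g. inside your
# custom RLModule's setup() (file name is hypothetical):
# encoder = LidarConvEncoder(num_beams=720)
# encoder.load_state_dict(torch.load("lidar_encoder.pt"))

Switching to a 2D conv over grey-scale images later would then mostly mean swapping this encoder (and the corresponding observation handling) while the rest of your custom module stays the same.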