How severely does this issue affect your experience of using Ray?
- Medium: It contributes to significant difficulty to complete my task, but I can work around it.
Hi, I’m working with a multi-agent, goal-conditioned environment. I’m using independent PPO policies for each agent.
I want to add another model to each policy (just a linear model in torch), and call it at the start of each episode to select the goal of the agent based on the output of this model. I also need to implement a custom loss to update this new model, based on the episode reward.
I’ve been reviewing the implementation of the torch policies and PPO in torch; however, it’s not clear to me what changes I should make. For the forward pass (goal selection), I was thinking of calling the model from a custom callback in the on_episode_start method. I’d also add the model’s output to the episode info, so it’s available later for the backward pass. Does that make sense?
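To make the idea concrete, here’s a rough, framework-agnostic sketch of the forward pass I have in mind (a linear model over the agent’s initial observation, sampled once at episode start, with the result stashed where the backward pass can find it later). All names here (`select_goal`, `NUM_GOALS`, `episode_user_data`, etc.) are made up for illustration; in practice this would be a `torch.nn.Linear` called inside the callback:

```python
import math
import random

random.seed(0)

NUM_GOALS = 3  # hypothetical number of candidate goals
OBS_DIM = 4    # hypothetical observation size

# Hand-rolled "linear layer": one row of weights per goal.
weights = [[random.uniform(-0.1, 0.1) for _ in range(OBS_DIM)]
           for _ in range(NUM_GOALS)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def select_goal(obs):
    """Forward pass: logits = W @ obs, then sample a goal index."""
    logits = [sum(w * o for w, o in zip(row, obs)) for row in weights]
    probs = softmax(logits)
    goal = random.choices(range(NUM_GOALS), weights=probs)[0]
    return goal, probs

# Inside an on_episode_start-style hook: run the model on the first
# observation and stash the result (e.g. in the episode's user_data
# or the per-step infos) so the update step can read it back later.
obs0 = [0.5, -1.0, 0.2, 0.0]
goal, probs = select_goal(obs0)
episode_user_data = {"goal": goal, "goal_probs": probs}
print(episode_user_data)
```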
I don’t have a clear idea of where to compute the loss and update the model.
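For the update itself, the only thing I can think of is a REINFORCE-style score-function gradient on the goal selector, driven by the episode return. Here is a minimal sketch of that idea in plain Python (names, the zero baseline, and the hand-derived softmax gradient are all my own assumptions, not anything from RLlib; with torch this would just be `loss = -(R - b) * log_prob` and an optimizer step):

```python
import math

NUM_GOALS = 3
OBS_DIM = 4
LR = 0.05  # hypothetical learning rate

# Start from zero weights so all goals are initially equally likely.
weights = [[0.0] * OBS_DIM for _ in range(NUM_GOALS)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def goal_probs(obs):
    logits = [sum(w * o for w, o in zip(row, obs)) for row in weights]
    return softmax(logits)

def update_goal_model(obs, goal, episode_return, baseline=0.0):
    """One REINFORCE step: loss = -(R - b) * log pi(goal | obs).

    For a softmax over a linear layer,
    d log pi(g|obs) / d W[k] = (1[k == g] - p[k]) * obs,
    so we can do the gradient-ascent step by hand.
    """
    probs = goal_probs(obs)
    advantage = episode_return - baseline
    for k in range(NUM_GOALS):
        coeff = (1.0 if k == goal else 0.0) - probs[k]
        for i in range(OBS_DIM):
            weights[k][i] += LR * advantage * coeff * obs[i]

# At episode end: read the stored goal back out of the episode data
# and reinforce it in proportion to the return.
obs0 = [0.5, -1.0, 0.2, 0.0]
update_goal_model(obs0, goal=1, episode_return=3.0)
print(goal_probs(obs0))
```

With a positive return, the sampled goal’s probability for that observation goes up after the update. What I can’t tell is where this step should live in RLlib: a custom loss on the policy, an on_episode_end callback, or somewhere else entirely.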