How to add an extra model to a built-in Policy / Trainer?

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes significant difficulty to completing my task, but I can work around it.

Hi, I’m working with a multi-agent, goal-conditioned environment. I’m using independent PPO policies for each agent.

I want to add another model to each policy (just a linear model in torch), and call it at the start of each episode to select the goal of the agent based on the output of this model. I also need to implement a custom loss to update this new model, based on the episode reward.

I’ve been reviewing the implementation of torch policies and PPO in torch; however, it is not clear to me what changes I should make. For the forward pass (goal selection) I was thinking of calling the model in a custom callback, in the on_episode_start method. I’d also add the output of the model to the info, so it’s available later for the backward pass. Does that make sense?
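The episode-start idea above can be sketched roughly as follows. This is a minimal, self-contained illustration using a hand-rolled linear model instead of torch, and the names `GoalSelector` and `on_episode_start` only mirror the idea in the post; they are not RLlib's actual classes or signatures:

```python
import random

class GoalSelector:
    """A tiny linear model: goal_scores = W @ features (illustrative only)."""
    def __init__(self, n_features, n_goals, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)]
                  for _ in range(n_goals)]

    def forward(self, features):
        # One score per candidate goal.
        return [sum(wi * f for wi, f in zip(row, features)) for row in self.w]

    def select_goal(self, features):
        scores = self.forward(features)
        return max(range(len(scores)), key=scores.__getitem__)

def on_episode_start(episode_info, selector, initial_obs):
    # Call the model once at episode start and stash the result so it is
    # visible later (e.g. in the episode's info dict) for the loss update.
    goal = selector.select_goal(initial_obs)
    episode_info["goal"] = goal
    return goal

selector = GoalSelector(n_features=4, n_goals=3)
info = {}
goal = on_episode_start(info, selector, [0.5, -1.0, 0.2, 0.0])
```

In the real setup, `GoalSelector` would be a `torch.nn.Linear` module owned by the policy, and the hook would be RLlib's episode-start callback rather than this free function.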
I don’t have a clear idea of where to compute the loss and update the model.

Hi @emasquil ,

In order to keep track of all variables and “simply” conform to the optimization steps performed by RLlib, you should call the model in on_episode_start(), which you can override in your Policy if you are using RLlib’s new subclassing scheme (available on master; if you are using an older release, you will have to update the policy to it). Your policy can then save the model output and add your custom loss by overriding the loss() method of your Policy.
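One way the extra loss term could fit into an overridden loss() is sketched below. This is only an illustration of the arithmetic, not RLlib's API: `ppo_loss`, `goal_log_prob`, and `episode_return` are hypothetical inputs, and the goal-model term here is a REINFORCE-style score (push up the log-probability of goals that led to high episode return) added to the usual PPO loss:

```python
def combined_loss(ppo_loss, goal_log_prob, episode_return, coeff=0.1):
    # REINFORCE-style term for the goal model: lower loss when high-return
    # episodes had high goal log-probability.
    goal_loss = -goal_log_prob * episode_return
    # The policy's overridden loss() would return this total so RLlib's
    # optimizer step updates both the PPO networks and the goal model.
    return ppo_loss + coeff * goal_loss

total = combined_loss(ppo_loss=0.5, goal_log_prob=-1.2, episode_return=2.0)
# 0.5 + 0.1 * (1.2 * 2.0) = 0.74
```

Because the goal model's output was saved at episode start (e.g. in the episode info), the log-probability of the chosen goal can be recomputed inside loss() so gradients flow through the goal model.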