How severely does this issue affect your experience of using Ray?
- Medium: It contributes to significant difficulty to complete my task, but I can work around it.
Hi, I’m working with a multi-agent, goal-conditioned environment. I’m using independent PPO policies for each agent.
I want to add another model to each policy (just a linear model in torch), and call it at the start of each episode to select the goal of the agent based on the output of this model. I also need to implement a custom loss to update this new model, based on the episode reward.
I’ve been reviewing the implementation of the torch policies and PPO in torch; however, it’s not clear to me what changes I should make. For the forward pass (goal selection), I was thinking of calling the model from a custom callback in the on_episode_start method. I’d also add the model’s output to the episode info, so it’s available later for the backward pass. Does that make sense?
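To make the idea concrete, here’s a rough, framework-agnostic sketch of the forward pass I have in mind (a linear model over the agent’s initial observation, sampled once at episode start, with the result stashed where the backward pass can find it later). All names here (`select_goal`, `NUM_GOALS`, `episode_user_data`, etc.) are made up for illustration; in practice this would be a `torch.nn.Linear` called inside the callback:

```python
import math
import random

random.seed(0)

NUM_GOALS = 3  # hypothetical number of candidate goals
OBS_DIM = 4    # hypothetical observation size

# Hand-rolled "linear layer": one row of weights per goal.
weights = [[random.uniform(-0.1, 0.1) for _ in range(OBS_DIM)]
           for _ in range(NUM_GOALS)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def select_goal(obs):
    """Forward pass: logits = W @ obs, then sample a goal index."""
    logits = [sum(w * o for w, o in zip(row, obs)) for row in weights]
    probs = softmax(logits)
    goal = random.choices(range(NUM_GOALS), weights=probs)[0]
    return goal, probs

# Inside an on_episode_start-style hook: run the model on the first
# observation and stash the result (e.g. in the episode's user_data
# or the per-step infos) so the update step can read it back later.
obs0 = [0.5, -1.0, 0.2, 0.0]
goal, probs = select_goal(obs0)
episode_user_data = {"goal": goal, "goal_probs": probs}
print(episode_user_data)
```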
I don’t have a clear idea of where to compute the loss and update the model.
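For the update itself, the only thing I can think of is a REINFORCE-style score-function gradient on the goal selector, driven by the episode return. Here is a minimal sketch of that idea in plain Python (names, the zero baseline, and the hand-derived softmax gradient are all my own assumptions, not anything from RLlib; with torch this would just be `loss = -(R - b) * log_prob` and an optimizer step):

```python
import math

NUM_GOALS = 3
OBS_DIM = 4
LR = 0.05  # hypothetical learning rate

# Start from zero weights so all goals are initially equally likely.
weights = [[0.0] * OBS_DIM for _ in range(NUM_GOALS)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def goal_probs(obs):
    logits = [sum(w * o for w, o in zip(row, obs)) for row in weights]
    return softmax(logits)

def update_goal_model(obs, goal, episode_return, baseline=0.0):
    """One REINFORCE step: loss = -(R - b) * log pi(goal | obs).

    For a softmax over a linear layer,
    d log pi(g|obs) / d W[k] = (1[k == g] - p[k]) * obs,
    so we can do the gradient-ascent step by hand.
    """
    probs = goal_probs(obs)
    advantage = episode_return - baseline
    for k in range(NUM_GOALS):
        coeff = (1.0 if k == goal else 0.0) - probs[k]
        for i in range(OBS_DIM):
            weights[k][i] += LR * advantage * coeff * obs[i]

# At episode end: read the stored goal back out of the episode data
# and reinforce it in proportion to the return.
obs0 = [0.5, -1.0, 0.2, 0.0]
update_goal_model(obs0, goal=1, episode_return=3.0)
print(goal_probs(obs0))
```

With a positive return, the sampled goal’s probability for that observation goes up after the update. What I can’t tell is where this step should live in RLlib: a custom loss on the policy, an on_episode_end callback, or somewhere else entirely.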