Hi, everybody!

I have a question about accessing the weights of the policy (actor) network.

I am implementing Inverse RL as shown in the figure below.

To implement Inverse RL, I am using PyTorch together with a loss function I wrote myself.

Then I ran into a problem: the SUMO-RAY system passes states and actions as numpy arrays, not as torch tensors.

Because of that, autograd cannot track the computation, so I thought I would have to implement the backward() step myself.

How can I get the weights (and gradients) of the policy (actor) network when I use the PPO algorithm?
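In plain PyTorch you usually do not need to write backward() by hand: if you convert the numpy states/actions to tensors with torch.from_numpy() before the forward pass, autograd builds the graph and loss.backward() fills the gradients, and the actor's weights are reachable through named_parameters(). Here is a minimal sketch; the Actor class, its layer sizes, and the placeholder loss are all illustrative assumptions, not the SUMO-RAY or PPO internals:

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical minimal actor network (stand-in for the real PPO policy).
class Actor(nn.Module):
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(),
            nn.Linear(64, act_dim),
        )

    def forward(self, obs):
        return self.net(obs)

actor = Actor(obs_dim=4, act_dim=2)

# numpy state batch from the environment -> torch tensor
state_np = np.random.rand(8, 4).astype(np.float32)
state = torch.from_numpy(state_np)       # zero-copy conversion

logits = actor(state)                    # forward pass builds the autograd graph
loss = logits.pow(2).mean()              # placeholder for your custom IRL loss
loss.backward()                          # autograd computes all gradients for you

# Inspect the policy (actor) weights and their gradients
for name, param in actor.named_parameters():
    print(name, tuple(param.shape), param.grad is not None)
```

If the policy lives inside an RLlib PPO trainer rather than your own module, the usual route (as an assumption about your setup) is to pull the underlying torch model out of the policy object and iterate its parameters the same way.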