I have a question about the weights of the policy (actor) network.
I am implementing inverse RL as shown in the figure below, using PyTorch and a loss function I implemented myself.
I then ran into a problem: the SUMO-Ray system passes states and actions as numpy arrays, so I thought I had to implement the backward() function myself.
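If the only obstacle is that the environment produces numpy arrays, you usually do not need a custom backward(): converting the numpy state to a torch tensor before the forward pass lets autograd build the graph as usual. A minimal sketch, where the actor and loss are placeholders rather than your actual PPO/IRL code:

```python
import numpy as np
import torch
import torch.nn as nn

# Placeholder actor; your real PPO actor would come from your training code.
actor = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))

state_np = np.zeros(4, dtype=np.float32)   # numpy state, as SUMO/Ray returns it
state = torch.from_numpy(state_np)         # tensor that autograd can track

logits = actor(state)                      # forward pass builds the autograd graph
loss = logits.sum()                        # stand-in for your IRL loss
loss.backward()                            # gradients flow back to the actor weights

print(actor[0].weight.grad is not None)    # True: no hand-written backward() needed
```

The key point is that the numpy-to-tensor conversion happens before the forward pass; if you convert only the network output, autograd has no graph to differentiate.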
How can I get the weights of the policy (actor) network when using the PPO algorithm?
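With a plain PyTorch actor you can read the weights via state_dict() and convert them to numpy; if you train through Ray RLlib's PPO, Policy.get_weights() already returns numpy arrays. A sketch assuming a hand-built actor (the RLlib call is shown only as a comment, since it needs a running Algorithm):

```python
import torch.nn as nn

# Placeholder actor standing in for the PPO policy network.
actor = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))

# Weights as torch tensors, keyed by layer name ('0.weight', '0.bias', ...).
weights = actor.state_dict()

# Converted to numpy for a pipeline that works in numpy, like SUMO/Ray.
np_weights = {k: v.detach().cpu().numpy() for k, v in weights.items()}

print(sorted(np_weights))  # ['0.bias', '0.weight', '2.bias', '2.weight']

# With Ray RLlib's PPO, the rough equivalent is:
#   policy = algo.get_policy()
#   np_weights = policy.get_weights()  # dict of numpy arrays
```

state_dict() also gives you named access to individual layers, which is handy if your IRL loss only needs part of the network.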