Hi there
Currently I’m researching a cooperative MARL environment using torch. For this I want to use the action of one model call (or a part of it) as input to a following model call; that part could be implemented via the Trajectory View API. The fed-back part of the action does not influence the reward. Because the agents cooperate, training could benefit from back-propagating the gradient with torch.autograd across more than one model call (action = observation). Vanishing gradients would not bother me, but I would clip gradients to avoid exploding ones. Unrolling like this effectively makes the model rather deep, but for training I don’t mind limiting the horizon. If I’m not mistaken, tensors are by default converted to numpy arrays between calls, which would break the autograd graph. Can I avoid this conversion?
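For the action-as-input part, I imagine something along these lines with the Trajectory View API (untested sketch; the model class, the flat Box spaces, and the layer sizes are just assumptions for illustration):

```python
import torch
import torch.nn as nn
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2
from ray.rllib.policy.view_requirement import ViewRequirement


class PrevActionModel(TorchModelV2, nn.Module):
    """Feeds the previous action back in as part of the model input."""

    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        TorchModelV2.__init__(self, obs_space, action_space, num_outputs,
                              model_config, name)
        nn.Module.__init__(self)
        # assumes flat Box observation and action spaces
        self.net = nn.Linear(obs_space.shape[0] + action_space.shape[0],
                             num_outputs)
        # ask the Trajectory View API to include the last action in input_dict
        self.view_requirements["prev_actions"] = ViewRequirement(
            data_col="actions", shift=-1, space=action_space)

    def forward(self, input_dict, state, seq_lens):
        x = torch.cat([input_dict["obs"].float(),
                       input_dict["prev_actions"].float()], dim=-1)
        return self.net(x), state
```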
Is something like this possible? Do you have an idea of how one could implement such behavior?
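To make concrete what I mean by back-propagating across several calls, here is a stripped-down sketch in plain torch (all names, sizes, and the dummy loss are made up): the action is fed back in without .detach(), so a single backward() reaches through every call, and gradients are clipped before the optimizer step.

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(8, 32), nn.Tanh(), nn.Linear(32, 4))
optim = torch.optim.Adam(policy.parameters(), lr=1e-3)

env_obs = torch.randn(1, 4)   # environment part of the observation
message = torch.zeros(1, 4)   # action part that is fed back as input

losses = []
for t in range(5):            # limited training horizon
    action = policy(torch.cat([env_obs, message], dim=-1))
    message = action          # no .detach(), so the graph spans all calls
    losses.append(action.pow(2).mean())  # stand-in for the real objective

optim.zero_grad()
torch.stack(losses).sum().backward()  # gradient flows through every call
nn.utils.clip_grad_norm_(policy.parameters(), max_norm=1.0)
optim.step()
```

This is basically backpropagation through time over the action channel, just with the recurrence going through the environment instead of a hidden state.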
Thank you in advance; partial answers are appreciated as well. I’m happy to have a discussion.
Edit: Added more details