Backpropagating gradients through layers of a model that are only called during the loss function

Jacob_Lourie · January 4, 2023, 6:14pm

Hi.

I am using a custom PPO model with ray.tune(), and I want to add some self-supervised learning that depends on batch['obs'], batch['done'], batch['action'] and batch['next_obs'].

I have defined some layers in my model that are called only during training.

I have defined a loss function, which I pass to the trainer:

LoggedPPO = PPOTFPolicy.with_updates(
    name="SHPPOPolicy",
    loss_fn=surrogate_loss,
    stats_fn=stats,
)

Within the loss function, I pass various inputs through layers that are never called in the forward model. Specifically, the inputs are train_batch['actions'] (things from the observation), and the layers are stored as attributes of the model (e.g. model.loss_context).

The layers that are not in the forward model (i.e. the ones only called in the loss function) do not seem to receive gradients: I am recording the magnitude of their weights and it is not changing, even in a deliberately simple test that just applies a huge weight decay to a layer called outside the forward model.
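To make the symptom concrete, here is a framework-free toy sketch. It is not RLlib code, and it rests on an assumption that is not confirmed anywhere in this thread: that the loss-only layer's weights are simply missing from the list of variables the optimizer updates. The names policy_w, aux_w, var_list and train are hypothetical and exist only for this illustration.

```python
# Toy illustration only (not RLlib): an optimizer updates just the variables
# it is handed, even when the loss also depends on other variables.

def loss(params):
    # "policy_w" stands in for a weight used in the forward model,
    # "aux_w" for a weight of a layer that is only used inside the loss.
    # The second term is a heavy weight decay on aux_w.
    return (params["policy_w"] - 1.0) ** 2 + 10.0 * params["aux_w"] ** 2


def numerical_grad(params, name, eps=1e-6):
    # Central finite difference of the loss with respect to one parameter.
    up = dict(params, **{name: params[name] + eps})
    down = dict(params, **{name: params[name] - eps})
    return (loss(up) - loss(down)) / (2 * eps)


def train(params, var_list, steps=50, lr=0.01):
    # Plain gradient descent over var_list; everything else is left untouched.
    params = dict(params)
    for _ in range(steps):
        grads = {name: numerical_grad(params, name) for name in var_list}
        for name, grad in grads.items():
            params[name] -= lr * grad
    return params


init = {"policy_w": 0.0, "aux_w": 2.0}

# Case 1: only the forward-model weight is in the optimizer's variable list.
forward_only = train(init, var_list=["policy_w"])

# Case 2: the loss-only weight is in the list as well.
all_vars = train(init, var_list=["policy_w", "aux_w"])

print("aux_w, forward-only var list:", forward_only["aux_w"])
print("aux_w, full var list:        ", all_vars["aux_w"])
```

In the first case aux_w never moves, despite the large weight decay in the loss, which matches what I am seeing. If the same thing is happening in my setup, the thing to check would be whether the weights of the loss-only layers appear among the variables the policy's optimizer actually receives.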

I have also tried adding these layers in an overridden custom_loss function, as per the example https://github.com/ray-project/ray/blob/50e1fda022a81e5015978cf723f7b5fd9cc06b2c/rllib/examples/models/custom_loss_model.py, but in that case the weights for those layers do not even initialise.

Has anyone solved this? I see a number of Stack Overflow questions asking about this but no answers!

arturn · February 8, 2023, 7:49pm

Could you please provide a reproduction script?
That would make it easier to see what exactly is happening and possibly craft an example out of the solution.
