Backpropagating gradients through layers of a model that are only called during the loss function


I am using a custom PPO model with ray.tune(), and I want to add a self-supervised learning term that depends on batch["obs"], batch["done"], batch["action"], and batch["next_obs"].

I have defined some layers in my model that are called only during training.

I have defined a loss function, which I am passing to the trainer via:

LoggedPPO = PPOTFPolicy.with_updates(
    name="LoggedPPO",
    loss_fn=my_loss_fn,  # the custom loss described below
)

Within the loss function, I pass various inputs through layers that are never called in the forward model. Specifically, the inputs are train_batch["actions"] (and things derived from the observation), and the layers are ones I have stored as attributes of the model (e.g. model.loss_context).

The layers that are not in the forward model (i.e. the ones called only during the loss function) do not seem to receive gradients: I am recording the magnitude of their weights and they are not changing, even in a deliberately simple test where the loss is just a huge weight decay on a layer called outside the forward model.
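For reference, here is a minimal standalone sketch (plain TF2 eager, outside RLlib; the layer name is just a stand-in for my model attribute) of the behaviour I would expect — a layer used only inside the loss still gets gradients, as long as it is called under the tape:

```python
import tensorflow as tf

# Hypothetical stand-in for model.loss_context: a layer that is
# never used in the forward pass, only inside the loss.
loss_context = tf.keras.layers.Dense(4)

x = tf.random.normal([8, 4])
with tf.GradientTape() as tape:
    h = loss_context(x)                 # first call builds the variables
    loss = tf.reduce_sum(tf.square(h))  # "huge weight decay"-style loss

grads = tape.gradient(loss, loss_context.trainable_variables)
print([g is not None for g in grads])   # kernel and bias both get gradients
```

In graph-mode RLlib, though, the equivalent variables never move, which is what makes me suspect they are not being picked up by the policy's optimizer.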

I have also tried adding these layers to an overridden custom_loss function, as per the example, but in that case the weights for those layers do not even initialise.
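This is roughly the pattern I tried (pseudocode sketch against the TFModelV2 API from the custom-loss example; the class, layer, and loss names are mine — in graph mode the layer apparently has to be built and its variables registered before they will initialise at all):

```
class SSLModel(TFModelV2):
    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        super().__init__(obs_space, action_space, num_outputs, model_config, name)
        # ... usual forward-model layers ...
        self.loss_context = tf.keras.layers.Dense(16)
        # Build the layer so its variables exist, then register them
        # so the policy optimizer can see them:
        self.loss_context.build((None, obs_space.shape[0]))
        self.register_variables(self.loss_context.variables)

    def custom_loss(self, policy_loss, loss_inputs):
        pred = self.loss_context(loss_inputs["obs"])
        self.ssl_loss = tf.reduce_mean(tf.square(pred))  # placeholder SSL term
        return policy_loss + self.ssl_loss
```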

Has anyone solved this? I see a number of Stack Overflow questions asking about this, but no answers!

Could you please provide a reproduction script?
That would make it easier to see what exactly is happening and possibly craft an example out of the solution.