I am using a custom PPO model with ray.tune(), and I want to add a self-supervised loss that depends on batch['obs'], batch['dones'], batch['actions'], and batch['new_obs'].
I have defined some layers in my model that are called only during training.
I have defined a loss function (call it `my_custom_loss`) which I am passing to the policy via `with_updates`:
LoggedPPO = PPOTFPolicy.with_updates(loss_fn=my_custom_loss)
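For context, this is roughly the wiring I mean. It's a sketch, not my exact code: the names `my_custom_loss`, `model.loss_context`, and the 0.1 coefficient are placeholders, and the self-supervised term is elided.

```python
# Sketch of passing a custom loss_fn to PPOTFPolicy (ray 1.x-era API).
from ray.rllib.agents.ppo.ppo_tf_policy import PPOTFPolicy, ppo_surrogate_loss
from ray.rllib.policy.sample_batch import SampleBatch

def my_custom_loss(policy, model, dist_class, train_batch):
    # Keep the standard PPO surrogate loss...
    ppo_loss = ppo_surrogate_loss(policy, model, dist_class, train_batch)
    # ...and add a self-supervised term computed from layers stored on the
    # model but never used in forward():
    pred = model.loss_context(train_batch[SampleBatch.CUR_OBS])
    ssl_loss = ...  # e.g. compare pred against actions / next_obs
    return ppo_loss + 0.1 * ssl_loss

LoggedPPO = PPOTFPolicy.with_updates(loss_fn=my_custom_loss)
```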
Within the loss function, I pass various inputs through layers that are never called in the forward model. Specifically, these inputs are train_batch['actions'] (and other things derived from the observation), passed through layers that I have stored as attributes of the model (e.g. model.loss_context).
The layers that are not in the forward model (i.e. the ones only called inside the loss function) do not seem to receive gradients: I am logging their weight magnitudes and they never change, even in a deliberately simple test where the loss is just a huge weight-decay penalty on a layer called outside the forward model.
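To illustrate what I think is happening (a minimal TensorFlow 2 demo, not RLlib-specific): a variable whose ops never appear in the loss graph gets no gradient at all, so the optimizer silently leaves it untouched.

```python
import tensorflow as tf

used = tf.Variable(2.0)
unused = tf.Variable(3.0)  # analogous to a layer never touched by the loss

with tf.GradientTape() as tape:
    loss = used * 5.0  # `unused` never enters the computation

grads = tape.gradient(loss, [used, unused])
print(float(grads[0]))  # 5.0
print(grads[1])         # None -> optimizer will never update `unused`
```

So if the self-supervised layers' outputs genuinely feed into the returned loss but still don't move, my suspicion is the variables were never handed to the optimizer in the first place.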
I have also tried building these layers inside an overridden custom_loss method, as per the example https://github.com/ray-project/ray/blob/50e1fda022a81e5015978cf723f7b5fd9cc06b2c/rllib/examples/models/custom_loss_model.py, but in that case the weights for those layers do not even initialise.
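For reference, this is the shape of what I tried, modeled on that example. The layer name, sizes, and the SSL term are my own placeholders; the part I may be missing is the explicit `register_variables` call, which the TF ModelV2 API seems to require for variables created outside the forward net.

```python
# Sketch of a TFModelV2 with a loss-only layer (assumed names throughout).
from ray.rllib.models.tf.tf_modelv2 import TFModelV2
from ray.rllib.policy.sample_batch import SampleBatch
from ray.rllib.utils.framework import try_import_tf

tf1, tf, tfv = try_import_tf()

class SSLModel(TFModelV2):
    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        super().__init__(obs_space, action_space, num_outputs, model_config, name)
        ...  # build the usual forward net here, then:
        self.loss_context = tf.keras.layers.Dense(32, name="loss_context")
        # Force variable creation now and register the variables with the
        # model -- otherwise RLlib's optimizer never sees them.
        self.loss_context.build((None, int(obs_space.shape[0])))
        self.register_variables(self.loss_context.variables)

    def custom_loss(self, policy_loss, loss_inputs):
        pred = self.loss_context(loss_inputs[SampleBatch.CUR_OBS])
        self.ssl_loss = tf.reduce_mean(tf.square(pred))  # placeholder SSL term
        return policy_loss + self.ssl_loss
```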
Has anyone solved this? I see a number of Stack Overflow questions asking about it, but no answers!