How to use own optimizer for custom_loss_model example

When we create a custom model as in the custom_loss_model example, the custom_loss function can return a list that is one entry longer than the incoming policy_loss list (so that the custom loss gets its own optimizer), as follows.

    def custom_loss(self, policy_loss, loss_inputs):
        """Calculates a custom loss on top of the given policy_loss(es).

        Args:
            policy_loss (List[TensorType]): The list of already calculated
                policy losses (as many as there are optimizers).
            loss_inputs (TensorStruct): Struct of np.ndarrays holding the
                entire train batch.

        Returns:
            List[TensorType]: The altered list of policy losses. In case the
                custom loss should have its own optimizer, make sure the
                returned list is one larger than the incoming policy_loss list.
                In case you simply want to mix in the custom loss into the
                already calculated policy losses, return a list of altered
                policy losses (as done in this example below).
        """
        # Get the next batch from our input files.
        batch = self.reader.next()

        # Define a secondary loss by building a graph copy with weight sharing.
        obs = restore_original_dimensions(
            torch.from_numpy(batch["obs"]).float().to(policy_loss[0].device),
            self.obs_space,
            tensorlib="torch")
        logits, _ = self.forward({"obs": obs}, [], None)

        # You can also add self-supervised losses easily by referencing tensors
        # created during _build_layers_v2(). For example, an autoencoder-style
        # loss can be added as follows:
        # ae_loss = squared_diff(
        #     loss_inputs["obs"], Decoder(self.fcnet.last_layer))
        print("FYI: You can also use these tensors: {}, ".format(loss_inputs))

        # Compute the IL loss.
        action_dist = TorchCategorical(logits, self.model_config)
        imitation_loss = torch.mean(-action_dist.logp(
            torch.from_numpy(batch["actions"]).to(policy_loss[0].device)))
        self.imitation_loss_metric = imitation_loss.item()
        self.policy_loss_metric = np.mean([l.item() for l in policy_loss])

        # Give the imitation loss its own entry (and thus its own optimizer):
        return policy_loss + [10 * imitation_loss]
        # Alternatively, mix the imitation loss into each already calculated
        # policy loss term:
        # return [loss_ + 10 * imitation_loss for loss_ in policy_loss]

However, the compute_gradients function contains the statement assert len(loss_out) == len(self._optimizers), which then errors, since loss_out now has length 2 while self._optimizers has length 1. Is this a bug?
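To make the mismatch concrete, here is a minimal plain-Python sketch (no RLlib; FakeOptimizer and the loop body are illustrative stand-ins, not RLlib's actual code) of the one-loss-per-optimizer pairing that the assert enforces:

```python
class FakeOptimizer:
    """Illustrative stand-in for a torch.optim optimizer."""
    def __init__(self):
        self.steps = 0

    def step(self):
        self.steps += 1


def compute_gradients(loss_out, optimizers):
    # Mirrors the assert in question: losses and optimizers are zipped
    # one-to-one, so their counts must match.
    assert len(loss_out) == len(optimizers)
    for loss, opt in zip(loss_out, optimizers):
        opt.step()  # in RLlib this would backprop `loss` and step `opt`


policy_loss = [0.5]      # one already-computed policy loss
imitation_loss = 0.2

# Own-optimizer variant: 2 losses but only 1 optimizer -> AssertionError.
try:
    compute_gradients(policy_loss + [10 * imitation_loss], [FakeOptimizer()])
except AssertionError:
    print("assert fires: 2 losses, 1 optimizer")

# Mix-in variant: the list length is unchanged, so the assert passes.
compute_gradients([l + 10 * imitation_loss for l in policy_loss],
                  [FakeOptimizer()])
print("mix-in variant passes")
```

So the own-optimizer return form can only work if the policy also builds a second optimizer; with a single optimizer, only the mix-in form passes the assert.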

It seems that in the custom_loss_model example, the custom model is updated both by the policy gradient and by the self-defined loss (a supervised loss over an offline dataset). Should we instead first train the custom model with the self-defined loss alone, and then either fine-tune it with the policy loss or leave the pretrained model unchanged?
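For concreteness, the two training schedules being compared can be sketched like this (plain Python; the update callables are hypothetical placeholders for one optimization step on each loss):

```python
def train_joint(num_iters, policy_update, supervised_update):
    """What the example currently does: both losses every iteration."""
    for _ in range(num_iters):
        supervised_update()
        policy_update()


def train_pretrain_then_finetune(pretrain_iters, finetune_iters,
                                 policy_update, supervised_update):
    """The alternative asked about: supervised pretraining, then RL only."""
    for _ in range(pretrain_iters):
        supervised_update()
    for _ in range(finetune_iters):
        policy_update()


# Count how often each loss is applied under the joint schedule.
counts = {"policy": 0, "supervised": 0}
train_joint(
    10,
    lambda: counts.__setitem__("policy", counts["policy"] + 1),
    lambda: counts.__setitem__("supervised", counts["supervised"] + 1))
print(counts)  # both updates applied 10 times each
```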

Thanks for any suggestion!