How to use own optimizer for custom_loss_model example

When we create a custom model as in the custom_loss_model example, the custom_loss function can return a list that is one entry longer than the incoming policy_loss list (so that the custom loss gets its own optimizer), as follows.

    def custom_loss(self, policy_loss, loss_inputs):
        """Calculates a custom loss on top of the given policy_loss(es).

        Args:
            policy_loss (List[TensorType]): The list of already calculated
                policy losses (as many as there are optimizers).
            loss_inputs (TensorStruct): Struct of np.ndarrays holding the
                entire train batch.

        Returns:
            List[TensorType]: The altered list of policy losses. In case the
                custom loss should have its own optimizer, make sure the
                returned list is one larger than the incoming policy_loss list.
                In case you simply want to mix in the custom loss into the
                already calculated policy losses, return a list of altered
                policy losses (as done in this example below).
        """
        # Get the next batch from our input files.
        batch = self.reader.next()

        # Define a secondary loss by building a graph copy with weight sharing.
        obs = restore_original_dimensions(
            torch.from_numpy(batch["obs"]).float().to(policy_loss[0].device),
            self.obs_space,
            tensorlib="torch")
        logits, _ = self.forward({"obs": obs}, [], None)

        # You can also add self-supervised losses easily by referencing tensors
        # created during _build_layers_v2(). For example, an autoencoder-style
        # loss can be added as follows:
        # ae_loss = squared_diff(
        #     loss_inputs["obs"], Decoder(self.fcnet.last_layer))
        print("FYI: You can also use these tensors: {}, ".format(loss_inputs))

        # Compute the IL loss.
        action_dist = TorchCategorical(logits, self.model_config)
        imitation_loss = torch.mean(-action_dist.logp(
            torch.from_numpy(batch["actions"]).to(policy_loss[0].device)))
        self.imitation_loss_metric = imitation_loss.item()
        self.policy_loss_metric = np.mean([l.item() for l in policy_loss])

        # Give the imitation loss its own entry (and thus its own optimizer):
        return policy_loss + [10 * imitation_loss]
        # Alternatively, mix the imitation loss into each already calculated
        # policy loss term:
        # return [loss_ + 10 * imitation_loss for loss_ in policy_loss]

However, the compute_gradients function contains the statement assert len(loss_out) == len(self._optimizers), which then errors, since loss_out now has length 2 while self._optimizers has length 1. Is this a bug?
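To make the mismatch concrete, here is a minimal plain-Python sketch (no RLlib; FakeOptimizer and the loop body are illustrative stand-ins, not RLlib's actual code) of the one-loss-per-optimizer pairing that the assert enforces:

```python
class FakeOptimizer:
    """Illustrative stand-in for a torch.optim optimizer."""
    def __init__(self):
        self.steps = 0

    def step(self):
        self.steps += 1


def compute_gradients(loss_out, optimizers):
    # Mirrors the assert in question: losses and optimizers are zipped
    # one-to-one, so their counts must match.
    assert len(loss_out) == len(optimizers)
    for loss, opt in zip(loss_out, optimizers):
        opt.step()  # in RLlib this would backprop `loss` and step `opt`


policy_loss = [0.5]      # one already-computed policy loss
imitation_loss = 0.2

# Own-optimizer variant: 2 losses but only 1 optimizer -> AssertionError.
try:
    compute_gradients(policy_loss + [10 * imitation_loss], [FakeOptimizer()])
except AssertionError:
    print("assert fires: 2 losses, 1 optimizer")

# Mix-in variant: the list length is unchanged, so the assert passes.
compute_gradients([l + 10 * imitation_loss for l in policy_loss],
                  [FakeOptimizer()])
print("mix-in variant passes")
```

So the own-optimizer return form can only work if the policy also builds a second optimizer; with a single optimizer, only the mix-in form passes the assert.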

It seems that in the custom_loss_model example, the custom model is updated both by the policy gradient and by the self-defined loss (a supervised loss over an offline dataset). Should we instead first train the custom model with the self-defined loss alone, and then either fine-tune it with the policy loss or leave the pretrained model unchanged?
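For concreteness, the two training schedules being compared can be sketched like this (plain Python; the update callables are hypothetical placeholders for one optimization step on each loss):

```python
def train_joint(num_iters, policy_update, supervised_update):
    """What the example currently does: both losses every iteration."""
    for _ in range(num_iters):
        supervised_update()
        policy_update()


def train_pretrain_then_finetune(pretrain_iters, finetune_iters,
                                 policy_update, supervised_update):
    """The alternative asked about: supervised pretraining, then RL only."""
    for _ in range(pretrain_iters):
        supervised_update()
    for _ in range(finetune_iters):
        policy_update()


# Count how often each loss is applied under the joint schedule.
counts = {"policy": 0, "supervised": 0}
train_joint(
    10,
    lambda: counts.__setitem__("policy", counts["policy"] + 1),
    lambda: counts.__setitem__("supervised", counts["supervised"] + 1))
print(counts)  # both updates applied 10 times each
```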

Thanks for any suggestion!