How to get the trial directory or trial ID of the experiment used for initializing the weights of a mutation generated by PB2 or PopulationBasedTraining

My understanding of PB2 and PopulationBasedTraining is that they use checkpoints from other trials when initializing a model created for a new mutation.

As shown in this guide, the training function has to check if session.get_checkpoint(): to find out whether the new trial starts from a checkpoint of an earlier trial.

I was wondering if there is a way to get the trial_dir or trial_id of the trial from which the checkpoint is loaded. My training function supports storing and loading checkpoints, and I would like to avoid the overhead of storing the same checkpoints with session.report(). My training function would then look like this:

if session.get_checkpoint():  # or some other flag
    # Assuming session.get_source_trial_dir()
    # or something similar exists
    source_trial_dir = session.get_source_trial_dir()

    # By setting args.checkpoint_path,
    # the training function will load the weights
    # (and, optionally, the optimizer parameters)
    # from the specified checkpoint.
    # os.path.join adds the path separator that plain "+" would omit.
    args.checkpoint_path = os.path.join(source_trial_dir, "checkpoint.pth.tar")

A hacky solution would look like this:

# When creating a checkpoint, only store its path
if self.args.tune != "":
    checkpoint_path = os.path.join(
        session.get_trial_dir(), self.logdir, "checkpoint.pth.tar"
    )
    checkpoint = Checkpoint.from_dict({"checkpoint_path": checkpoint_path})

    session.report(
        {"loss": loss, "accuracy": top1}, checkpoint=checkpoint
    )
# When starting a trial, read the stored path back from the checkpoint
checkpoint = session.get_checkpoint()
if checkpoint:
    args.checkpoint_path = checkpoint.to_dict()["checkpoint_path"]

Why is that needed?
Let’s say that trialB, at iteration n, loads trialA’s checkpoint, applies a mutation, and starts training. When trialB reaches its next checkpoint stage (iteration m), you still want to save that checkpoint, because:

  1. trialB is no longer identical to trialA, since it has been mutated.
  2. trialB has already made non-trivial progress (from iteration n to iteration m).
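
To make the two points above concrete, here is a toy sketch in plain Python. It does not use Ray at all: the "trials" are just dictionaries, and train_one_iteration, the learning-rate values, and the iteration counts are made up for illustration. It only shows that the state trialB would save at iteration m differs from the trialA checkpoint it started from.

```python
import copy


def train_one_iteration(state):
    """Pretend training step: shrink the weight by a factor that depends on lr."""
    new_state = copy.deepcopy(state)
    new_state["weight"] -= new_state["lr"] * new_state["weight"]
    new_state["iteration"] += 1
    return new_state


# trialA trains for n = 3 iterations and saves a checkpoint.
trial_a = {"weight": 1.0, "lr": 0.1, "iteration": 0}
for _ in range(3):
    trial_a = train_one_iteration(trial_a)
checkpoint_a = copy.deepcopy(trial_a)

# Exploit and explore: trialB starts from trialA's checkpoint,
# but with a mutated hyperparameter (hypothetical value).
trial_b = copy.deepcopy(checkpoint_a)
trial_b["lr"] = 0.2

# trialB trains from iteration n = 3 to iteration m = 5.
for _ in range(2):
    trial_b = train_one_iteration(trial_b)
checkpoint_b = copy.deepcopy(trial_b)

# The state trialB would checkpoint is not the one it loaded.
print("trialA checkpoint:", checkpoint_a)
print("trialB checkpoint:", checkpoint_b)
print("identical:", checkpoint_a == checkpoint_b)
```

If trialB only pointed back at trialA's checkpoint file, a later trial that exploits trialB would restart from iteration n with trialA's weights and lose both the mutation and the progress made up to iteration m.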