Empty checkpoint files with Tune.run

How severe does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Hello, I am using Tune to run trials over different hyperparameters with RLlib in a custom environment.
My problem is that I expected a final checkpoint to be saved per trial, but although each trial folder contains a checkpoint folder, the files inside are not actual checkpoint files.

Specifically, I am getting the following folder structure:

```
Experiment/
└── Trial/
    ├── Checkpoint_-00001/
    │   ├── .is_checkpoint
    │   ├── .null_marker
    │   └── .tune_metadata
    ├── params
    ├── progress
    └── results
```

Essentially, I was hoping to have a trained agent per trial and select the best agent, which I could then restore to perform actions on my environment. From my understanding, `checkpoint_at_end=True` was supposed to save these checkpoints.

Is there another way to load a trained agent apart from checkpoints?

Here is my snippet:

```python
from ray import tune
from ray.rllib.agents.dqn import DQNTrainer


def experiment(config):
    iterations = config.pop("train-iterations")
    train_agent = DQNTrainer(config=config)
    checkpoint = None
    train_results = {}
    for i in range(iterations):
        train_results = train_agent.train()
        tune.report(**train_results)
    train_agent.stop()


config["lr"] = tune.grid_search([1e-5, 1e-4])

tuneobject = tune.run(
    experiment,
    config=config,
    local_dir=raylog,
    checkpoint_at_end=True,
    checkpoint_freq=10,
    name="Exp1",
    checkpoint_score_attr="episode_reward_mean",
)
```

Thank you

I solved this issue by passing a PPOTrainer or DQNTrainer directly as the trainable, instead of the `experiment` function.