Empty checkpoint files with Tune.run

How severe does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Hello, I am using Tune to run trials over different hyperparameters with RLlib in a custom environment.
My problem is that I expected a final checkpoint to be saved per trial, but although each trial folder contains a checkpoint folder, the files inside are not actual checkpoint files.

Specifically, I am getting the following folder structure:

```
Experiment/
└── Trial/
    ├── Checkpoint_-00001/
    │   ├── .is_checkpoint
    │   ├── .null_marker
    │   └── .tune_metadata
    ├── params
    ├── progress
    └── results
```

Essentially, I was hoping to have a trained agent per trial and select the best agent, which I could then restore to perform actions on my environment. From my understanding, `checkpoint_at_end=True` was supposed to save these checkpoints.

Is there another way to load a trained agent apart from checkpoints?

Here is my snippet:

```python
from ray import tune
from ray.rllib.agents.dqn import DQNTrainer


def experiment(config):
    iterations = config.pop("train-iterations")
    train_agent = DQNTrainer(config=config)
    checkpoint = None
    train_results = {}
    for i in range(iterations):
        train_results = train_agent.train()
        tune.report(**train_results)
    train_agent.stop()


config["lr"] = tune.grid_search([1e-5, 1e-4])

tuneobject = tune.run(
    experiment,
    config=config,
    local_dir=raylog,
    checkpoint_at_end=True,
    checkpoint_freq=10,
    name="Exp1",
    checkpoint_score_attr="episode_reward_mean",
)
```

Thank you

I solved this issue by passing a PPOTrainer or DQNTrainer directly as the trainable, instead of the `experiment` function.