I’m having some problems where specifying `CheckpointConfig` doesn’t seem to have the expected outcome.
Here is some info about my setup and what I’m trying to do:
- I’m using `ray==2.8.1`.
- I’m running `ray.tune` experiments with 2 trials, each running for 6 iterations (`max_t=6`).
- When calling `tune.run`, I’m specifying the following `checkpoint_config`:
```python
import ray.air.config

checkpoint_config = ray.air.config.CheckpointConfig(
    num_to_keep=1,
    checkpoint_score_attribute="eval_metric",  # Validated in my code
    checkpoint_score_order="min",
)
```
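To make the intended outcome of this config concrete, the retention rule it is meant to express can be sketched in plain Python: among the checkpoints a trial reports, keep only the `num_to_keep` best ones, ranked by `eval_metric` in `checkpoint_score_order`. This is a simplified model, not Ray's implementation; `ReportedCheckpoint`, `select_checkpoints_to_keep` and the metric values are hypothetical, and the sketch does not model any additional retention rules Ray itself may apply.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReportedCheckpoint:
    """One checkpoint reported by a trial (hypothetical record type)."""
    path: str
    eval_metric: float


def select_checkpoints_to_keep(
    reported: List[ReportedCheckpoint],
    num_to_keep: int = 1,
    order: str = "min",
) -> List[ReportedCheckpoint]:
    """Return the checkpoints a plain 'keep the best N' rule would retain.

    Simplified model of num_to_keep / checkpoint_score_order only.
    """
    if order not in ("min", "max"):
        raise ValueError("order must be 'min' or 'max'")
    # For "min", lower scores rank first; for "max", higher scores rank first.
    ranked = sorted(reported, key=lambda c: c.eval_metric, reverse=(order == "max"))
    return ranked[:num_to_keep]


if __name__ == "__main__":
    # Hypothetical metrics for one trial that reports a checkpoint
    # at each of its 6 iterations (max_t=6).
    metrics = [0.91, 0.74, 0.69, 0.72, 0.70, 0.71]
    reported = [
        ReportedCheckpoint(path=f"checkpoint_{i:06d}", eval_metric=m)
        for i, m in enumerate(metrics)
    ]
    kept = select_checkpoints_to_keep(reported, num_to_keep=1, order="min")
    print([c.path for c in kept])  # prints ['checkpoint_000002']
```

Under this simplified rule, each trial directory would end up with a single `checkpoint_XXXXXX` folder: the one with the lowest `eval_metric`.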
When I run my tuning pipeline, I can see that the `~/ray_results/` directory contains the following checkpoints for each of my trials:
I’d like to keep only the `checkpoint_XXXXXX` directory containing the best checkpoint, and not the contents of the `checkpoints` directory, because I need to sync the results to S3 using `storage_path`.
Storing all of those checkpoints takes quite a bit of space (each checkpoint is close to 1 GB). I thought I could achieve this with the `CheckpointConfig` object, but somehow it’s not working as expected.
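To quantify how much space each checkpoint directory takes before syncing, a small stdlib-only script can walk the results folder and sum file sizes per checkpoint directory. It assumes, hypothetically, that any directory whose name starts with `checkpoint` is a checkpoint, which covers both `checkpoint_XXXXXX` and a folder literally named `checkpoints`; `checkpoint_sizes` is an illustrative helper, not part of Ray.

```python
import os
from pathlib import Path
from typing import Dict


def checkpoint_sizes(results_dir: str) -> Dict[str, int]:
    """Map each checkpoint directory under results_dir to its total size in bytes.

    Hypothetical heuristic: a directory counts as a checkpoint if its name
    starts with "checkpoint". Keys are paths relative to results_dir.
    """
    sizes: Dict[str, int] = {}
    root = Path(results_dir).expanduser()
    for dirpath, dirnames, _ in os.walk(root):
        for name in list(dirnames):
            if name.startswith("checkpoint"):
                ckpt_dir = Path(dirpath) / name
                # Sum every regular file anywhere inside this checkpoint directory.
                total = sum(f.stat().st_size for f in ckpt_dir.rglob("*") if f.is_file())
                sizes[str(ckpt_dir.relative_to(root))] = total
                # Prune the walk: files below this directory are already counted.
                dirnames.remove(name)
    return sizes


if __name__ == "__main__":
    for path, size in sorted(checkpoint_sizes("~/ray_results").items()):
        print(f"{size / 1e9:8.2f} GB  {path}")
```

Comparing the totals for `checkpoint_XXXXXX` versus `checkpoints` shows which of the two accounts for most of the disk usage that would be synced to S3.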
What am I missing?