How to save models after Trainer.fit()?

cartml · August 19, 2023, 6:23am

Hi !

I have been able to train a model using

trainer = TensorflowTrainer(...)

and call

result = trainer.fit()

Everything works fine. Now that the model has finished training, I need to load the trained model and serialize it using our own code (since we have a model registry).

However, I couldn’t find out how to do this anywhere, including Ray’s official documentation. The examples there usually end after result = trainer.fit()

If I use

result.checkpoint.load_model(), it fails, saying I need to supply a model argument to load_model

If I use

checkpoint = result.checkpoint
checkpoint.to_directory("/tmp/test_checkpoint")
self.model = keras.models.load_model("/tmp/test_checkpoint")

it also fails (saying there is no SavedModel in the folder)

We are using Ray 2.6.1. What is the documented way to get the trained model after calling trainer.fit()?

Thanks

matthewdeng · August 19, 2023, 3:13pm

Hi @cartml,

How are you creating your checkpoint? In general the recommended way to do this would be something like this:

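import tempfile

# Import paths as of Ray 2.6
from ray.air import Checkpoint, session
from ray.train.tensorflow import TensorflowTrainer

def train_func():
    # ... build and train your model here ...
    with tempfile.TemporaryDirectory() as tmpdir:
        # todo: save your checkpoint to tmpdir in whatever format you want
        # example: model.save(tmpdir + "/test_checkpoint")
        checkpoint = Checkpoint.from_directory(tmpdir)
        session.report({...}, checkpoint=checkpoint)

trainer = TensorflowTrainer(train_func, ...)
result = trainer.fit()

# Recreate the reported checkpoint directory on the local machine
result.checkpoint.to_directory(some_local_path)
# todo: load your checkpoint
# example: keras.models.load_model(some_local_path + "/test_checkpoint")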

The way to think about it is:

Within the training function, you are creating a directory that you can store whatever checkpoint data you want.
When you fetch the checkpoint at the end, you can “recreate” that directory locally.
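
To make the round trip concrete, here is a minimal sketch that uses only the Python standard library. It is not Ray code: shutil.copytree stands in for both Checkpoint.from_directory and to_directory, and the names save_in_worker, restore_locally, and weights.json are hypothetical, made up for illustration. The point it shows is that whatever relative path you write inside the temporary directory (here test_checkpoint/) is the same relative path you must read from after restoring.

```python
import json
import shutil
import tempfile
from pathlib import Path


def save_in_worker(storage: Path) -> None:
    """Stand-in for the training function: write files to a temp dir, then persist it."""
    with tempfile.TemporaryDirectory() as tmpdir:
        model_dir = Path(tmpdir) / "test_checkpoint"
        model_dir.mkdir()
        # Stand-in for model.save(...): any files you like can go here.
        (model_dir / "weights.json").write_text(json.dumps({"w": [1.0, 2.0]}))
        # Stand-in for Checkpoint.from_directory + session.report:
        # copy the whole directory before the temp dir is deleted.
        shutil.copytree(tmpdir, storage, dirs_exist_ok=True)


def restore_locally(storage: Path, local_path: Path) -> dict:
    """Stand-in for to_directory(...) followed by your own loading code."""
    shutil.copytree(storage, local_path, dirs_exist_ok=True)
    # Read from the same relative path that was used when saving.
    return json.loads((local_path / "test_checkpoint" / "weights.json").read_text())


with tempfile.TemporaryDirectory() as root:
    storage = Path(root) / "storage"
    local = Path(root) / "local"
    save_in_worker(storage)
    restored = restore_locally(storage, local)
    # List what actually landed in the restored directory.
    files = sorted(str(p.relative_to(local)) for p in local.rglob("*") if p.is_file())
    print(files)
    print(restored)
```

Listing the restored directory like this is also a quick way to check what a checkpoint actually contains when a loader complains that it cannot find a model in the folder.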
