Restore from s3 bucket without resume

felipeeeantunes · December 10, 2020, 7:41pm

Hello everyone, I’m trying to do incremental training (more iterations) starting from a previously trained model’s checkpoint. The checkpoint is stored in an S3 bucket, and I would like to restore it without resume, to avoid instant termination as suggested here. When I try to do that, I get the following error: FileNotFoundError: [Errno Path does not exist]

Is it a bug? Is there a way to do that?

rliaw · April 29, 2021, 3:14am

Hmm, this does seem like a bug (sorry for the slow reply).

Could you file an issue on GitHub?
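
Until that is resolved, one possible workaround is to copy the checkpoint directory out of the bucket and restore from the local copy instead. The sketch below is only an illustration under assumptions: it uses boto3 (not mentioned in this thread), it assumes the checkpoint is a directory-style prefix in the bucket, and the helper names, bucket name, and paths are hypothetical. Whether restoring from a local path avoids the error above has not been confirmed here.

```python
import os
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/prefix' into ('bucket', 'some/prefix')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def local_path_for(key, prefix, local_dir):
    """Map an object key under `prefix` to a file path under `local_dir`."""
    relative = key[len(prefix):].lstrip("/")
    return os.path.join(local_dir, *relative.split("/"))


def download_checkpoint(uri, local_dir):
    """Copy every object under the S3 prefix into local_dir (needs boto3)."""
    import boto3  # imported lazily so the helpers above work without it

    bucket, prefix = split_s3_uri(uri)
    # Assumption: the checkpoint is a "directory"; the trailing slash keeps
    # checkpoint_10 from also matching checkpoint_100.
    prefix = prefix.rstrip("/") + "/"
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if obj["Key"].endswith("/"):
                continue  # skip folder placeholder objects
            target = local_path_for(obj["Key"], prefix, local_dir)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            s3.download_file(bucket, obj["Key"], target)
    return local_dir


# Hypothetical usage (bucket and paths are placeholders):
# local = download_checkpoint("s3://my-bucket/exp/checkpoint_10", "/tmp/checkpoint_10")
```

The returned local directory could then be passed wherever the S3 path was used before, for example as the `restore` argument of `tune.run`, still without `resume`.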
