Load checkpoint from tune experiment when using custom environment not registered

Ciro_NA · September 6, 2023, 5:09pm

High: It blocks me from completing my task.

I’m having trouble loading a checkpoint generated by a Tune experiment. The problem is that I passed the custom environment class directly, without registering it, as in the example below:

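from ray.rllib.algorithms.ppo import PPOConfig
from hrro_env_norm import HRROenv  # custom environment, passed as a class (not registered)

# Membrane, Solution, DesignParameters and OperationParameters come from my own code.
config_PPO = (
    PPOConfig()
    .environment(
        HRROenv,
        env_config={
            'membrane': Membrane().membrane_xus180808_double(),
            'solution': Solution(),
            'design': DesignParameters.Nijhuis_BIA(),
            'operation': OperationParameters()
        }
    )
)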

I’m getting an error message related to TensorFlow’s eager mode, but it is most likely because the environment cannot be found. Is there a workaround for loading the checkpoint? What is the easiest way to do it?
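For reference, a minimal sketch of how a custom environment class could be registered under a string name, so that the config refers to the name rather than the class object. The name `hrro_env` and the helpers `make_env_creator` and `register_custom_env` are hypothetical and not part of the original post; `register_env` is the Ray Tune registry function. This assumes the environment constructor takes the `env_config` dict as its only argument, and the registration has to be repeated in whichever script later restores the checkpoint.

```python
from typing import Any, Callable, Dict

ENV_NAME = "hrro_env"  # hypothetical registry name, chosen for illustration


def make_env_creator(env_cls: type) -> Callable[[Dict[str, Any]], Any]:
    """Return a creator function that builds env_cls from RLlib's env_config."""

    def creator(env_config: Dict[str, Any]) -> Any:
        # RLlib calls the creator with the env_config given in the algorithm config.
        return env_cls(env_config)

    return creator


def register_custom_env(env_cls: type, name: str = ENV_NAME) -> str:
    """Register env_cls under `name` and return the name."""
    # Deferred import so this module can be imported without Ray installed.
    from ray.tune.registry import register_env

    register_env(name, make_env_creator(env_cls))
    return name
```

With this in place, the config could use `.environment(register_custom_env(HRROenv), env_config={...})` instead of passing `HRROenv` directly.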

Lars_Simon_Zehnder · September 7, 2023, 7:32am

Hi @Ciro_NA ,

How do you try to load the checkpoint? With Algorithm.load_from_checkpoint()?

Ciro_NA · September 7, 2023, 7:52am

Hi @Lars_Simon_Zehnder,

You probably meant Algorithm.from_checkpoint(). Yes, I’m using it. I also checked the dict that holds the information needed to restore the algorithm. Its env key has the value <class 'hrro_env_norm.HRROenv'>. I tried registering the environment under this name, but it’s not working. The Ray version I’m using is 2.3.

Lars_Simon_Zehnder · September 7, 2023, 8:11am

Still early in the morning

I would have thought that importing the environment would work already, but I guess you already do this.

Ciro_NA · September 7, 2023, 8:17am

Sure, I’m importing the environment too. The error message is: 'tf.enable_eager_execution must be called at program setup'. The original config had eager_tracing=True.

Lars_Simon_Zehnder · September 7, 2023, 8:21am

You could work around this with:

from ray.rllib.utils.framework import try_import_tf

_, tf, _ = try_import_tf()
tf.enable_eager_execution()

Ciro_NA · September 7, 2023, 8:27am

You are right, it works. Thank you @Lars_Simon_Zehnder

Lars_Simon_Zehnder · September 7, 2023, 8:32am

Alright, this might be a default in the library. Most likely, eager execution has to be enabled through the TF1 module:

from ray.rllib.utils.framework import try_import_tf

tf1, tf, _ = try_import_tf()
tf1.enable_eager_execution()
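
Putting the pieces of this thread together, a minimal sketch of the loading order might look like the following. The function name `load_algorithm` and its `checkpoint_path` argument are hypothetical; `try_import_tf` and `Algorithm.from_checkpoint` are the RLlib calls mentioned above. It assumes the custom environment module (here `hrro_env_norm`) has already been imported in the calling script and that no TensorFlow graph or session has been created yet in the process.

```python
def load_algorithm(checkpoint_path: str):
    """Enable TF eager execution first, then restore the Algorithm from a checkpoint."""
    # Deferred imports keep the ordering explicit: eager mode is switched on
    # before RLlib builds any TensorFlow policy.
    from ray.rllib.utils.framework import try_import_tf

    tf1, tf, tfv = try_import_tf()
    # Needed because the checkpointed config used eager_tracing=True.
    tf1.enable_eager_execution()

    from ray.rllib.algorithms.algorithm import Algorithm

    return Algorithm.from_checkpoint(checkpoint_path)
```

Calling `load_algorithm(...)` with the checkpoint directory written by the Tune experiment at the very start of the script, right after importing the environment module, keeps the eager-mode call ahead of everything else that touches TensorFlow.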
