Error creating RLPredictor using restored checkpoint

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Hi experts.

I get the following error when restoring a checkpoint for a DQN model using Ray 2.1.0:

    Traceback (most recent call last):
      File "/home/stefan/PycharmProjects/RLProjects/rl_offline_trainer/inference_app.py", line 61, in <module>
        predictor = RLPredictor.from_checkpoint(checkpoint)
      File "/home/stefan/anaconda3/envs/py38_ray2.1/lib/python3.8/site-packages/ray/train/rl/rl_predictor.py", line 63, in from_checkpoint
        policy = checkpoint.get_policy(env)
      File "/home/stefan/anaconda3/envs/py38_ray2.1/lib/python3.8/site-packages/ray/train/rl/rl_checkpoint.py", line 42, in get_policy
        return Policy.from_checkpoint(checkpoint=self)["default_policy"]
      File "/home/stefan/anaconda3/envs/py38_ray2.1/lib/python3.8/site-packages/ray/rllib/policy/policy.py", line 256, in from_checkpoint
        policy_state = pickle.load(f)
      File "/home/stefan/anaconda3/envs/py38_ray2.1/lib/python3.8/site-packages/ray/_private/serialization.py", line 89, in _actor_handle_deserializer
        return ray.actor.ActorHandle._deserialization_helper(serialized_obj, outer_id)
      File "/home/stefan/anaconda3/envs/py38_ray2.1/lib/python3.8/site-packages/ray/actor.py", line 1281, in _deserialization_helper
        return worker.core_worker.deserialize_and_register_actor_handle(
      File "python/ray/_raylet.pyx", line 2137, in ray._raylet.CoreWorker.deserialize_and_register_actor_handle
      File "python/ray/_raylet.pyx", line 2106, in ray._raylet.CoreWorker.make_actor_handle
      File "/home/stefan/anaconda3/envs/py38_ray2.1/lib/python3.8/site-packages/ray/_private/function_manager.py", line 522, in load_actor_class
        actor_class = self._load_actor_class_from_gcs(
      File "/home/stefan/anaconda3/envs/py38_ray2.1/lib/python3.8/site-packages/ray/_private/function_manager.py", line 617, in _load_actor_class_from_gcs
        class_name = ensure_str(class_name)
      File "/home/stefan/anaconda3/envs/py38_ray2.1/lib/python3.8/site-packages/ray/_private/utils.py", line 289, in ensure_str
        assert isinstance(s, bytes)
    AssertionError

I use Ray Tune to execute several trials, saving a checkpoint at the end of each, using the following:

    # create tuner
    tuner = Tuner(

        # trainer
        trainer,

        # create tune configuration
        tune_config=self.create_tune_config(
            search_algo=search_algo,
            scheduler=scheduler
        ),

        # hyper-parameters
        param_space=self.create_param_space(),

        # specify run configuration
        run_config=RunConfig(
            stop=dict(training_iteration=2),
            checkpoint_config=CheckpointConfig(checkpoint_at_end=True),
            verbose=3
        )
    )

    # run trials
    result_grid = tuner.fit()

I then recreate the best checkpoint from its directory and use it to create an RLPredictor, at which point the above error occurs:

    # recreate checkpoint
    checkpoint = Checkpoint.from_directory(path=checkpoint_path)

    # create RLPredictor from checkpoint - error occurs when this executes
    predictor = RLPredictor.from_checkpoint(checkpoint)

From what I can tell, the checkpoint folder contains all the necessary artifacts. What am I doing wrong?
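For reference, here is the small sketch I used to sanity-check the folder. The expected file names are my assumption from inspecting what was written to disk, not an official list:

```python
import os

def find_missing_artifacts(checkpoint_path, expected=None):
    """Report which expected checkpoint files are missing.

    Hypothetical helper, not part of Ray. The default file names are an
    assumption about what an RLlib checkpoint contains; adjust them for
    your Ray version.
    """
    if expected is None:
        expected = ["rllib_checkpoint.json", "policy_state.pkl"]
    present = set()
    # Walk the whole tree, since some artifacts live in subdirectories
    # (e.g. per-policy state).
    for _root, _dirs, files in os.walk(checkpoint_path):
        present.update(files)
    return [name for name in expected if name not in present]
```

This returns an empty list for my checkpoint directory, so the folder at least looks complete.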

Thanks.
Stefan

Hi @steff,

The code looks good. Could you turn this into a GH issue with a complete repro script?

Cheers

Sure. Is there a web page that describes the steps?


Hi @steff ,

Nothing special, I can write the steps down:

  • Go to the official Ray repo
  • Click Issues, then create a new issue
  • Fill out the form; include the repro script and a short description of what you expected to happen vs. what is happening
  • Post the link here for reference

Hey @steff, I have the same problem in 2.3.1. Did you end up creating an issue for this or finding a resolution?

Created an issue for this. Here is the link: Cannot create RLPredictor using restored checkpoint in different Ray session · Issue #33995 · ray-project/ray · GitHub