How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
Hi experts.
I get the following error when restoring a checkpoint for a DQN model using Ray 2.1.0:
Traceback (most recent call last):
  File "/home/stefan/PycharmProjects/RLProjects/rl_offline_trainer/inference_app.py", line 61, in <module>
    predictor = RLPredictor.from_checkpoint(checkpoint)
  File "/home/stefan/anaconda3/envs/py38_ray2.1/lib/python3.8/site-packages/ray/train/rl/rl_predictor.py", line 63, in from_checkpoint
    policy = checkpoint.get_policy(env)
  File "/home/stefan/anaconda3/envs/py38_ray2.1/lib/python3.8/site-packages/ray/train/rl/rl_checkpoint.py", line 42, in get_policy
    return Policy.from_checkpoint(checkpoint=self)["default_policy"]
  File "/home/stefan/anaconda3/envs/py38_ray2.1/lib/python3.8/site-packages/ray/rllib/policy/policy.py", line 256, in from_checkpoint
    policy_state = pickle.load(f)
  File "/home/stefan/anaconda3/envs/py38_ray2.1/lib/python3.8/site-packages/ray/_private/serialization.py", line 89, in _actor_handle_deserializer
    return ray.actor.ActorHandle._deserialization_helper(serialized_obj, outer_id)
  File "/home/stefan/anaconda3/envs/py38_ray2.1/lib/python3.8/site-packages/ray/actor.py", line 1281, in _deserialization_helper
    return worker.core_worker.deserialize_and_register_actor_handle(
  File "python/ray/_raylet.pyx", line 2137, in ray._raylet.CoreWorker.deserialize_and_register_actor_handle
  File "python/ray/_raylet.pyx", line 2106, in ray._raylet.CoreWorker.make_actor_handle
  File "/home/stefan/anaconda3/envs/py38_ray2.1/lib/python3.8/site-packages/ray/_private/function_manager.py", line 522, in load_actor_class
    actor_class = self._load_actor_class_from_gcs(
  File "/home/stefan/anaconda3/envs/py38_ray2.1/lib/python3.8/site-packages/ray/_private/function_manager.py", line 617, in _load_actor_class_from_gcs
    class_name = ensure_str(class_name)
  File "/home/stefan/anaconda3/envs/py38_ray2.1/lib/python3.8/site-packages/ray/_private/utils.py", line 289, in ensure_str
    assert isinstance(s, bytes)
AssertionError
I use Ray Tune to execute several trials, with the option to save a checkpoint at the end, using the following:
# create tuner
tuner = Tuner(
    # trainer
    trainer,
    # create tune configuration
    tune_config=self.create_tune_config(
        search_algo=search_algo,
        scheduler=scheduler
    ),
    # hyper-parameters
    param_space=self.create_param_space(),
    # specify run configuration
    run_config=RunConfig(
        stop=dict(training_iteration=2),
        checkpoint_config=CheckpointConfig(checkpoint_at_end=True),
        verbose=3
    )
)
# run trials
result_grid = tuner.fit()
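For reference, the checkpoint_path used below is taken from the best trial of the result grid, roughly like this (the metric name here is only an example, not necessarily the one I actually tune on):

# get the best result and materialize its checkpoint to a local directory
best_result = result_grid.get_best_result(metric="episode_reward_mean", mode="max")
checkpoint_path = best_result.checkpoint.to_directory()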
I then recreate the best checkpoint and use it to create an RLPredictor, at which point the above error occurs:
# recreate checkpoint
checkpoint = Checkpoint.from_directory(path=checkpoint_path)
# create RLPredictor from checkpoint - error occurs when this executes
predictor = RLPredictor.from_checkpoint(checkpoint)
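For completeness, once the predictor is created the plan is simply to run observation batches through it, along these lines (the observation shape is just a placeholder for my environment):

import numpy as np
# run a dummy observation batch through the predictor to get actions
obs = np.zeros((1, 4), dtype=np.float32)
actions = predictor.predict(obs)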
From what I can tell, the checkpoint folder contains all the necessary artifacts. What am I doing wrong?
Thanks.
Stefan