Hi,
How can I use Tune in combination with an environment that contains pre-trained TensorFlow models?
Details:
Tensorflow==2.7.0
Ray==1.10.0
I wrapped my environment so that it pre-processes observations with pre-trained TensorFlow (Keras) models. However, when Tune tries to serialize these models, it raises a `NotFoundError` (see the stack trace below). I tried to exclude the models from pickling by overriding `__getstate__` and `__setstate__`, but without success. Can I somehow exclude these models from serialization, or make the serialization work? (Saving with RLlib's `trainer.save()` does work.)
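A minimal sketch of one common workaround, assuming the pre-trained model can be reloaded from disk in every worker process: keep only the model's file path in anything Tune has to serialize (for example a hypothetical `model_path` key in `env_config`) and load the model lazily inside the environment. `LazyModel` below is a hypothetical helper, not part of Ray or Keras. Its `__getstate__` drops the live model, so a pickler only ever sees the path and the loader function. In the real wrapper the loader would be something like `tf.keras.models.load_model`; the example uses the builtin `list` as a stand-in so it runs without TensorFlow.

```python
import pickle


class LazyModel:
    """Hypothetical helper: stores a model path and loads the model on first use.

    Only `path` and `loader` are pickled; the loaded model object is dropped,
    so the serializer never has to handle it.
    """

    def __init__(self, path, loader):
        self.path = path      # e.g. a directory written by model.save(...)
        self.loader = loader  # e.g. tf.keras.models.load_model (must be picklable)
        self._model = None    # filled in lazily by the process that uses it

    @property
    def model(self):
        # Load on first access, i.e. inside the rollout worker, not the driver.
        if self._model is None:
            self._model = self.loader(self.path)
        return self._model

    def __getstate__(self):
        state = self.__dict__.copy()
        state["_model"] = None  # never serialize the live model
        return state

    def __setstate__(self, state):
        # The model is reloaded on the next access to `.model`.
        self.__dict__.update(state)


# Stand-in loader so the example runs without TensorFlow; in the env wrapper
# this would be e.g. LazyModel(env_config["model_path"], tf.keras.models.load_model)
lazy = LazyModel("abc", list)
print(lazy.model)                    # loaded on first access
restored = pickle.loads(pickle.dumps(lazy))
print(restored._model is None)       # the loaded object was not pickled
```

Creating the `LazyModel` inside the environment's constructor (from a string in `env_config`) rather than building the Keras model on the driver keeps the model object out of the trial config. The trade-off is that each rollout worker loads its own copy on first use, so the model path has to be readable from every node that runs a worker.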
I would appreciate any help/suggestions!
Luc
```
2022-02-16 13:18:08,974 ERROR ray_trial_executor.py:559 -- Trial PPOTrainer_TreasureHunt-v2_7fa15_00000: Unexpected error starting runner.
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\FP_4_4_3\lib\site-packages\ray\tune\ray_trial_executor.py", line 549, in start_trial
    return self._start_trial(trial)
  File "C:\ProgramData\Anaconda3\envs\FP_4_4_3\lib\site-packages\ray\tune\ray_trial_executor.py", line 447, in _start_trial
    runner = self._setup_remote_runner(trial)
  File "C:\ProgramData\Anaconda3\envs\FP_4_4_3\lib\site-packages\ray\tune\ray_trial_executor.py", line 391, in _setup_remote_runner
    return full_actor_class.remote(**kwargs)
  File "C:\ProgramData\Anaconda3\envs\FP_4_4_3\lib\site-packages\ray\actor.py", line 553, in remote
    scheduling_strategy=scheduling_strategy)
  File "C:\ProgramData\Anaconda3\envs\FP_4_4_3\lib\site-packages\ray\util\tracing\tracing_helper.py", line 371, in _invocation_actor_class_remote_span
    return method(self, args, kwargs, *_args, **_kwargs)
  File "C:\ProgramData\Anaconda3\envs\FP_4_4_3\lib\site-packages\ray\actor.py", line 871, in _remote
    scheduling_strategy=scheduling_strategy)
  File "python\ray\_raylet.pyx", line 1546, in ray._raylet.CoreWorker.create_actor
  File "python\ray\_raylet.pyx", line 1551, in ray._raylet.CoreWorker.create_actor
  File "python\ray\_raylet.pyx", line 391, in ray._raylet.prepare_args
  File "C:\ProgramData\Anaconda3\envs\FP_4_4_3\lib\site-packages\ray\serialization.py", line 367, in serialize
    return self._serialize_to_msgpack(value)
  File "C:\ProgramData\Anaconda3\envs\FP_4_4_3\lib\site-packages\ray\serialization.py", line 347, in _serialize_to_msgpack
    self._serialize_to_pickle5(metadata, python_objects)
  File "C:\ProgramData\Anaconda3\envs\FP_4_4_3\lib\site-packages\ray\serialization.py", line 307, in _serialize_to_pickle5
    raise e
  File "C:\ProgramData\Anaconda3\envs\FP_4_4_3\lib\site-packages\ray\serialization.py", line 304, in _serialize_to_pickle5
    value, protocol=5, buffer_callback=writer.buffer_callback)
  File "C:\ProgramData\Anaconda3\envs\FP_4_4_3\lib\site-packages\ray\cloudpickle\cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)
  File "C:\ProgramData\Anaconda3\envs\FP_4_4_3\lib\site-packages\ray\cloudpickle\cloudpickle_fast.py", line 620, in dump
    return Pickler.dump(self, obj)
  File "C:\ProgramData\Anaconda3\envs\FP_4_4_3\lib\site-packages\keras\engine\training.py", line 315, in __reduce__
    pickle_utils.serialize_model_as_bytecode(self))
  File "C:\ProgramData\Anaconda3\envs\FP_4_4_3\lib\site-packages\keras\saving\pickle_utils.py", line 77, in serialize_model_as_bytecode
    info.size = f.size()
  File "C:\ProgramData\Anaconda3\envs\FP_4_4_3\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 99, in size
    return stat(self.__name).length
  File "C:\ProgramData\Anaconda3\envs\FP_4_4_3\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 910, in stat
    return stat_v2(filename)
  File "C:\ProgramData\Anaconda3\envs\FP_4_4_3\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 926, in stat_v2
    return _pywrap_file_io.Stat(compat.path_to_str(path))
tensorflow.python.framework.errors_impl.NotFoundError
```
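The trace shows the failure happens while Ray serializes the arguments for the remote trainer actor (`_setup_remote_runner` calling `full_actor_class.remote(**kwargs)`), and the last Python frames before the error are cloudpickle's `dump` and Keras' `Model.__reduce__`. That suggests a Keras model may be reachable from the trial config through something other than an environment instance (for example an `env_config` entry or a captured closure), in which case a `__getstate__` on the wrapper would never be called. A small diagnostic sketch, assuming the config is a nested structure of dicts, lists and tuples: the hypothetical `find_unpicklable` helper (not part of Ray) tries to serialize each entry and reports the paths that fail. It uses `ray.cloudpickle` when Ray is importable and falls back to the standard `pickle` module otherwise.

```python
try:
    from ray import cloudpickle as _pickler  # the serializer seen in the trace
except ImportError:
    import pickle as _pickler  # stricter fallback: also rejects lambdas/closures


def find_unpicklable(obj, path="config"):
    """Hypothetical helper: return the paths inside `obj` that fail to serialize."""
    try:
        _pickler.dumps(obj)
        return []
    except Exception:  # e.g. TF's NotFoundError, PicklingError, TypeError
        pass
    if isinstance(obj, dict):
        children = [(f"{path}[{k!r}]", v) for k, v in obj.items()]
    elif isinstance(obj, (list, tuple)):
        children = [(f"{path}[{i}]", v) for i, v in enumerate(obj)]
    else:
        return [path]  # a leaf that cannot be serialized
    bad = []
    for child_path, child in children:
        bad.extend(find_unpicklable(child, child_path))
    # If no single child fails on its own, report the container itself.
    return bad or [path]
```

Running `find_unpicklable(config)` on the dict that is passed to `tune.run(..., config=config)`, before launching the run, should point at whichever entries still hold a Keras model.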