- High: It blocks me from completing my task.
I am trying to run code from the GitHub repo mila-iqia/climate-cooperation-competition (AI for Global Climate Cooperation: Modeling Global Climate Negotiations, Agreements, and Long-Term Cooperation in RICE-N; ai4climatecoop.org), which relies on Ray 1.0.0, and I want to migrate the functions to Ray >= 2.0.0.
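For reference, here is how I understand the old entry point maps onto the new Tuner API (my own sketch based on the Ray docs, not code from the repo; EnvWrapper is the repo's environment wrapper class):

# Ray 1.x style (approximately what the repo does):
from ray import tune

tune.run(
    "PPO",
    config={"env": EnvWrapper, "num_workers": 1, "num_gpus": 1},
    metric="episode_reward_mean",
    mode="max",
)

# Ray 2.x style, which I am migrating to:
from ray import air, tune

tuner = tune.Tuner(
    "PPO",
    param_space={"env": EnvWrapper, "num_workers": 1, "num_gpus": 1},
    tune_config=tune.TuneConfig(metric="episode_reward_mean", mode="max"),
    run_config=air.RunConfig(),  # stopping criteria, checkpointing, etc. go here
)
results = tuner.fit()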
I am using Google Colab with the following installations:
!pip install -r requirements.txt
!pip install rl_warp_drive==1.7.0
!pip install ray[rllib]==2.0.0
!pip install codecarbon
where requirements.txt contains:
importlib-metadata==4.12.0
flask==2.1.1
gym==0.21.0
pandas==1.3.0
waitress==2.1.1
jupyterlab>=3.4.0
tensorflow==1.13.1
torch==1.9.0
matplotlib==3.2.2
numpy==1.21.6
deepdiff==5.8.1
pyyaml==6.0
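Since rl_warp_drive and the pinned requirements pull in their own dependencies, I also check which Ray version actually ends up active in the runtime (my own sanity check; I restart the Colab runtime after the installs before running this):

import ray

# Should print 2.0.0; a different version here means pip replaced the pinned Ray build.
print(ray.__version__)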
The following code runs:
from ray import tune

tuner = tune.Tuner(
    "PPO",
    tune_config=tune.TuneConfig(
        metric="episode_reward_mean",
        mode="max",
        # scheduler=pbt,
        num_samples=1,
    ),
    param_space={
        "env": EnvWrapper,
        "num_workers": 1,
        "num_gpus": 1,
    },
)
where EnvWrapper is defined in train_with_rllib.py at commit 79cdcfa08976c58aa20a6cc0722bc30420615be9 of the mila-iqia/climate-cooperation-competition GitHub repo.
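If it matters, my understanding is that the same settings could also be expressed with RLlib's Ray 2.0 config objects instead of a raw dict (a sketch using PPOConfig as described in the Ray 2.0 docs; I have not verified it against this repo):

from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig

# EnvWrapper as defined in the repo's train_with_rllib.py.
from train_with_rllib import EnvWrapper

config = (
    PPOConfig()
    .environment(env=EnvWrapper)
    .rollouts(num_rollout_workers=1)
    .resources(num_gpus=1)
)

tuner = tune.Tuner(
    "PPO",
    param_space=config.to_dict(),
    tune_config=tune.TuneConfig(metric="episode_reward_mean", mode="max", num_samples=1),
)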
I run
tuner.fit()
and get the following error:
== Status ==
Current time: 2023-02-10 22:05:10 (running for 00:00:05.09)
Memory usage on this node: 2.3/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/2 CPUs, 0/1 GPUs, 0.0/7.38 GiB heap, 0.0/3.69 GiB objects
Result logdir: /root/ray_results/PPO
Number of trials: 1/1 (1 ERROR)
Trial name                    status    loc
PPO_EnvWrapper_f9a41_00000    ERROR

Number of errored trials: 1
Trial name                    # failures    error file
PPO_EnvWrapper_f9a41_00000    1             /root/ray_results/PPO/PPO_EnvWrapper_f9a41_00000_0_2023-02-10_22-05-05/error.txt
/usr/local/lib/python3.8/dist-packages/ray/util/placement_group.py:78: DeprecationWarning: placement_group parameter is deprecated. Use scheduling_strategy=PlacementGroupSchedulingStrategy(...) instead, see the usage at Ray Core API — Ray 2.0.0.
return bundle_reservation_check.options(
/usr/local/lib/python3.8/dist-packages/ray/_private/ray_option_utils.py:266: DeprecationWarning: Setting 'object_store_memory' for actors is deprecated since it doesn't actually reserve the required object store memory. Use object spilling that's enabled by default (Object Spilling — Ray 2.0.0) instead to bypass the object store memory size limitation.
warnings.warn(
/usr/local/lib/python3.8/dist-packages/ray/actor.py:637: DeprecationWarning: placement_group parameter is deprecated. Use scheduling_strategy=PlacementGroupSchedulingStrategy(...) instead, see the usage at Ray Core API — Ray 2.0.0.
return actor_cls._remote(args=args, kwargs=kwargs, **updated_options)
/usr/local/lib/python3.8/dist-packages/ray/actor.py:637: DeprecationWarning: placement_group_bundle_index parameter is deprecated. Use scheduling_strategy=PlacementGroupSchedulingStrategy(...) instead, see the usage at Ray Core API — Ray 2.0.0.
return actor_cls._remote(args=args, kwargs=kwargs, **updated_options)
/usr/local/lib/python3.8/dist-packages/ray/actor.py:637: DeprecationWarning: placement_group_capture_child_tasks parameter is deprecated. Use scheduling_strategy=PlacementGroupSchedulingStrategy(...) instead, see the usage at Ray Core API — Ray 2.0.0.
return actor_cls._remote(args=args, kwargs=kwargs, **updated_options)
2023-02-10 22:05:06,909 WARNING worker.py:1829 -- Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/ray/_private/worker.py", line 1957, in connect
node.check_version_info()
File "/usr/local/lib/python3.8/dist-packages/ray/_private/node.py", line 359, in check_version_info
ray._private.utils.check_version_info(cluster_metadata)
File "/usr/local/lib/python3.8/dist-packages/ray/_private/utils.py", line 1533, in check_version_info
raise RuntimeError(error_message)
RuntimeError: Version mismatch: The cluster was started with:
Ray: 2.0.0
Python: 3.8.10
This process on node 172.28.0.12 was started with:
Ray: 2.2.0
Python: 3.8.10
2023-02-10 22:05:10,780 WARNING worker.py:1829 -- Traceback (most recent call last):
File "python/ray/_raylet.pyx", line 789, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 814, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 506, in ray._raylet.raise_if_dependency_failed
ray.exceptions.RaySystemError: System error: No module named 'run_unittests'
traceback: Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/ray/_private/serialization.py", line 369, in deserialize_objects
obj = self._deserialize_object(data, metadata, object_ref)
File "/usr/local/lib/python3.8/dist-packages/ray/_private/serialization.py", line 252, in _deserialize_object
return self._deserialize_msgpack_data(data, metadata_fields)
File "/usr/local/lib/python3.8/dist-packages/ray/_private/serialization.py", line 207, in _deserialize_msgpack_data
python_objects = self._deserialize_pickle5_data(pickle5_data)
File "/usr/local/lib/python3.8/dist-packages/ray/_private/serialization.py", line 197, in _deserialize_pickle5_data
obj = pickle.loads(in_band)
ModuleNotFoundError: No module named 'run_unittests'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "python/ray/_raylet.pyx", line 1135, in ray._raylet.task_execution_handler
File "python/ray/_raylet.pyx", line 1045, in ray._raylet.execute_task_with_cancellation_handler
File "python/ray/_raylet.pyx", line 782, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 945, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 575, in ray._raylet.store_task_errors
File "/usr/local/lib/python3.8/dist-packages/ray/_private/function_manager.py", line 586, in temporary_actor_method
raise RuntimeError(
RuntimeError: The actor with name PPO failed to import on the worker. This may be because needed library dependencies are not installed in the worker environment:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/ray/_private/function_manager.py", line 625, in _load_actor_class_from_gcs
actor_class = pickle.loads(pickled_class)
AttributeError: Can't get attribute 'Trainable.get_current_ip' on <module 'ray.tune.trainable.trainable' from '/usr/local/lib/python3.8/dist-packages/ray/tune/trainable/trainable.py'>
An unexpected internal error occurred while the worker was executing a task.
2023-02-10 22:05:10,792 WARNING worker.py:1829 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffff4f1d62771af3c3dead23481901000000 Worker ID: b2e6e2c0649c100dbb188c717413786aaab9f9c436dfc891b14a64a2 Node ID: ee763a11e5c0a5a3a3de9d7855e267c7266a1f823521e1e121334ea9 Worker IP address: 172.28.0.12 Worker port: 35431 Worker PID: 8160 Worker exit type: SYSTEM_ERROR Worker exit detail: Worker exits unexpectedly. Worker exits with an exit code None.
2023-02-10 22:05:10,794 ERROR trial_runner.py:980 -- Trial PPO_EnvWrapper_f9a41_00000: Error processing event.
ray.tune.error._TuneNoNextExecutorEventError: Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/ray/tune/execution/ray_trial_executor.py", line 989, in get_next_executor_event
# First update status of staged placement groups
File "/usr/local/lib/python3.8/dist-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/ray/_private/worker.py", line 2277, in get
"""
ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.
class_name: PPO
actor_id: 4f1d62771af3c3dead23481901000000
pid: 8160
namespace: 4bc69c3a-113c-4cea-8a6c-04f434817f44
ip: 172.28.0.12
The actor is dead because its worker process has died. Worker exit type: SYSTEM_ERROR Worker exit detail: Worker exits unexpectedly. Worker exits with an exit code None.
Result for PPO_EnvWrapper_f9a41_00000:
trial_id: f9a41_00000
2023-02-10 22:05:10,925 ERROR tune.py:754 -- Trials did not complete: [PPO_EnvWrapper_f9a41_00000]
2023-02-10 22:05:10,926 INFO tune.py:758 -- Total run time: 5.22 seconds (5.08 seconds for the tuning loop).
<ray.tune.result_grid.ResultGrid at 0x7f3079fa2490>
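My current reading of the log: the root cause looks like the version mismatch (the cluster was started with Ray 2.0.0, but this process runs Ray 2.2.0), and the pickling failures ('run_unittests', 'Trainable.get_current_ip') look like symptoms of objects pickled under one Ray version being unpickled under another. My plan (my own guess, not yet verified) is to restart the Colab runtime after pinning ray[rllib]==2.0.0 so every process uses the same build, and to ship the notebook's working directory to the workers so local modules such as run_unittests are importable there:

import ray

ray.shutdown()  # drop any cluster left over from a previous, differently-versioned Ray
ray.init(
    num_gpus=1,
    # working_dir uploads the current directory so workers can import local
    # modules such as run_unittests (my attempted fix, per the Ray runtime_env docs).
    runtime_env={"working_dir": "."},
)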