However, Ray does not seem to actually exclude .git, and it reported the following error:
2022-06-15 15:24:00,301 INFO packaging.py:363 -- Creating a file package for local directory '/home/me/app'.
2022-06-15 15:24:00,656 WARNING packaging.py:259 -- File /home/me/app/.git/objects/pack/pack-363f95fdf8dc7f3144d8a4daa0695d4dd75ef07e.pack is very large (42.68MiB). Consider adding this file to the 'excludes' list to skip uploading it: `ray.init(..., runtime_env={'excludes': ['/home/me/app/.git/objects/pack/pack-363f95fdf8dc7f3144d8a4daa0695d4dd75ef07e.pack']})`
2022-06-15 15:24:01,745 WARNING packaging.py:259 -- File /home/me/app/.git/modules/third_party/ray/objects/pack/pack-de70ab7af10a6927b56eed9da619bcaad23c7814.pack is very large (158.96MiB). Consider adding this file to the 'excludes' list to skip uploading it: `ray.init(..., runtime_env={'excludes': ['/home/me/app/.git/modules/third_party/ray/objects/pack/pack-de70ab7af10a6927b56eed9da619bcaad23c7814.pack']})`
2022-06-15 15:24:02,050 WARNING packaging.py:259 -- File /home/me/app/.git/modules/third_party/meltingpot/objects/pack/pack-1c8ed26605bd47ade6c6d14b4311af921bbb6255.pack is very large (190.81MiB). Consider adding this file to the 'excludes' list to skip uploading it: `ray.init(..., runtime_env={'excludes': ['/home/me/app/.git/modules/third_party/meltingpot/objects/pack/pack-1c8ed26605bd47ade6c6d14b4311af921bbb6255.pack']})`
[ERROR 15:24:11] pymarl Failed after 0:00:21!
Traceback (most recent calls WITHOUT Sacred internals):
File "/home/me/app/epymarl/src/main.py", line 65, in my_main
run_train_meltingpot(_run, config, _log)
File "/home/me/app/epymarl/src/run_meltingpot.py", line 60, in run
run_sequential(args=args, logger=logger)
File "/home/me/app/epymarl/src/run_meltingpot.py", line 104, in run_sequential
ray.init("auto", runtime_env=runtime_env)
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/worker.py", line 977, in init
connect(
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/worker.py", line 1517, in connect
runtime_env = upload_working_dir_if_needed(
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/_private/runtime_env/working_dir.py", line 64, in upload_working_dir_if_needed
upload_package_if_needed(
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/_private/runtime_env/packaging.py", line 411, in upload_package_if_needed
upload_package_to_gcs(pkg_uri, package_file.read_bytes())
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/_private/runtime_env/packaging.py", line 343, in upload_package_to_gcs
_store_package_in_gcs(pkg_uri, pkg_bytes)
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/_private/runtime_env/packaging.py", line 218, in _store_package_in_gcs
raise RuntimeError(
RuntimeError: Package size (532.13MiB) exceeds the maximum size of 100.00MiB. You can exclude large files using the 'excludes' option to the runtime_env.
As the log shows, excludes does not seem to take effect.
I tried adding that large file to excludes, but Ray still reported the same error, so it seems excludes does not work.
Also, if you think we should modify the default behavior, please feel free to leave a comment here or in the issue. It seems we can’t support both absolute paths and gitignore syntax, because they have conflicting meanings for paths that start with /. So we need to pick a reasonable default, or find a compromise somehow…
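For reference, here is a minimal sketch of one way to work around the conflict, assuming excludes patterns follow gitignore semantics (a leading / anchors the pattern at the root of working_dir rather than at the filesystem root). The helper to_relative_excludes is hypothetical and not part of Ray; it only rewrites absolute paths under the working directory into anchored relative patterns. The ray.init call is left as a comment because it needs a running cluster.

```python
from pathlib import PurePosixPath


def to_relative_excludes(working_dir, paths):
    """Hypothetical helper (not a Ray API): turn absolute paths under
    working_dir into gitignore-style patterns anchored at its root."""
    root = PurePosixPath(working_dir)
    patterns = []
    for p in paths:
        path = PurePosixPath(p)
        if path.is_absolute():
            # relative_to raises ValueError if the path is outside working_dir
            patterns.append("/" + str(path.relative_to(root)))
        else:
            # Already a relative or glob pattern; pass it through unchanged
            patterns.append(str(path))
    return patterns


working_dir = "/home/me/app"
excludes = to_relative_excludes(working_dir, ["/home/me/app/.git", "*.pack"])
runtime_env = {"working_dir": working_dir, "excludes": excludes}
print(runtime_env)

# With Ray installed and a cluster running, this would be passed as:
# ray.init("auto", runtime_env=runtime_env)
```

Whether the resulting package stays under the size limit still has to be checked against the packaging warnings in the log.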
Hi, @architkulkarni thanks. I don't think runtime_env is what I want. I use Docker and have already created a cluster, so the environment is already defined by the Docker image. runtime_env seems to be an environment-setup mechanism, and I would need to configure many things to complete it, which duplicates what I have already done with Docker. Following your suggestion, I get the following error:
(raylet, ip=172.24.56.163) [2022-06-17 18:32:17,992 E 73 73] (raylet) agent_manager.cc:136: Not all required Ray dependencies for the runtime_env feature were found. To install the required dependencies, please
run `pip install "ray[default]"`.
[ERROR 18:32:18] pymarl Failed after 0:00:02!
Traceback (most recent calls WITHOUT Sacred internals):
File "/home/me/app/epymarl/src/main.py", line 65, in my_main
run_train_meltingpot(_run, config, _log)
File "/home/me/app/epymarl/src/run_meltingpot.py", line 62, in run
run_sequential(args=args, logger=logger)
File "/home/me/app/epymarl/src/run_meltingpot.py", line 122, in run_sequential
buffer, queue, buffer_queue, ray_ws = create_buffer(args, scheme, groups, env_info, preprocess, logger)
File "/home/me/app/epymarl/src/run_meltingpot.py", line 423, in create_buffer
assert ray.get(buffer.ready.remote())
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/worker.py", line 1765, in get
raise value
ray.exceptions.RuntimeEnvSetupError: The runtime_env failed to be set up.
I think it is asking me to install the requirements, but I should not need to, since I am using Docker.
What I really want to figure out is why an already created k8s Ray pod cluster cannot be reused.
The following shows how I use the k8s ray pod cluster:
1. The admin created a new chart and created a ray operator
2. I use the YAML file to create a ray pod cluster
3. I log in to the head node and run the code (it works fine for debugging purposes)
4. I kill the current programme and then re-run the code. However, the cluster cannot be reused
5. I have to create a new cluster and run my job, which costs more time and patience.
If you are already using Docker, it may be faster to bake all your dependencies into the Docker image. If you want to set up the dependencies dynamically at runtime, you can use runtime_env. If you use them together, my guess is that due to the order of operations, the runtime_env specifications will override the ones in the Docker container.
I saw this: (raylet) agent_manager.cc:136: Not all required Ray dependencies for the runtime_env feature were found. To install the required dependencies, please
run pip install "ray[default]"
Is ray[default] installed on all nodes of the cluster?
If that still doesn’t work, to understand the RuntimeEnvSetupError, do you mind pasting the dashboard_agent.log file and sharing what Ray version you’re using? By default these logs are located at /tmp/ray/session_latest/logs on the head node of the cluster.
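In case it helps with collecting the log, here is a small sketch for grabbing the last lines of the file. The function name tail_log and the default of 200 lines are just illustrative choices; the path is the default log directory mentioned above joined with dashboard_agent.log.

```python
from collections import deque
from pathlib import Path

# Default log directory mentioned above, joined with the log file name.
DEFAULT_LOG = Path("/tmp/ray/session_latest/logs") / "dashboard_agent.log"


def tail_log(path=DEFAULT_LOG, n=200):
    """Return the last n lines of a log file, or [] if it does not exist."""
    path = Path(path)
    if not path.is_file():
        return []
    with path.open(errors="replace") as f:
        # deque with maxlen keeps only the most recent n lines in memory
        return list(deque((line.rstrip("\n") for line in f), maxlen=n))


if __name__ == "__main__":
    for line in tail_log():
        print(line)
```

Run it on the head node. The Ray version can be printed with `python -c "import ray; print(ray.__version__)"`.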
Hi, @architkulkarni, here is the output of dashboard_agent.log. BTW, why can't I reuse the cluster? Do you have any best practices? Trial and error really takes time, and I assume there is a smarter way to solve this problem.
2022-06-18 23:43:06,723 INFO agent.py:100 -- Parent pid is 115
2022-06-18 23:43:06,724 INFO agent.py:105 -- Dashboard agent grpc address: 0.0.0.0:44581
2022-06-18 23:43:06,725 INFO utils.py:79 -- Get all modules by type: DashboardAgentModule
2022-06-18 23:43:06,998 INFO agent.py:118 -- Loading DashboardAgentModule: <class 'ray.dashboard.modules.event.event_agent.EventAgent'>
2022-06-18 23:43:06,998 INFO event_agent.py:31 -- Event agent cache buffer size: 10240
2022-06-18 23:43:06,998 INFO agent.py:118 -- Loading DashboardAgentModule: <class 'ray.dashboard.modules.log.log_agent.LogAgent'>
2022-06-18 23:43:07,000 INFO agent.py:118 -- Loading DashboardAgentModule: <class 'ray.dashboard.modules.reporter.reporter_agent.ReporterAgent'>
2022-06-18 23:43:07,001 ERROR agent.py:436 -- [Errno -2] Name or service not known
Traceback (most recent call last):
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/dashboard/agent.py", line 391, in <module>
loop.run_until_complete(agent.run())
File "/home/me/miniconda3/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
return future.result()
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/dashboard/agent.py", line 178, in run
modules = self._load_modules()
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/dashboard/agent.py", line 120, in _load_modules
c = cls(self)
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/dashboard/modules/reporter/reporter_agent.py", line 161, in __init__
self._metrics_agent = MetricsAgent(
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/_private/metrics_agent.py", line 75, in __init__
prometheus_exporter.new_stats_exporter(
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/_private/prometheus_exporter.py", line 332, in new_stats_exporter
exporter = PrometheusStatsExporter(
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/_private/prometheus_exporter.py", line 265, in __init__
self.serve_http()
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/_private/prometheus_exporter.py", line 319, in serve_http
start_http_server(
File "/home/me/miniconda3/lib/python3.9/site-packages/prometheus_client/exposition.py", line 168, in start_wsgi_server
TmpServer.address_family, addr = _get_best_family(addr, port)
File "/home/me/miniconda3/lib/python3.9/site-packages/prometheus_client/exposition.py", line 157, in _get_best_family
infos = socket.getaddrinfo(address, port)
File "/home/me/miniconda3/lib/python3.9/socket.py", line 954, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
2022-06-18 23:43:09,752 INFO agent.py:100 -- Parent pid is 115
2022-06-18 23:43:09,753 INFO agent.py:105 -- Dashboard agent grpc address: 0.0.0.0:44581
2022-06-18 23:43:09,754 INFO utils.py:79 -- Get all modules by type: DashboardAgentModule
2022-06-18 23:43:10,009 INFO agent.py:118 -- Loading DashboardAgentModule: <class 'ray.dashboard.modules.event.event_agent.EventAgent'>
2022-06-18 23:43:10,009 INFO event_agent.py:31 -- Event agent cache buffer size: 10240
2022-06-18 23:43:10,009 INFO agent.py:118 -- Loading DashboardAgentModule: <class 'ray.dashboard.modules.log.log_agent.LogAgent'>
2022-06-18 23:43:10,011 INFO agent.py:118 -- Loading DashboardAgentModule: <class 'ray.dashboard.modules.reporter.reporter_agent.ReporterAgent'>
2022-06-18 23:43:10,012 ERROR agent.py:436 -- [Errno -2] Name or service not known
Traceback (most recent call last):
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/dashboard/agent.py", line 391, in <module>
loop.run_until_complete(agent.run())
File "/home/me/miniconda3/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
return future.result()
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/dashboard/agent.py", line 178, in run
modules = self._load_modules()
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/dashboard/agent.py", line 120, in _load_modules
c = cls(self)
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/dashboard/modules/reporter/reporter_agent.py", line 161, in __init__
self._metrics_agent = MetricsAgent(
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/_private/metrics_agent.py", line 75, in __init__
prometheus_exporter.new_stats_exporter(
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/_private/prometheus_exporter.py", line 332, in new_stats_exporter
exporter = PrometheusStatsExporter(
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/_private/prometheus_exporter.py", line 265, in __init__
self.serve_http()
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/_private/prometheus_exporter.py", line 319, in serve_http
start_http_server(
File "/home/me/miniconda3/lib/python3.9/site-packages/prometheus_client/exposition.py", line 168, in start_wsgi_server
TmpServer.address_family, addr = _get_best_family(addr, port)
File "/home/me/miniconda3/lib/python3.9/site-packages/prometheus_client/exposition.py", line 157, in _get_best_family
infos = socket.getaddrinfo(address, port)
File "/home/me/miniconda3/lib/python3.9/socket.py", line 954, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
2022-06-18 23:43:14,746 INFO agent.py:100 -- Parent pid is 115
2022-06-18 23:43:14,747 INFO agent.py:105 -- Dashboard agent grpc address: 0.0.0.0:44581
2022-06-18 23:43:14,748 INFO utils.py:79 -- Get all modules by type: DashboardAgentModule
2022-06-18 23:43:15,001 INFO agent.py:118 -- Loading DashboardAgentModule: <class 'ray.dashboard.modules.event.event_agent.EventAgent'>
2022-06-18 23:43:15,002 INFO event_agent.py:31 -- Event agent cache buffer size: 10240
2022-06-18 23:43:15,002 INFO agent.py:118 -- Loading DashboardAgentModule: <class 'ray.dashboard.modules.log.log_agent.LogAgent'>
2022-06-18 23:43:15,003 INFO agent.py:118 -- Loading DashboardAgentModule: <class 'ray.dashboard.modules.reporter.reporter_agent.ReporterAgent'>
2022-06-18 23:43:15,004 ERROR agent.py:436 -- [Errno -2] Name or service not known
Traceback (most recent call last):
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/dashboard/agent.py", line 391, in <module>
loop.run_until_complete(agent.run())
File "/home/me/miniconda3/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
return future.result()
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/dashboard/agent.py", line 178, in run
modules = self._load_modules()
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/dashboard/agent.py", line 120, in _load_modules
c = cls(self)
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/dashboard/modules/reporter/reporter_agent.py", line 161, in __init__
self._metrics_agent = MetricsAgent(
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/_private/metrics_agent.py", line 75, in __init__
prometheus_exporter.new_stats_exporter(
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/_private/prometheus_exporter.py", line 332, in new_stats_exporter
exporter = PrometheusStatsExporter(
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/_private/prometheus_exporter.py", line 265, in __init__
self.serve_http()
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/_private/prometheus_exporter.py", line 319, in serve_http
start_http_server(
File "/home/me/miniconda3/lib/python3.9/site-packages/prometheus_client/exposition.py", line 168, in start_wsgi_server
TmpServer.address_family, addr = _get_best_family(addr, port)
File "/home/me/miniconda3/lib/python3.9/site-packages/prometheus_client/exposition.py", line 157, in _get_best_family
infos = socket.getaddrinfo(address, port)
File "/home/me/miniconda3/lib/python3.9/socket.py", line 954, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
2022-06-18 23:43:23,747 INFO agent.py:100 -- Parent pid is 115
2022-06-18 23:43:23,748 INFO agent.py:105 -- Dashboard agent grpc address: 0.0.0.0:44581
2022-06-18 23:43:23,749 INFO utils.py:79 -- Get all modules by type: DashboardAgentModule
2022-06-18 23:43:23,982 INFO agent.py:118 -- Loading DashboardAgentModule: <class 'ray.dashboard.modules.event.event_agent.EventAgent'>
2022-06-18 23:43:23,983 INFO event_agent.py:31 -- Event agent cache buffer size: 10240
2022-06-18 23:43:23,983 INFO agent.py:118 -- Loading DashboardAgentModule: <class 'ray.dashboard.modules.log.log_agent.LogAgent'>
2022-06-18 23:43:23,985 INFO agent.py:118 -- Loading DashboardAgentModule: <class 'ray.dashboard.modules.reporter.reporter_agent.ReporterAgent'>
2022-06-18 23:43:23,985 ERROR agent.py:436 -- [Errno -2] Name or service not known
Traceback (most recent call last):
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/dashboard/agent.py", line 391, in <module>
loop.run_until_complete(agent.run())
File "/home/me/miniconda3/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
return future.result()
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/dashboard/agent.py", line 178, in run
modules = self._load_modules()
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/dashboard/agent.py", line 120, in _load_modules
c = cls(self)
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/dashboard/modules/reporter/reporter_agent.py", line 161, in __init__
self._metrics_agent = MetricsAgent(
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/_private/metrics_agent.py", line 75, in __init__
prometheus_exporter.new_stats_exporter(
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/_private/prometheus_exporter.py", line 332, in new_stats_exporter
exporter = PrometheusStatsExporter(
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/_private/prometheus_exporter.py", line 265, in __init__
self.serve_http()
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/_private/prometheus_exporter.py", line 319, in serve_http
start_http_server(
File "/home/me/miniconda3/lib/python3.9/site-packages/prometheus_client/exposition.py", line 168, in start_wsgi_server
TmpServer.address_family, addr = _get_best_family(addr, port)
File "/home/me/miniconda3/lib/python3.9/site-packages/prometheus_client/exposition.py", line 157, in _get_best_family
infos = socket.getaddrinfo(address, port)
File "/home/me/miniconda3/lib/python3.9/socket.py", line 954, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
2022-06-18 23:43:40,661 INFO agent.py:100 -- Parent pid is 115
2022-06-18 23:43:40,661 INFO agent.py:105 -- Dashboard agent grpc address: 0.0.0.0:44581
2022-06-18 23:43:40,663 INFO utils.py:79 -- Get all modules by type: DashboardAgentModule
2022-06-18 23:43:40,915 INFO agent.py:118 -- Loading DashboardAgentModule: <class 'ray.dashboard.modules.event.event_agent.EventAgent'>
2022-06-18 23:43:40,915 INFO event_agent.py:31 -- Event agent cache buffer size: 10240
2022-06-18 23:43:40,915 INFO agent.py:118 -- Loading DashboardAgentModule: <class 'ray.dashboard.modules.log.log_agent.LogAgent'>
2022-06-18 23:43:40,917 INFO agent.py:118 -- Loading DashboardAgentModule: <class 'ray.dashboard.modules.reporter.reporter_agent.ReporterAgent'>
2022-06-18 23:43:40,918 ERROR agent.py:436 -- [Errno -2] Name or service not known
Traceback (most recent call last):
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/dashboard/agent.py", line 391, in <module>
loop.run_until_complete(agent.run())
File "/home/me/miniconda3/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
return future.result()
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/dashboard/agent.py", line 178, in run
modules = self._load_modules()
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/dashboard/agent.py", line 120, in _load_modules
c = cls(self)
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/dashboard/modules/reporter/reporter_agent.py", line 161, in __init__
self._metrics_agent = MetricsAgent(
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/_private/metrics_agent.py", line 75, in __init__
prometheus_exporter.new_stats_exporter(
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/_private/prometheus_exporter.py", line 332, in new_stats_exporter
exporter = PrometheusStatsExporter(
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/_private/prometheus_exporter.py", line 265, in __init__
self.serve_http()
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/_private/prometheus_exporter.py", line 319, in serve_http
start_http_server(
File "/home/me/miniconda3/lib/python3.9/site-packages/prometheus_client/exposition.py", line 168, in start_wsgi_server
TmpServer.address_family, addr = _get_best_family(addr, port)
File "/home/me/miniconda3/lib/python3.9/site-packages/prometheus_client/exposition.py", line 157, in _get_best_family
infos = socket.getaddrinfo(address, port)
File "/home/me/miniconda3/lib/python3.9/socket.py", line 954, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
2022-06-18 23:44:13,648 INFO agent.py:100 -- Parent pid is 115
2022-06-18 23:44:13,648 INFO agent.py:105 -- Dashboard agent grpc address: 0.0.0.0:44581
2022-06-18 23:44:13,649 INFO utils.py:79 -- Get all modules by type: DashboardAgentModule
2022-06-18 23:44:13,902 INFO agent.py:118 -- Loading DashboardAgentModule: <class 'ray.dashboard.modules.event.event_agent.EventAgent'>
2022-06-18 23:44:13,902 INFO event_agent.py:31 -- Event agent cache buffer size: 10240
2022-06-18 23:44:13,902 INFO agent.py:118 -- Loading DashboardAgentModule: <class 'ray.dashboard.modules.log.log_agent.LogAgent'>
2022-06-18 23:44:13,904 INFO agent.py:118 -- Loading DashboardAgentModule: <class 'ray.dashboard.modules.reporter.reporter_agent.ReporterAgent'>
Hi @GoingMyWay thanks for pasting the log. Sorry for the frustration with the trial-and-error, hopefully we can get it working soon. You should be able to reuse the cluster once we figure out this problem, but my guess is for this particular kind of failure the cluster unfortunately needs to be restarted.
I haven’t seen socket.gaierror: [Errno -2] Name or service not known before and I’m not sure how to debug it – it looks like it might be some kind of failure of cluster nodes to communicate with each other over the network. @sangcho or @GuyangSong have you seen this before or do you have any ideas on how to debug it?
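One thing that might narrow it down: the traceback fails inside socket.getaddrinfo(address, port), so it could be worth checking which names resolve on the affected node. This is only a sketch. I don't know which address Ray passes to the Prometheus exporter here, so the three names tried below (the loopback IP, localhost, and the node's own hostname) are guesses, and check_resolves is a hypothetical helper.

```python
import socket


def check_resolves(host, port=0):
    """Return (True, addresses) if host resolves, else (False, error text)."""
    try:
        infos = socket.getaddrinfo(host, port)
    except socket.gaierror as exc:
        return False, f"{exc.errno}: {exc.strerror}"
    # Each entry is (family, type, proto, canonname, sockaddr); keep the IPs.
    return True, sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Candidate names are assumptions, not the value Ray actually uses.
    for name in ("127.0.0.1", "localhost", socket.gethostname()):
        ok, detail = check_resolves(name)
        print(f"{name!r}: {'ok' if ok else 'FAILED'} ({detail})")
```

If the pod's own hostname fails to resolve inside the container, that would point at the pod's DNS or /etc/hosts setup rather than at Ray itself.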
(raylet, ip=172.24.56.163) [2022-06-17 18:32:17,992 E 73 73] (raylet) agent_manager.cc:136: Not all required Ray dependencies for the runtime_env feature were found. To install the required dependencies, please
run `pip install "ray[default]"`.
Does this error message still appear in your case?
If it does, can you paste the raylet command line shown by “ps -ef | grep raylet”?
@GuyangSong, for the first run I did not set it. Then I set it and ran the code:
(pid=gcs_server) [2022-06-23 21:02:30,624 E 60 60] (gcs_server) gcs_actor_scheduler.cc:320: The lease worker request from node 512e4dd976cf969e81ae8b479ad888a40cae2f8a7c89aa76a023f104 for actor 4b60b9fcc[0/269]
bcd40a5601000000(_QueueActor.__init__) has been canceled, job id = 01000000, cancel type: SCHEDULING_CANCELLED_RUNTIME_ENV_SETUP_FAILED
(pid=gcs_server) [2022-06-23 21:02:30,633 E 60 60] (gcs_server) gcs_actor_scheduler.cc:320: The lease worker request from node b88ae7a900d5774a398def1c9792c5e6692c2446e85183e00c257b9f for actor 7bc55c1eecaa08f9
fa80dbd901000000(_QueueActor.__init__) has been canceled, job id = 01000000, cancel type: SCHEDULING_CANCELLED_RUNTIME_ENV_SETUP_FAILED
(pid=gcs_server) [2022-06-23 21:02:30,642 E 60 60] (gcs_server) gcs_actor_scheduler.cc:320: The lease worker request from node 4b68912ec6784ea01f26a9548bf68221a58faee9afb6ad0dea2dacaa for actor 5497df4a81fac901
e1be7ec401000000(_QueueActor.__init__) has been canceled, job id = 01000000, cancel type: SCHEDULING_CANCELLED_RUNTIME_ENV_SETUP_FAILED
(pid=gcs_server) [2022-06-23 21:02:30,652 E 60 60] (gcs_server) gcs_actor_scheduler.cc:320: The lease worker request from node 4b68912ec6784ea01f26a9548bf68221a58faee9afb6ad0dea2dacaa for actor e6aa9f3bfb8cea4d
b7d08b8401000000(_QueueActor.__init__) has been canceled, job id = 01000000, cancel type: SCHEDULING_CANCELLED_RUNTIME_ENV_SETUP_FAILED
(pid=gcs_server) [2022-06-23 21:02:30,668 E 60 60] (gcs_server) gcs_actor_scheduler.cc:320: The lease worker request from node b88ae7a900d5774a398def1c9792c5e6692c2446e85183e00c257b9f for actor 6e15381ff4d31a63
3c77974d01000000(_QueueActor.__init__) has been canceled, job id = 01000000, cancel type: SCHEDULING_CANCELLED_RUNTIME_ENV_SETUP_FAILED
(pid=gcs_server) [2022-06-23 21:02:30,684 E 60 60] (gcs_server) gcs_actor_scheduler.cc:320: The lease worker request from node 4b68912ec6784ea01f26a9548bf68221a58faee9afb6ad0dea2dacaa for actor bb6379f2a6cb30db
f408263901000000(_QueueActor.__init__) has been canceled, job id = 01000000, cancel type: SCHEDULING_CANCELLED_RUNTIME_ENV_SETUP_FAILED
[INFO 21:02:30] run_meltingpot Buffer size: 600
[ERROR 21:02:30] pymarl Failed after 0:00:03!
Traceback (most recent calls WITHOUT Sacred internals):
File "/home/me/app/epymarl/src/main.py", line 66, in my_main
run_train_meltingpot(_run, config, _log)
File "/home/me/app/epymarl/src/run_meltingpot.py", line 62, in run
run_sequential(args=args, logger=logger)
File "/home/me/app/epymarl/src/run_meltingpot.py", line 122, in run_sequential
buffer, queue, buffer_queue, ray_ws = create_buffer(args, scheme, groups, env_info, preprocess, logger)
File "/home/me/app/epymarl/src/run_meltingpot.py", line 509, in create_buffer
assert ray.get(buffer.ready.remote())
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "/home/me/miniconda3/lib/python3.9/site-packages/ray/worker.py", line 1765, in get
raise value
ray.exceptions.RuntimeEnvSetupError: The runtime_env failed to be set up.
(raylet) [2022-06-23 21:02:30,763 C 109 109] (raylet) dependency_manager.cc:208: Check failed: task_entry != queued_task_requests_.end() Can't remove dependencies of tasks that are not queued.
(raylet) *** StackTrace Information ***
(raylet) ray::SpdLogMessage::Flush()
(raylet) ray::RayLog::~RayLog()
(raylet) ray::raylet::DependencyManager::RemoveTaskDependencies()
(raylet) ray::raylet::ClusterTaskManager::PoppedWorkerHandler()
(raylet) std::_Function_handler<>::_M_invoke()
(raylet) std::_Function_handler<>::_M_invoke()
(raylet) std::_Function_handler<>::_M_invoke()
(raylet) std::_Function_handler<>::_M_invoke()
(raylet) boost::asio::detail::wait_handler<>::do_complete()
(raylet) boost::asio::detail::scheduler::do_run_one()
(raylet) boost::asio::detail::scheduler::run()
(raylet) boost::asio::io_context::run()
(raylet) main
(raylet) __libc_start_main
(raylet)
(pid=gcs_server) [2022-06-23 21:02:30,709 E 60 60] (gcs_server) gcs_actor_scheduler.cc:320: The lease worker request from node 4b68912ec6784ea01f26a9548bf68221a58faee9afb6ad0dea2dacaa for actor e23a3d4127997687
6bdce53201000000(_QueueActor.__init__) has been canceled, job id = 01000000, cancel type: SCHEDULING_CANCELLED_RUNTIME_ENV_SETUP_FAILED
(pid=gcs_server) [2022-06-23 21:02:30,734 E 60 60] (gcs_server) gcs_actor_scheduler.cc:320: The lease worker request from node 167498de1e2bf62a2035943e1f85515f74c77677c92fdddc217ae725 for actor 57c6f66434f69b96
3200f29d01000000(ReplayBufferwithQueue.__init__) has been canceled, job id = 01000000, cancel type: SCHEDULING_CANCELLED_RUNTIME_ENV_SETUP_FAILED
Then I also launched a new cluster and ran the code, and I got the same error.