Hi @architkulkarni, thanks. I don't think the runtime env is what I want. I use Docker to create the cluster, so the environment is already defined by the Docker image I set. The runtime env appears to be a separate environment-setup mechanism, and completing it would mean configuring many things that I have already handled with Docker. Following your suggestion, I get the following error:
(raylet, ip=172.24.56.163) [2022-06-17 18:32:17,992 E 73 73] (raylet) agent_manager.cc:136: Not all required Ray dependencies for the runtime_env feature were found. To install the required dependencies, please run `pip install "ray[default]"`.
[ERROR 18:32:18] pymarl Failed after 0:00:02!
Traceback (most recent calls WITHOUT Sacred internals):
  File "/home/me/app/epymarl/src/main.py", line 65, in my_main
    run_train_meltingpot(_run, config, _log)
  File "/home/me/app/epymarl/src/run_meltingpot.py", line 62, in run
    run_sequential(args=args, logger=logger)
  File "/home/me/app/epymarl/src/run_meltingpot.py", line 122, in run_sequential
    buffer, queue, buffer_queue, ray_ws = create_buffer(args, scheme, groups, env_info, preprocess, logger)
  File "/home/me/app/epymarl/src/run_meltingpot.py", line 423, in create_buffer
    assert ray.get(buffer.ready.remote())
  File "/home/me/miniconda3/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/me/miniconda3/lib/python3.9/site-packages/ray/worker.py", line 1765, in get
    raise value
ray.exceptions.RuntimeEnvSetupError: The runtime_env failed to be set up.
I think it is asking me to specify the requirements, but I should not need to do that because I am using Docker.
What I actually want to find out is why an already created k8s Ray pod cluster cannot be reused.
This is how I use the k8s Ray pod cluster:
1. The admin created a new chart and a Ray operator.
2. I use the YAML file to create a Ray pod cluster.
3. I log in to the head node and run the code (this works fine for debugging).
4. I kill the current programme and then re-run the code. However, the cluster cannot be reused.
5. I have to create a new cluster and run my job again, which costs more time and patience.
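
To narrow this down, a check like the one below could be run on the head node between steps 4 and 5. It is only a sketch: `held_resources` and `check_cluster` are hypothetical helpers, and I am assuming (not confirming) that resources still claimed by the killed job are what blocks the re-run. It connects to the running cluster with `ray.init(address="auto")` and compares `ray.cluster_resources()` with `ray.available_resources()`.

```python
def held_resources(total, available):
    """Return the resources that are still claimed (total minus available)."""
    held = {}
    for name, amount in total.items():
        used = amount - available.get(name, 0.0)
        if used > 1e-9:  # ignore floating-point noise
            held[name] = used
    return held


def check_cluster():
    """Attach to the existing cluster and report resources that are still in use."""
    import ray  # imported lazily so held_resources() works without Ray installed

    ray.init(address="auto")  # connect to the running cluster, do not start a new one
    try:
        return held_resources(ray.cluster_resources(), ray.available_resources())
    finally:
        ray.shutdown()  # detach this driver so it does not hold anything itself

# On the head node, after killing the job:
#     print(check_cluster())
```

If this reports CPUs or GPUs as still held after the programme has been killed, the old job's workers or actors are probably still alive on the cluster. An empty result would point to a different cause.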