Hi, and thanks to everyone who helps out here.
I'm running Ray with the same settings on two machines, but this error only occurs on one of them. It happens while restoring and running the trained agent (loading the checkpoint for visualization). Does anyone know what the problem is? The error output is below.
Thank you.
INFO resource_spec.py:231 -- Starting Ray with 64.75 GiB memory available for workers and up to 31.74 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2021-12-29 19:41:05,898 INFO services.py:1193 -- View the Ray dashboard at localhost:8275
2021-12-29 19:41:06,964 WARNING deprecation.py:30 -- DeprecationWarning: `callbacks dict interface` has been deprecated. Use `a class extending rllib.agents.callbacks.DefaultCallbacks` instead. This will raise an error in the future!
2021-12-29 19:41:06,964 INFO trainer.py:632 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
2021-12-29 19:41:06,966 WARNING deprecation.py:30 -- DeprecationWarning: `callbacks dict interface` has been deprecated. Use `a class extending rllib.agents.callbacks.DefaultCallbacks` instead. This will raise an error in the future!
2021-12-29 19:43:55,571 WARNING worker.py:1134 -- The node with node id 08582d0e0f48bc01c3526c38c2d002b11c52b1b4 has been marked dead because the detector has missed too many heartbeats from it.
(pid=raylet) F1229 19:43:57.603041 15766 15766 node_manager.cc:661] Check failed: node_id != self_node_id_ Exiting because this node manager has mistakenly been marked dead by the monitor.
(pid=raylet) *** Check failure stack trace: ***
(pid=raylet) @ 0x563fd54fdd1d google::LogMessage::Fail()
(pid=raylet) @ 0x563fd54fee7c google::LogMessage::SendToLog()
(pid=raylet) @ 0x563fd54fd9f9 google::LogMessage::Flush()
(pid=raylet) @ 0x563fd54fdc11 google::LogMessage::~LogMessage()
(pid=raylet) @ 0x563fd54e90d9 ray::RayLog::~RayLog()
(pid=raylet) @ 0x563fd5201ef3 ray::raylet::NodeManager::NodeRemoved()
(pid=raylet) @ 0x563fd52021bc _ZNSt17_Function_handlerIFvRKN3ray8ClientIDERKNS0_3rpc11GcsNodeInfoEEZNS0_6raylet11NodeManager11RegisterGcsEvEUlS3_S7_E0_E9_M_invokeERKSt9_Any_dataS3_S7_
(pid=raylet) @ 0x563fd52e6550 ray::gcs::ServiceBasedNodeInfoAccessor::HandleNotification()
(pid=raylet) @ 0x563fd52e6826 _ZNSt17_Function_handlerIFvRKSsS1_EZZN3ray3gcs28ServiceBasedNodeInfoAccessor26AsyncSubscribeToNodeChangeERKSt8functionIFvRKNS3_8ClientIDERKNS3_3rpc11GcsNodeInfoEEERKS6_IFvNS3_6StatusEEEENKUlSM_E0_clESM_EUlS1_S1_E_E9_M_invokeERKSt9_Any_dataS1_S1_
(pid=raylet) @ 0x563fd52f0c4a _ZNSt17_Function_handlerIFvSt10shared_ptrIN3ray3gcs13CallbackReplyEEEZNS2_9GcsPubSub24ExecuteCommandIfPossibleERKSsRNS6_7ChannelEEUlS4_E_E9_M_invokeERKSt9_Any_dataS4_
(pid=raylet) @ 0x563fd52f262b _ZN5boost4asio6detail18completion_handlerIZN3ray3gcs20RedisCallbackManager12CallbackItem8DispatchERSt10shared_ptrINS4_13CallbackReplyEEEUlvE_E11do_completeEPvPNS1_19scheduler_operationERKNS_6system10error_codeEm
(pid=raylet) @ 0x563fd57e401f boost::asio::detail::scheduler::do_run_one()
(pid=raylet) @ 0x563fd57e5521 boost::asio::detail::scheduler::run()
(pid=raylet) @ 0x563fd57e6552 boost::asio::io_context::run()
(pid=raylet) @ 0x563fd515a69e main
(pid=raylet) @ 0x7f4c65be8bf7 __libc_start_main
(pid=raylet) @ 0x563fd516c6b1 (unknown)
2021-12-29 19:47:15,051 INFO trainable.py:251 -- Trainable.setup took 368.088 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
2021-12-29 19:47:15,051 WARNING util.py:37 -- Install gputil for GPU system monitoring.
Traceback (most recent call last):
File "visualizer_rllib.py", line 470, in <module>
visualizer_rllib(args)
File "visualizer_rllib.py", line 159, in visualizer_rllib
agent.restore(checkpoint)
File "/home/user/anaconda3/envs/flow/lib/python3.7/site-packages/ray/tune/trainable.py", line 467, in restore
self.load_checkpoint(checkpoint_path)
File "/home/user/anaconda3/envs/flow/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 685, in load_checkpoint
self.__setstate__(extra_data)
File "/home/user/anaconda3/envs/flow/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 125, in __setstate__
Trainer.__setstate__(self, state)
File "/home/user/anaconda3/envs/flow/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 1185, in __setstate__
remote_state = ray.put(state["worker"])
File "/home/user/anaconda3/envs/flow/lib/python3.7/site-packages/ray/worker.py", line 1570, in put
object_ref = worker.put_object(value, pin_object=not weakref)
File "/home/user/anaconda3/envs/flow/lib/python3.7/site-packages/ray/worker.py", line 274, in put_object
pin_object=pin_object))
File "python/ray/_raylet.pyx", line 791, in ray._raylet.CoreWorker.put_serialized_object
File "python/ray/_raylet.pyx", line 759, in ray._raylet.CoreWorker._create_put_buffer
File "python/ray/_raylet.pyx", line 151, in ray._raylet.check_status
ray.exceptions.RayletError: The Raylet died with this message: Broken pipe
E1229 19:47:15.560415 15706 15706 raylet_client.cc:124] IOError: Broken pipe [RayletClient] Failed to disconnect from raylet.
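Since the log shows the node being marked dead after missed heartbeats while `Trainable.setup` took ~368 seconds, one workaround I'm considering is raising the heartbeat timeout when starting Ray. This is only a sketch: `num_heartbeats_timeout` is an internal `_system_config` key whose name and availability may differ across Ray versions, so please correct me if this is wrong for this release.

```python
import ray

# Sketch only: raise the number of missed heartbeats tolerated before the
# monitor marks a node dead. The key name `num_heartbeats_timeout` is an
# internal Ray setting and may vary between Ray versions.
ray.init(
    memory=64 * 1024 ** 3,              # worker memory, as in the log above
    object_store_memory=31 * 1024 ** 3, # object store memory
    _system_config={"num_heartbeats_timeout": 300},
)
```

If the slow machine is simply too loaded during setup, `reuse_actors=True` (as the log suggests) might also reduce the window in which heartbeats can be missed.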