I started Ray, but the dashboard fails to start.
My dashboard.log (mentioned in the picture above) is:
2022-06-08 18:55:32,623 INFO head.py:122 -- Dashboard head grpc address: 0.0.0.0:41596
2022-06-08 18:55:32,707 INFO utils.py:99 -- Get all modules by type: DashboardHeadModule
2022-06-08 18:55:43,698 WARNING tune_head.py:23 -- tune module is not available: No module named 'pandas'
2022-06-08 18:55:43,734 INFO utils.py:132 -- Available modules: [<class 'ray.dashboard.modules.actor.actor_head.ActorHead'>, <class 'ray.dashboard.modules.event.event_head.EventHead'>, <class 'ray.dashboard.modules.job.job_head.JobHead'>, <class 'ray.dashboard.modules.log.log_head.LogHead'>, <class 'ray.dashboard.modules.node.node_head.NodeHead'>, <class 'ray.dashboard.modules.reporter.reporter_head.ReportHead'>, <class 'ray.dashboard.modules.serve.serve_head.ServeHead'>, <class 'ray.dashboard.modules.snapshot.snapshot_head.APIHead'>, <class 'ray.dashboard.modules.tune.tune_head.TuneController'>, <class 'ray.dashboard.modules.usage_stats.usage_stats_head.UsageStatsHead'>]
2022-06-08 18:55:43,734 INFO head.py:184 -- Loading DashboardHeadModule: <class 'ray.dashboard.modules.actor.actor_head.ActorHead'>
2022-06-08 18:55:43,759 INFO head.py:184 -- Loading DashboardHeadModule: <class 'ray.dashboard.modules.event.event_head.EventHead'>
2022-06-08 18:55:43,759 INFO head.py:184 -- Loading DashboardHeadModule: <class 'ray.dashboard.modules.job.job_head.JobHead'>
2022-06-08 18:55:43,759 INFO head.py:184 -- Loading DashboardHeadModule: <class 'ray.dashboard.modules.log.log_head.LogHead'>
2022-06-08 18:55:43,766 INFO head.py:184 -- Loading DashboardHeadModule: <class 'ray.dashboard.modules.node.node_head.NodeHead'>
2022-06-08 18:55:43,766 INFO head.py:184 -- Loading DashboardHeadModule: <class 'ray.dashboard.modules.reporter.reporter_head.ReportHead'>
2022-06-08 18:55:43,766 INFO head.py:184 -- Loading DashboardHeadModule: <class 'ray.dashboard.modules.serve.serve_head.ServeHead'>
2022-06-08 18:55:43,766 INFO head.py:184 -- Loading DashboardHeadModule: <class 'ray.dashboard.modules.snapshot.snapshot_head.APIHead'>
2022-06-08 18:55:43,766 INFO head.py:184 -- Loading DashboardHeadModule: <class 'ray.dashboard.modules.tune.tune_head.TuneController'>
2022-06-08 18:55:43,766 INFO head.py:184 -- Loading DashboardHeadModule: <class 'ray.dashboard.modules.usage_stats.usage_stats_head.UsageStatsHead'>
2022-06-08 18:55:43,876 INFO head.py:188 -- Loaded 10 modules.
2022-06-08 18:55:43,938 INFO http_server_head.py:61 -- Setup static dir for dashboard: /mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/dashboard/client/build
2022-06-08 18:55:43,971 INFO http_server_head.py:132 -- Dashboard head http address: 127.0.0.1:8265
2022-06-08 18:55:43,971 INFO http_server_head.py:137 -- <ResourceRoute [GET] <PlainResource /logical/actor_groups> -> <function ActorHead.get_actor_groups at 0x7faf7e8bd950>
2022-06-08 18:55:43,971 INFO http_server_head.py:137 -- <ResourceRoute [GET] <PlainResource /logical/actors> -> <function ActorHead.get_all_actors[cache ttl=2, max_size=128] at 0x7faf7e8bdb00>
2022-06-08 18:55:43,971 INFO http_server_head.py:137 -- <ResourceRoute [GET] <PlainResource /logical/kill_actor> -> <function ActorHead.kill_actor at 0x7faf7e8bdcb0>
2022-06-08 18:55:43,971 INFO http_server_head.py:137 -- <ResourceRoute [GET] <PlainResource /events> -> <function EventHead.get_event[cache ttl=2, max_size=128] at 0x7faf7e4faf80>
2022-06-08 18:55:43,971 INFO http_server_head.py:137 -- <ResourceRoute [GET] <PlainResource /api/version> -> <function JobHead.get_version at 0x7faf7e513cb0>
2022-06-08 18:55:43,971 INFO http_server_head.py:137 -- <ResourceRoute [GET] <DynamicResource /api/packages/{protocol}/{package_name}> -> <function JobHead.get_package at 0x7faf7e5179e0>
2022-06-08 18:55:43,971 INFO http_server_head.py:137 -- <ResourceRoute [PUT] <DynamicResource /api/packages/{protocol}/{package_name}> -> <function JobHead.upload_package at 0x7faf7e517b90>
2022-06-08 18:55:43,971 INFO http_server_head.py:137 -- <ResourceRoute [POST] <PlainResource /api/jobs/> -> <function JobHead.submit_job at 0x7faf7e517d40>
2022-06-08 18:55:43,972 INFO http_server_head.py:137 -- <ResourceRoute [POST] <DynamicResource /api/jobs/{job_id}/stop> -> <function JobHead.stop_job at 0x7faf7e517ef0>
2022-06-08 18:55:43,972 INFO http_server_head.py:137 -- <ResourceRoute [GET] <DynamicResource /api/jobs/{job_id}> -> <function JobHead.get_job_info at 0x7faf7e51a0e0>
2022-06-08 18:55:43,972 INFO http_server_head.py:137 -- <ResourceRoute [GET] <PlainResource /api/jobs/> -> <function JobHead.list_jobs at 0x7faf7e51a290>
2022-06-08 18:55:43,972 INFO http_server_head.py:137 -- <ResourceRoute [GET] <DynamicResource /api/jobs/{job_id}/logs> -> <function JobHead.get_job_logs at 0x7faf7e51a440>
2022-06-08 18:55:43,972 INFO http_server_head.py:137 -- <ResourceRoute [GET] <DynamicResource /api/jobs/{job_id}/logs/tail> -> <function JobHead.tail_job_logs at 0x7faf7e51a5f0>
2022-06-08 18:55:43,972 INFO http_server_head.py:137 -- <ResourceRoute [GET] <PlainResource /log_index> -> <function LogHead.get_log_index at 0x7faf7e5213b0>
2022-06-08 18:55:43,972 INFO http_server_head.py:137 -- <ResourceRoute [GET] <PlainResource /log_proxy> -> <function LogHead.get_log_from_proxy at 0x7faf7e5214d0>
2022-06-08 18:55:43,972 INFO http_server_head.py:137 -- <ResourceRoute [GET] <PlainResource /nodes> -> <function NodeHead.get_all_nodes[cache ttl=2, max_size=128] at 0x7faf7e528950>
2022-06-08 18:55:43,972 INFO http_server_head.py:137 -- <ResourceRoute [GET] <DynamicResource /nodes/{node_id}> -> <function NodeHead.get_node[cache ttl=2, max_size=128] at 0x7faf7e528b90>
2022-06-08 18:55:43,972 INFO http_server_head.py:137 -- <ResourceRoute [GET] <PlainResource /memory/memory_table> -> <function NodeHead.get_memory_table at 0x7faf7e528d40>
2022-06-08 18:55:43,972 INFO http_server_head.py:137 -- <ResourceRoute [GET] <PlainResource /memory/set_fetch> -> <function NodeHead.set_fetch_memory_info at 0x7faf7e528e60>
2022-06-08 18:55:43,972 INFO http_server_head.py:137 -- <ResourceRoute [GET] <PlainResource /node_logs> -> <function NodeHead.get_logs at 0x7faf7e528f80>
2022-06-08 18:55:43,972 INFO http_server_head.py:137 -- <ResourceRoute [GET] <PlainResource /node_errors> -> <function NodeHead.get_errors at 0x7faf7e52b0e0>
2022-06-08 18:55:43,972 INFO http_server_head.py:137 -- <ResourceRoute [GET] <PlainResource /api/launch_profiling> -> <function ReportHead.launch_profiling at 0x7faf7e2ad830>
2022-06-08 18:55:43,972 INFO http_server_head.py:137 -- <ResourceRoute [GET] <PlainResource /api/ray_config> -> <function ReportHead.get_ray_config at 0x7faf7e2ad950>
2022-06-08 18:55:43,972 INFO http_server_head.py:137 -- <ResourceRoute [GET] <PlainResource /api/cluster_status> -> <function ReportHead.get_cluster_status at 0x7faf7e2ada70>
2022-06-08 18:55:43,972 INFO http_server_head.py:137 -- <ResourceRoute [GET] <PlainResource /api/serve/deployments/> -> <function ServeHead.get_all_deployments at 0x7faf7e245b00>
2022-06-08 18:55:43,972 INFO http_server_head.py:137 -- <ResourceRoute [GET] <PlainResource /api/serve/deployments/status> -> <function ServeHead.get_all_deployment_statuses at 0x7faf7e245cb0>
2022-06-08 18:55:43,972 INFO http_server_head.py:137 -- <ResourceRoute [DELETE] <PlainResource /api/serve/deployments/> -> <function ServeHead.delete_serve_application at 0x7faf7e245e60>
2022-06-08 18:55:43,972 INFO http_server_head.py:137 -- <ResourceRoute [PUT] <PlainResource /api/serve/deployments/> -> <function ServeHead.put_all_deployments at 0x7faf7e249050>
2022-06-08 18:55:43,972 INFO http_server_head.py:137 -- <ResourceRoute [GET] <PlainResource /api/actors/kill> -> <function APIHead.kill_actor_gcs at 0x7faf7e249ef0>
2022-06-08 18:55:43,972 INFO http_server_head.py:137 -- <ResourceRoute [GET] <PlainResource /api/snapshot> -> <function APIHead.snapshot at 0x7faf7e24e050>
2022-06-08 18:55:43,972 INFO http_server_head.py:137 -- <ResourceRoute [GET] <PlainResource /tune/info> -> <function TuneController.tune_info at 0x7faf7e1fbb00>
2022-06-08 18:55:43,972 INFO http_server_head.py:137 -- <ResourceRoute [GET] <PlainResource /tune/availability> -> <function TuneController.get_availability at 0x7faf7e1fbc20>
2022-06-08 18:55:43,972 INFO http_server_head.py:137 -- <ResourceRoute [GET] <PlainResource /tune/set_experiment> -> <function TuneController.set_tune_experiment at 0x7faf7e1fbcb0>
2022-06-08 18:55:43,972 INFO http_server_head.py:137 -- <ResourceRoute [GET] <PlainResource /tune/enable_tensorboard> -> <function TuneController.enable_tensorboard at 0x7faf7e1fbe60>
2022-06-08 18:55:43,972 INFO http_server_head.py:137 -- <ResourceRoute [GET] <StaticResource /logs -> PosixPath('/tmp/ray/session_2022-06-08_18-55-15_686145_88374/logs')> -> <bound method StaticResource._handle of <StaticResource /logs -> PosixPath('/tmp/ray/session_2022-06-08_18-55-15_686145_88374/logs')>>
2022-06-08 18:55:43,972 INFO http_server_head.py:137 -- <ResourceRoute [GET] <PlainResource /> -> <function HttpServerDashboardHead.get_index at 0x7faf7dd73290>
2022-06-08 18:55:43,972 INFO http_server_head.py:137 -- <ResourceRoute [GET] <PlainResource /favicon.ico> -> <function HttpServerDashboardHead.get_favicon at 0x7faf7dd73d40>
2022-06-08 18:55:43,972 INFO http_server_head.py:137 -- <ResourceRoute [GET] <StaticResource /static -> PosixPath('/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/dashboard/client/build/static')> -> <bound method StaticResource._handle of <StaticResource /static -> PosixPath('/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/dashboard/client/build/static')>>
2022-06-08 18:55:43,972 INFO http_server_head.py:138 -- Registered 38 routes.
2022-06-08 18:55:43,973 INFO datacenter.py:70 -- Purge data.
2022-06-08 18:55:43,988 INFO event_utils.py:127 -- Monitor events logs modified after 1654683938.9974043 on /tmp/ray/session_2022-06-08_18-55-15_686145_88374/logs/events, the source types are ['GCS'].
2022-06-08 18:55:43,998 INFO usage_stats_head.py:89 -- Usage reporting is disabled.
2022-06-08 18:55:43,998 INFO actor_head.py:105 -- Getting all actor info from GCS.
2022-06-08 18:55:44,022 INFO actor_head.py:131 -- Received 0 actor info from GCS.
2022-06-08 18:55:54,337 ERROR node_head.py:259 -- Error updating node stats of 34dbd7afbdb49d81a76918116d8c1590dba849e7ec84627e3fa684a4.
Traceback (most recent call last):
File "/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/dashboard/modules/node/node_head.py", line 254, in _update_node_stats
timeout=2,
File "/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/grpc/aio/_call.py", line 291, in __await__
self._cython_call._status)
grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "{"created":"@1654685754.241521526","description":"Error received from peer ipv4:10.140.1.18:46649","file":"src/core/lib/surface/call.cc","file_line":1074,"grpc_message":"Deadline Exceeded","grpc_status":4}"
>
2022-06-08 18:56:11,183 ERROR node_head.py:259 -- Error updating node stats of 34dbd7afbdb49d81a76918116d8c1590dba849e7ec84627e3fa684a4.
Traceback (most recent call last):
File "/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/ray/dashboard/modules/node/node_head.py", line 254, in _update_node_stats
timeout=2,
File "/mnt/lustre/zhangyuchang/.conda/envs/0524alpa/lib/python3.7/site-packages/grpc/aio/_call.py", line 291, in __await__
self._cython_call._status)
grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "{"created":"@1654685771.122127770","description":"Error received from peer ipv4:10.140.1.18:46649","file":"src/core/lib/surface/call.cc","file_line":1074,"grpc_message":"Deadline Exceeded","grpc_status":4}"
>
Occasionally the dashboard starts successfully, but most of the time the error above occurs and the dashboard does not start.
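
Because the log is long, it can help to reduce it to counts per log level and per gRPC status code, to see whether DEADLINE_EXCEEDED is the only failure being reported. The following is a minimal sketch, not part of Ray: the `summarize` helper and both regular expressions are hypothetical and assume the record layout visible in the log above (`timestamp LEVEL file.py:line -- message`).

```python
import re
from collections import Counter

# Header of a log record as it appears in the dashboard.log above, e.g.
# "2022-06-08 18:55:54,337 ERROR node_head.py:259 -- message".
# This layout is inferred from the pasted log, not from Ray documentation.
RECORD = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (\w+) (\S+?):(\d+) -- (.*)$"
)
# gRPC status line printed inside the AioRpcError traceback.
STATUS = re.compile(r"status = StatusCode\.(\w+)")


def summarize(lines):
    """Count records per log level and gRPC status codes per name."""
    levels = Counter()
    statuses = Counter()
    for line in lines:
        record = RECORD.match(line)
        if record:
            levels[record.group(2)] += 1
            continue
        status = STATUS.search(line)
        if status:
            statuses[status.group(1)] += 1
    return levels, statuses


# A few lines copied from the log above, used as sample input.
sample = [
    "2022-06-08 18:55:44,022 INFO actor_head.py:131 -- Received 0 actor info from GCS.",
    "2022-06-08 18:55:54,337 ERROR node_head.py:259 -- Error updating node stats of 34dbd7afbdb49d81a76918116d8c1590dba849e7ec84627e3fa684a4.",
    "Traceback (most recent call last):",
    "grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:",
    "status = StatusCode.DEADLINE_EXCEEDED",
    'details = "Deadline Exceeded"',
]

levels, statuses = summarize(sample)
print(dict(levels))
print(dict(statuses))
```

To run it on a real file, pass an open file object instead of the sample list, for example `summarize(open(path))`, where `path` points to the dashboard.log inside the session's logs directory.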