How severely does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
Hi! I am trying to use ray.tune to do some hyperparameter optimisation, but I cannot run even a very simple model because every time I get the same error:
agent_manager.cc:135: The raylet exited immediately because the Ray agent failed. The raylet fate shares with the agent. This can happen because the Ray agent was unexpectedly killed or failed. See dashboard_agent.log for the root cause.
But when I open that file, I don’t see any error or anything else that would indicate how to solve the issue. These are the last lines of the file (I’m not copying the whole thing here):
2023-01-24 21:56:36,091 INFO agent.py:160 – Loading DashboardAgentModule: <class ‘ray.dashboard.modules.runtime_env.runtime_env_agent.RuntimeEnvAgent’>
2023-01-24 21:56:36,092 INFO agent.py:160 – Loading DashboardAgentModule: <class ‘ray.dashboard.modules.serve.serve_agent.ServeAgent’>
2023-01-24 21:56:36,093 INFO agent.py:165 – Loaded 8 modules.
2023-01-24 21:56:36,099 INFO http_server_agent.py:74 – Dashboard agent http address: 0.0.0.0:52365
2023-01-24 21:56:36,099 INFO http_server_agent.py:81 – <ResourceRoute [GET] <PlainResource /api/local_raylet_healthz> → <function HealthzAgent.health_check at 0x7f4966b417e0>
2023-01-24 21:56:36,099 INFO http_server_agent.py:81 – <ResourceRoute [OPTIONS] <PlainResource /api/local_raylet_healthz> → <bound method _PreflightHandler._preflight_handler of <aiohttp_cors.cors_config._CorsConfigImpl object at 0x7f496693c640>>
2023-01-24 21:56:36,099 INFO http_server_agent.py:81 – <ResourceRoute [POST] <PlainResource /api/job_agent/jobs/> → <function JobAgent.submit_job at 0x7f4966b68dc0>
2023-01-24 21:56:36,099 INFO http_server_agent.py:81 – <ResourceRoute [OPTIONS] <PlainResource /api/job_agent/jobs/> → <bound method _PreflightHandler._preflight_handler of <aiohttp_cors.cors_config._CorsConfigImpl object at 0x7f496693c640>>
2023-01-24 21:56:36,099 INFO http_server_agent.py:81 – <ResourceRoute [POST] <DynamicResource /api/job_agent/jobs/{job_or_submission_id}/stop> → <function JobAgent.stop_job at 0x7f4966b68f70>
2023-01-24 21:56:36,100 INFO http_server_agent.py:81 – <ResourceRoute [OPTIONS] <DynamicResource /api/job_agent/jobs/{job_or_submission_id}/stop> → <bound method _PreflightHandler._preflight_handler of <aiohttp_cors.cors_config._CorsConfigImpl object at 0x7f496693c640>>
2023-01-24 21:56:36,100 INFO http_server_agent.py:81 – <ResourceRoute [DELETE] <DynamicResource /api/job_agent/jobs/{job_or_submission_id}> → <function JobAgent.delete_job at 0x7f4966b69120>
2023-01-24 21:56:36,100 INFO http_server_agent.py:81 – <ResourceRoute [OPTIONS] <DynamicResource /api/job_agent/jobs/{job_or_submission_id}> → <bound method _PreflightHandler._preflight_handler of <aiohttp_cors.cors_config._CorsConfigImpl object at 0x7f496693c640>>
2023-01-24 21:56:36,100 INFO http_server_agent.py:81 – <ResourceRoute [GET] <DynamicResource /api/job_agent/jobs/{job_or_submission_id}/logs> → <function JobAgent.get_job_logs at 0x7f4966b692d0>
2023-01-24 21:56:36,100 INFO http_server_agent.py:81 – <ResourceRoute [OPTIONS] <DynamicResource /api/job_agent/jobs/{job_or_submission_id}/logs> → <bound method _PreflightHandler._preflight_handler of <aiohttp_cors.cors_config._CorsConfigImpl object at 0x7f496693c640>>
2023-01-24 21:56:36,100 INFO http_server_agent.py:81 – <ResourceRoute [GET] <DynamicResource /api/job_agent/jobs/{job_or_submission_id}/logs/tail> → <function JobAgent.tail_job_logs at 0x7f4966b69480>
2023-01-24 21:56:36,100 INFO http_server_agent.py:81 – <ResourceRoute [OPTIONS] <DynamicResource /api/job_agent/jobs/{job_or_submission_id}/logs/tail> → <bound method _PreflightHandler._preflight_handler of <aiohttp_cors.cors_config._CorsConfigImpl object at 0x7f496693c640>>
2023-01-24 21:56:36,100 INFO http_server_agent.py:81 – <ResourceRoute [GET] <PlainResource /api/ray/version> → <function ServeAgent.get_version at 0x7f49668e2950>
2023-01-24 21:56:36,100 INFO http_server_agent.py:81 – <ResourceRoute [OPTIONS] <PlainResource /api/ray/version> → <bound method _PreflightHandler._preflight_handler of <aiohttp_cors.cors_config._CorsConfigImpl object at 0x7f496693c640>>
2023-01-24 21:56:36,100 INFO http_server_agent.py:81 – <ResourceRoute [GET] <PlainResource /api/serve/deployments/> → <function ServeAgent.get_all_deployments at 0x7f49668e29e0>
2023-01-24 21:56:36,100 INFO http_server_agent.py:81 – <ResourceRoute [OPTIONS] <PlainResource /api/serve/deployments/> → <bound method _PreflightHandler._preflight_handler of <aiohttp_cors.cors_config._CorsConfigImpl object at 0x7f496693c640>>
2023-01-24 21:56:36,100 INFO http_server_agent.py:81 – <ResourceRoute [GET] <PlainResource /api/serve/deployments/status> → <function ServeAgent.get_all_deployment_statuses at 0x7f49668e2b90>
2023-01-24 21:56:36,101 INFO http_server_agent.py:81 – <ResourceRoute [OPTIONS] <PlainResource /api/serve/deployments/status> → <bound method _PreflightHandler._preflight_handler of <aiohttp_cors.cors_config._CorsConfigImpl object at 0x7f496693c640>>
2023-01-24 21:56:36,101 INFO http_server_agent.py:81 – <ResourceRoute [DELETE] <PlainResource /api/serve/deployments/> → <function ServeAgent.delete_serve_application at 0x7f49668e2d40>
2023-01-24 21:56:36,101 INFO http_server_agent.py:81 – <ResourceRoute [PUT] <PlainResource /api/serve/deployments/> → <function ServeAgent.put_all_deployments at 0x7f49668e2ef0>
2023-01-24 21:56:36,101 INFO http_server_agent.py:81 – <ResourceRoute [OPTIONS] <PlainResource /api/serve/deployments/> → <bound method _PreflightHandler._preflight_handler of <aiohttp_cors.cors_config._CorsConfigImpl object at 0x7f496693c640>>
2023-01-24 21:56:36,101 INFO http_server_agent.py:81 – <ResourceRoute [GET] <StaticResource /logs → PosixPath(‘/tmp/ray/session_2023-01-24_21-56-32_050369_270787/logs’)> → <bound method StaticResource._handle of <StaticResource /logs → PosixPath(‘/tmp/ray/session_2023-01-24_21-56-32_050369_270787/logs’)>>
2023-01-24 21:56:36,101 INFO http_server_agent.py:81 – <ResourceRoute [OPTIONS] <StaticResource /logs → PosixPath(‘/tmp/ray/session_2023-01-24_21-56-32_050369_270787/logs’)> → <bound method _PreflightHandler._preflight_handler of <aiohttp_cors.cors_config._CorsConfigImpl object at 0x7f496693c640>>
2023-01-24 21:56:36,101 INFO http_server_agent.py:82 – Registered 23 routes.
2023-01-24 21:56:36,110 INFO event_agent.py:56 – Report events to 10.216.0.171:45473
2023-01-24 21:56:36,111 INFO event_utils.py:131 – Monitor events logs modified after 1674591995.9445176 on /tmp/ray/session_2023-01-24_21-56-32_050369_270787/logs/events, the source types are all.
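Since dashboard_agent.log itself only shows INFO lines, the other files in the same logs directory may be worth checking too. A minimal sketch of how that could be done, assuming the directory printed in the log lines above; the helper name `scan_logs` and the marker strings are illustrative choices, not part of Ray:

```python
from pathlib import Path

# Strings that often appear in failing log lines.
# This list is an assumption for illustration, not a Ray convention.
MARKERS = ("ERROR", "CRITICAL", "Traceback")


def scan_logs(log_dir, markers=MARKERS):
    """Return (file name, line number, text) for each line containing a marker."""
    hits = []
    for path in sorted(Path(log_dir).iterdir()):
        # Skip subdirectories; only the top-level log files are scanned here.
        if not path.is_file():
            continue
        # errors="replace" avoids crashing on bytes that are not valid text.
        with path.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if any(marker in line for marker in markers):
                    hits.append((path.name, number, line.rstrip()))
    return hits
```

Calling `scan_logs("/tmp/ray/session_2023-01-24_21-56-32_050369_270787/logs")` and printing the returned tuples would list every matching line across the log files, which may point to a file other than dashboard_agent.log.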
Does anyone have an idea of what could be going on here? I’m completely blocked.
Thanks!