Error while loading job driver logs in dashboard

  • Low: It annoys or frustrates me for a moment.

I recently upgraded Ray from 2.3.0 to 2.6.0 and found that the dashboard no longer has an option to view job-specific logs.

In Ray 2.3.0 there was an option named “logs” at the marked position in the image above, but it is not there in the new version. Instead, the Logs section shows **Failed to Load**.

The dashboard.log file contains the following error:

```
2023-09-14 19:32:39,459	INFO state_head.py:421 -- Streaming logs with options: GetLogOptions(timeout=30, node_id='b677f4256cdb12b2dfbfbc055396f3aa859b4dd901d4ad65d4a08cc9', node_ip=None, media_type='file', filename='job-driver-raysubmit_XYcqmYJvTtnu4ui7.log', actor_id=None, task_id=None, attempt_number=0, pid=None, lines=50000, interval=None, suffix='out', submission_id=None)
2023-09-14 19:32:39,459	INFO log_manager.py:442 -- Resolved log file: node_id='b677f4256cdb12b2dfbfbc055396f3aa859b4dd901d4ad65d4a08cc9' filename='job-driver-raysubmit_XYcqmYJvTtnu4ui7.log' start_offset=None end_offset=None
2023-09-14 19:32:39,461	ERROR state_head.py:439 -- <AioRpcError of RPC that terminated with:
	status = StatusCode.UNKNOWN
	details = "Unexpected <class 'AttributeError'>: 'grpc._cython.cygrpc._ServicerContext' object has no attribute 'done'"
	debug_error_string = "{"created":"@1694700159.461056331","description":"Error received from peer ipv4:10.60.62.206:63358","file":"src/core/lib/surface/call.cc","file_line":1062,"grpc_message":"Unexpected <class 'AttributeError'>: 'grpc._cython.cygrpc._ServicerContext' object has no attribute 'done'","grpc_status":2}"
>
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/ray/dashboard/modules/state/state_head.py", line 427, in get_logs
    async for logs_in_bytes in self._log_api.stream_logs(options):
  File "/usr/local/lib/python3.8/dist-packages/ray/dashboard/modules/log/log_manager.py", line 125, in stream_logs
    async for streamed_log in stream:
  File "/usr/local/lib/python3.8/dist-packages/grpc/aio/_call.py", line 321, in _fetch_stream_responses
    await self._raise_for_status()
  File "/usr/local/lib/python3.8/dist-packages/grpc/aio/_call.py", line 231, in _raise_for_status
    raise _create_rpc_error(await self.initial_metadata(), await
grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
	status = StatusCode.UNKNOWN
	details = "Unexpected <class 'AttributeError'>: 'grpc._cython.cygrpc._ServicerContext' object has no attribute 'done'"
	debug_error_string = "{"created":"@1694700159.461056331","description":"Error received from peer ipv4:10.60.62.206:63358","file":"src/core/lib/surface/call.cc","file_line":1062,"grpc_message":"Unexpected <class 'AttributeError'>: 'grpc._cython.cygrpc._ServicerContext' object has no attribute 'done'","grpc_status":2}"
>
```
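The `details` field is the key clue: the peer that serves the log stream raised `AttributeError` because the gRPC `_ServicerContext` object in that process has no `done` attribute, i.e. the installed grpcio provides a context object lacking a method the calling code expects. The snippet below is a minimal, self-contained illustration of that failure mode and of a defensive `getattr` check. `OldContext`, `NewContext`, `call_done_unsafely`, and `client_has_finished` are hypothetical stand-ins, not Ray or grpcio code.

```python
# Hypothetical stand-ins for a gRPC servicer context. Real code receives the
# context from grpcio; here only the presence or absence of done() matters.
class OldContext:
    """Mimics a context object from a grpcio build that lacks done()."""


class NewContext:
    """Mimics a context object that exposes done()."""

    def __init__(self, finished: bool) -> None:
        self._finished = finished

    def done(self) -> bool:
        return self._finished


def call_done_unsafely(context) -> bool:
    # Raises AttributeError when done() is missing, like the error in the log.
    return context.done()


def client_has_finished(context) -> bool:
    """Return True if the context reports completion, False if it cannot tell."""
    done = getattr(context, "done", None)
    if done is None:
        # Older object without done(): treat the stream as still open.
        return False
    return bool(done())


if __name__ == "__main__":
    print(client_has_finished(NewContext(finished=True)))
    print(client_has_finished(OldContext()))
    try:
        call_done_unsafely(OldContext())
    except AttributeError as exc:
        print(f"AttributeError: {exc}")
```

Feature detection like this only hides the symptom; the thread's eventual fix was upgrading grpcio so the expected method exists.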

What grpc version are you using?

It’s grpcio==1.34.1.

@cade Could you please advise on this?

This version is a bit old, but I’m not sure whether it’s the cause.

cc @Ruiyang_Wang, do you know what’s happening here?

@shyampatel how are you running your job?

@cade Thanks for your reply. I start the cluster from a cluster.yaml file and submit jobs with Ray’s JobSubmissionClient. I also depend on a TensorFlow version that requires this grpcio version. According to Ray’s dependencies, grpcio>=1.32.0 is required, and my current version is grpcio==1.34.1.
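A quick way to confirm which grpcio is actually installed in the environment and compare it against the documented floor is sketched below, using only the standard library (`importlib.metadata`, Python 3.8+). The helper names `release_tuple`, `meets_minimum`, and `installed_version` are hypothetical, and the comparison deliberately looks only at release numbers. Passing this check confirms the stated minimum is met; it does not by itself rule out a version-related incompatibility.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(ver: str) -> tuple:
    """Turn '1.34.1' into (1, 34, 1), ignoring suffixes such as 'rc1'."""
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare release numbers, padding the shorter one with zeros."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(package: str):
    """Return the installed distribution version, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("grpcio")
    print(f"grpcio installed: {found}")
    if found is not None:
        print("satisfies >=1.32.0:", meets_minimum(found, "1.32.0"))
```

Run it with the same Python interpreter that the Ray nodes use; a different virtualenv can report a different grpcio.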

@cade @Ruiyang_Wang Could you please help me figure out what I am doing wrong?

@aguo @sangcho have you seen this error before?

Regarding your original question about the missing option:

This button serves the same purpose.

@Huaiwei_Sun Thanks for your response.

Yes, I now know there is a new Logs section. However, I still can’t see the job driver logs on the job page. Please let me know if I can provide any further information.

Could you find the driver log file in the logs section?

Yes, it is there. For now, I copy the submission ID and search for it in the Logs section.
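The dashboard log above shows that the driver log for a submission is named `job-driver-<submission_id>.log` (e.g. `job-driver-raysubmit_XYcqmYJvTtnu4ui7.log`), so the manual search can also be done directly on the node that ran the driver. The sketch below assumes Ray’s default log directory `/tmp/ray/session_latest/logs` (a custom temp dir would change it); `driver_log_name` and `find_driver_logs` are hypothetical helpers, not part of Ray.

```python
from pathlib import Path
from typing import List

# Assumed default Ray log directory; adjust if the cluster uses a custom temp dir.
DEFAULT_LOG_DIR = Path("/tmp/ray/session_latest/logs")


def driver_log_name(submission_id: str) -> str:
    """Build the driver log file name using the pattern seen in dashboard.log."""
    return f"job-driver-{submission_id}.log"


def find_driver_logs(submission_id: str, log_dir: Path = DEFAULT_LOG_DIR) -> List[Path]:
    """Return matching driver log files under log_dir, searched recursively."""
    if not log_dir.is_dir():
        return []
    return sorted(log_dir.rglob(driver_log_name(submission_id)))


if __name__ == "__main__":
    for path in find_driver_logs("raysubmit_XYcqmYJvTtnu4ui7"):
        print(path)
```

This only finds files on the local machine, so it has to run on the node where the driver process ran.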

@Huaiwei_Sun I upgraded grpcio from 1.34.0 to the latest version, and now the driver logs show up on the job page. However, the Ray documentation states that grpcio>=1.32 is required. Is there a known version-related issue, or a conflict with other dependencies?

I’m not sure. @shyampatel, could you file a GitHub issue so we can get someone to improve this?

@Huaiwei_Sun Thanks for your support. I have opened a GitHub issue for this. You can follow it here: [Dashboard] Failed to load job driver logs on job page · Issue #40249 · ray-project/ray · GitHub.