A way to show GCP logs in the dashboard?

1. Severity of the issue: (select one)
None: I’m just curious or want clarification.
Low: Annoying but doesn’t hinder my work.
Medium: Significantly affects my productivity but can find a workaround.
High: Completely blocks me.

2. Environment:

  • Ray version: v2.44.0
  • Python version: 3.11
  • OS: Ubuntu 22.04 via KubeRay deployed in GKE cluster
  • Cloud/Infrastructure: GCP + GKE
  • Other libs/tools (if relevant):

3. Repro steps / sample code: (optional, but helps a lot!)

The head node is configured with

  headGroupSpec:
    rayStartParams:
      num-cpus: '0'

to prevent any tasks from being scheduled on it.

Workers are managed by autoscaler v2, with the default number of worker pods set to 0. All worker pods run on GKE Spot instances.

When a new job is submitted, the Ray cluster creates a new worker pod and runs the job entrypoint on it. This pod becomes the driver for the job and, by default, collects logs from all other pods that execute tasks associated with this job.

However, with the autoscaler enabled, this first worker pod is deleted as soon as it's no longer in use, and all logs from the job are deleted along with it. The dashboard then reports that the logs can no longer be loaded, which is expected, because the corresponding Ray node is also down.
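As a stopgap, one option is to pull the driver logs through the Job Submission API while the job is still running and write them to storage that outlives the pod. The sketch below is a minimal illustration under assumptions: `client` is expected to be a `ray.job_submission.JobSubmissionClient` (or any object exposing `get_job_status(job_id)` and `get_job_logs(job_id)`), and `archive_job_logs`, `dest_dir`, and the polling interval are hypothetical names chosen for this example.

```python
import time
from pathlib import Path

# Job states after which the driver produces no more output.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED"}


def archive_job_logs(client, job_id, dest_dir, poll_interval=5.0, sleep=time.sleep):
    """Hypothetical helper: keep a local snapshot of a Ray job's driver logs.

    Assumes `client` provides get_job_status(job_id) and get_job_logs(job_id),
    as ray.job_submission.JobSubmissionClient does.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    out_file = dest / f"{job_id}.log"
    while True:
        status = client.get_job_status(job_id)
        # Ray's JobStatus is a str-based enum; plain strings have no `.value`.
        state = getattr(status, "value", status)
        # get_job_logs returns everything logged so far, so each poll overwrites
        # the snapshot. If the call raises (e.g. the node is gone), the previous
        # snapshot is left on disk untouched.
        out_file.write_text(client.get_job_logs(job_id))
        if state in TERMINAL_STATES:
            return out_file
        sleep(poll_interval)
```

Against a real cluster this would be called roughly as `archive_job_logs(JobSubmissionClient("http://127.0.0.1:8265"), job_id, "/mnt/ray-logs")`, where `/mnt/ray-logs` is an assumed mount of persistent storage (for example a PVC or a Cloud Storage FUSE mount). Because the autoscaler may remove the node shortly after the job ends, the final fetch can race the scale-down; in that case the last successful snapshot is what remains.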

There is a documentation page that suggests using Fluent Bit and similar tools to collect logs from the running pods and store them somewhere persistent, such as GCP Logging or AWS CloudWatch, but that page shows how to view these logs in an external viewer (Loki) rather than in the dashboard.

4. What happened vs. what you expected:

  • Expected: I want to be able to see job logs after the driver pod is terminated
  • Actual: I can see logs while the job is running, but they disappear once the driver pod is gone

I understand that the current implementation does not allow this (unless the log manager is rewritten to be pluggable?), but I want to spark some discussion on this topic, as it makes the user experience unpleasant.

Can we expect Ray to support external log storage in the future?

Hi svartalf,

Thanks for bringing this up! Yeah, currently the recommended approach is to use third-party tools to collect the logs and store them for persistent access. It might also be worthwhile to file a feature request for this on GitHub.

If you need help setting up persistent logging, let me know, but I believe the docs page you linked has most of the steps you need.
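The collection idea behind that docs page (a sidecar such as Fluent Bit reading Ray's log directory and shipping files off the pod) can be illustrated with a small stdlib-only Python sketch. It assumes the Ray logs are visible at `/tmp/ray/session_latest/logs` (for example through a volume shared with the Ray container) and that driver logs follow the `job-driver-<submission_id>.log` naming; `mirror_driver_logs` and the destination path are hypothetical. A real deployment should still prefer Fluent Bit, which also handles rotation and shipping to GCP Logging.

```python
import shutil
from pathlib import Path

# Assumed default Ray log directory inside a Ray container; adjust if --temp-dir is changed.
RAY_LOG_DIR = Path("/tmp/ray/session_latest/logs")


def mirror_driver_logs(dest_dir, log_dir=RAY_LOG_DIR, pattern="job-driver-*.log"):
    """Hypothetical helper: copy driver logs that are new or changed in size.

    Returns the names of the files copied during this pass.
    """
    src = Path(log_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for log_file in sorted(src.glob(pattern)):
        target = dest / log_file.name
        # Cheap change check: skip files whose size matches the last copy.
        if target.exists() and target.stat().st_size == log_file.stat().st_size:
            continue
        shutil.copy2(log_file, target)
        copied.append(log_file.name)
    return copied
```

A sidecar loop could call `mirror_driver_logs("/mnt/ray-logs")` every few seconds, with `/mnt/ray-logs` being an assumed persistent mount. Comparing sizes and re-copying whole files keeps the sketch simple; that is acceptable for driver logs but would be wasteful for large or rotating log files.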