Export logs from a cluster when using the Ray Operator

Hi everyone,

I have a question concerning log handling/exporting when using the ray operator.

I think a pretty common use case is to export logs from a cluster and monitor them in a stack such as ELK.

Have you already had some thoughts on how something like this could be achieved?

As far as I know, there are tools that can collect logs if they are written to the console. So if a process like tail ran inside the cluster to stream the log output, that might already be a way to achieve this. However, it would need to be enabled when using the operator, right?
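To sketch what I mean: a hypothetical sidecar container that tails Ray's log files to stdout, so node-level collectors pick them up. The container name, image, and volume name here are just illustrative assumptions; it assumes Ray's default log directory (/tmp/ray/session_latest/logs) is shared via a volume:

```yaml
# Hypothetical sidecar that streams Ray's log files to stdout so that a
# cluster-level collector (e.g. fluentd/fluent-bit) can pick them up.
# Assumes /tmp/ray is shared with the ray-node container via a volume.
- name: log-tail                # illustrative name
  image: busybox
  command: ["/bin/sh", "-c"]
  # tail -F keeps following files across rotation; note that a glob only
  # matches files that exist when the command starts.
  args: ["tail -n +1 -F /tmp/ray/session_latest/logs/*"]
  volumeMounts:
  - name: ray-logs              # must match the volume mounted by ray-node
    mountPath: /tmp/ray
    readOnly: true
```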

Do you have any good ideas on that topic?

cc’ing @Dmitri for Ray Operator knowledge! Any thoughts about this?

Doable, e.g., with tools like Loki, if you know where in the container filesystem to find the logs: Logging — Ray v1.6.0

It’s an open action item (and an important one) to [implement]/[document how to implement] logging stacks for Ray’s Kubernetes support.

Hi Tanja, as a concrete example, here is one of my sample clusters' configurations:

Promtail configmap

  
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: ray
  name: ray-promtail-config
data:
  promtail.yaml: |
    clients:
      - url: http://loki:3100/loki/api/v1/push
    scrape_configs:
    - job_name: ray
      static_configs:
      - labels:
          job: ray
          __path__: /tmp/ray/session_latest/logs/*.*

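For reference, a fuller promtail.yaml typically also declares a server port, a positions file, and scrape targets. The sections beyond what the post shows are assumptions based on Promtail's documented config format, not part of the original example:

```yaml
# Sketch of a more complete promtail.yaml; server/positions/targets are
# assumed from Promtail's config reference, not from the post above.
server:
  http_listen_port: 9080
positions:
  filename: /tmp/positions.yaml   # where Promtail records read offsets
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
- job_name: ray
  static_configs:
  - targets:
      - localhost
    labels:
      job: ray
      __path__: /tmp/ray/session_latest/logs/*.*
```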
and the relevant part of the RayCluster:

apiVersion: cluster.ray.io/v1
kind: RayCluster
metadata:
  name: simon-dev-serve-cluster
spec:
  headPodType: head-node
  podTypes:
  - name: head-node
    podConfig:
      apiVersion: v1
      kind: Pod
      metadata:
        generateName: simon-cluster-ray-head-
        annotations:
          prometheus.io/scrape: 'true'
          prometheus.io/port: "8080"
      spec:
        restartPolicy: Never
        ...
        containers:
        - name: ray-node
          imagePullPolicy: Always
          image: gcr.io/loyal-oath-286321/serve_app
          command: ["/bin/bash", "-c", "--"]
          args: ['trap : TERM INT; sleep infinity & wait;']
          ports:
          - containerPort: 6379  # Redis port
          - containerPort: 10001  # Used by Ray Client
          - containerPort: 8265  # Used by Ray Dashboard
          - containerPort: 8000  # Used by Ray Serve

          volumeMounts:
            ...
        - name: promtail
          image: grafana/promtail
          imagePullPolicy: Always
          args:
          - -config.file=/etc/promtail/promtail.yaml
          volumeMounts:
          - name: persistent-storage
            mountPath: /tmp
            readOnly: true
          - name: promtail-config
            mountPath: /etc/promtail
  - name: worker-node
    ... # same promtail container

And then I just set up the Loki stack with its Helm chart. I think you can also directly set up fluentd/fluent-bit in a similar way, as long as it's getting logs from /tmp/ray/session_latest/logs.
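For the Helm step, a minimal sketch assuming Grafana's public chart repository and the loki-stack chart (verify the chart name and values against the current Loki docs):

```shell
# Add Grafana's Helm repo and install the Loki stack into the ray namespace.
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install loki grafana/loki-stack --namespace ray
```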

There is a contributed PR that prints the monitor logs to the console (Make the ray logs visible on Kubernetes by holdenk · Pull Request #17810 · ray-project/ray · GitHub); maybe we can adopt the same approach to stream worker logs.
