Export logs from a cluster when using the Ray Operator

Hi everyone,

I have a question concerning log handling/exporting when using the ray operator.

I think a pretty common use case is to export logs from a cluster and monitor them in a stack such as ELK.

Have you already had some thoughts on how something like this could be achieved?

As far as I know, there are tools that can collect logs if they are written to the console. So if a process like tail ran inside the cluster to stream the log output, that might already be a way to achieve this. However, it would need to be enabled when using the operator, right?
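To sketch what I mean: a hypothetical sidecar container that tails Ray's log files to stdout, so node-level collectors pick them up. The container name, image, and volume name here are just illustrative assumptions; it assumes Ray's default log directory (/tmp/ray/session_latest/logs) is shared via a volume:

```yaml
# Hypothetical sidecar that streams Ray's log files to stdout so that a
# cluster-level collector (e.g. fluentd/fluent-bit) can pick them up.
# Assumes /tmp/ray is shared with the ray-node container via a volume.
- name: log-tail                # illustrative name
  image: busybox
  command: ["/bin/sh", "-c"]
  # tail -F keeps following files across rotation; note that a glob only
  # matches files that exist when the command starts.
  args: ["tail -n +1 -F /tmp/ray/session_latest/logs/*"]
  volumeMounts:
  - name: ray-logs              # must match the volume mounted by ray-node
    mountPath: /tmp/ray
    readOnly: true
```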

Do you have any good ideas on that topic?

cc’ing @Dmitri for Ray Operator knowledge! Any thoughts about this?

Doable, e.g., with tools like Loki, if you know where in the container filesystem to find the logs: Logging — Ray v1.6.0

It’s an open action item (and an important one) to [implement]/[document how to implement] logging stacks for Ray’s Kubernetes support.

Hi Tanja, as a concrete example, here is one of my sample clusters' configurations:

Promtail configmap

  
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: ray
  name: ray-promtail-config
data:
  promtail.yaml: |
    clients:
      - url: http://loki:3100/loki/api/v1/push
    scrape_configs:
    - job_name: ray
      static_configs:
      - labels:
          job: ray
          __path__: /tmp/ray/session_latest/logs/*.*

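For reference, a fuller promtail.yaml typically also declares a server port, a positions file, and scrape targets. The sections beyond what the post shows are assumptions based on Promtail's documented config format, not part of the original example:

```yaml
# Sketch of a more complete promtail.yaml; server/positions/targets are
# assumed from Promtail's config reference, not from the post above.
server:
  http_listen_port: 9080
positions:
  filename: /tmp/positions.yaml   # where Promtail records read offsets
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
- job_name: ray
  static_configs:
  - targets:
      - localhost
    labels:
      job: ray
      __path__: /tmp/ray/session_latest/logs/*.*
```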
and the relevant part of the RayCluster:

apiVersion: cluster.ray.io/v1
kind: RayCluster
metadata:
  name: simon-dev-serve-cluster
spec:
  headPodType: head-node
  podTypes:
  - name: head-node
    podConfig:
      apiVersion: v1
      kind: Pod
      metadata:
        generateName: simon-cluster-ray-head-
        annotations:
          prometheus.io/scrape: 'true'
          prometheus.io/port: "8080"
      spec:
        restartPolicy: Never
        ...
        containers:
        - name: ray-node
          imagePullPolicy: Always
          image: gcr.io/loyal-oath-286321/serve_app
          command: ["/bin/bash", "-c", "--"]
          args: ['trap : TERM INT; sleep infinity & wait;']
          ports:
          - containerPort: 6379  # Redis port
          - containerPort: 10001  # Used by Ray Client
          - containerPort: 8265  # Used by Ray Dashboard
          - containerPort: 8000  # Used by Ray Serve

          volumeMounts:
            ...
        - name: promtail
          image: grafana/promtail
          imagePullPolicy: Always
          args:
          - -config.file=/etc/promtail/promtail.yaml
          volumeMounts:
          - name: persistent-storage
            mountPath: /tmp
            readOnly: true
          - name: promtail-config
            mountPath: /etc/promtail
  - name: worker-node
    ... # same promtail container

And then I just set up the Loki stack with its Helm chart. I think you can also directly set up fluentd/fluent-bit in a similar way, as long as it's getting logs from /tmp/ray/session_latest/logs.
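For the Helm step, a minimal sketch assuming Grafana's public chart repository and the loki-stack chart (verify the chart name and values against the current Loki docs):

```shell
# Add Grafana's Helm repo and install the Loki stack into the ray namespace.
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install loki grafana/loki-stack --namespace ray
```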

There is a contributed PR that prints the monitor logs to the console (Make the ray logs visible on Kubernetes by holdenk · Pull Request #17810 · ray-project/ray · GitHub); maybe we can adopt the same approach to stream worker logs.
