I have a question about log handling and exporting when using the Ray operator.
I think a pretty common use case is to export logs from a cluster and monitor them in a stack such as ELK.
Have you already had any thoughts on how something like this could be achieved?
As far as I know, there are tools that can collect logs if they are written to the console. So if some process like tail ran inside the cluster to stream the log output, that might already be a way to achieve this. However, it would need to be enabled when using the operator, right?
Hi Tanja, for a concrete example, here is the configuration of one of my sample clusters:
Promtail ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: ray
  name: ray-promtail-config
data:
  promtail.yaml: |
    clients:
      - url: http://loki:3100/loki/api/v1/push
    scrape_configs:
      - job_name: ray
        static_configs:
          - labels:
              job: ray
              __path__: /tmp/ray/session_latest/logs/*.*
And the relevant part of the RayCluster:
apiVersion: cluster.ray.io/v1
kind: RayCluster
metadata:
  name: simon-dev-serve-cluster
spec:
  headPodType: head-node
  podTypes:
    - name: head-node
      podConfig:
        apiVersion: v1
        kind: Pod
        metadata:
          generateName: simon-cluster-ray-head-
          annotations:
            prometheus.io/scrape: 'true'
            prometheus.io/port: "8080"
        spec:
          restartPolicy: Never
          ...
          containers:
            - name: ray-node
              imagePullPolicy: Always
              image: gcr.io/loyal-oath-286321/serve_app
              command: ["/bin/bash", "-c", "--"]
              args: ['trap : TERM INT; sleep infinity & wait;']
              ports:
                - containerPort: 6379 # Redis port
                - containerPort: 10001 # Used by Ray Client
                - containerPort: 8265 # Used by Ray Dashboard
                - containerPort: 8000 # Used by Ray Serve
              volumeMounts:
                ...
            - name: promtail
              image: grafana/promtail
              imagePullPolicy: Always
              args:
                - -config.file=/etc/promtail/promtail.yaml
              volumeMounts:
                - name: persistent-storage
                  mountPath: /tmp
                  readOnly: true
                - name: promtail-config
                  mountPath: /etc/promtail
    - name: worker-node
      ... # same promtail container
Then I just set up the Loki stack with its Helm chart. I think you can also set up fluentd or Fluent Bit directly in a similar way, as long as it reads its logs from /tmp/ray/session_latest/logs.
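For the ELK case from the original question, a Fluent Bit sidecar could take the place of Promtail. The ConfigMap below is only a rough, untested sketch of that idea: the ConfigMap name `ray-fluentbit-config`, the Elasticsearch host `elasticsearch`, the port, and the index name `ray-logs` are all assumptions that would need to match your own setup.

```yaml
# Hypothetical Fluent Bit counterpart to the Promtail ConfigMap above.
# Names, host, port, and index are placeholders, not taken from a real cluster.
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: ray
  name: ray-fluentbit-config        # assumed name
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush        5

    # Tail the same Ray log directory that Promtail scrapes above.
    [INPUT]
        Name            tail
        Path            /tmp/ray/session_latest/logs/*
        Tag             ray.*
        Path_Key        filename
        Read_from_Head  On

    # Ship the records to Elasticsearch (assumed service name and port).
    [OUTPUT]
        Name   es
        Match  ray.*
        Host   elasticsearch
        Port   9200
        Index  ray-logs
```

The sidecar container would then mount this ConfigMap where the Fluent Bit image expects its configuration (in the official `fluent/fluent-bit` image that should be `/fluent-bit/etc/`, but please verify for your version), plus the same read-only `/tmp` volume that the Promtail container uses.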