Worker logs are sent to multiple clients

Hello, we are using a Ray cluster to optimize resource usage in our Kubernetes cluster. We have many “client” pods scheduling jobs to one shared cluster. The cluster is configured to autoscale as much as possible, and usually every job gets its own worker. The problem is that Ray mixes the logs from all jobs together. All client pods run the same sources, just with different configuration.

For example, client pod X sends jobs A and B to the cluster and client pod Y sends jobs C and D, all running at the same time. X receives the logs from jobs A and B, but also some of the logs produced by jobs C and D scheduled by client Y; Y likewise receives logs from all of the jobs. The jobs are scheduled on different worker pods.

Is there a configuration option we have missed that would prevent this behavior? We are running Ray v1.5.2.

Hey @kubav, how are these client pods connecting? Are they using ray.client (i.e. the gRPC Ray Client)?

The client calls “ray.init”, and each job is just a single function decorated with “@ray.remote”.

@kubav are you passing an address in ray.init()?

Yes, we have already deployed about 20 client pods, and each client pod sends hundreds of jobs a day. We use the autoscaler so that jobs are not queued all day long (the worker pod count varies between 1 and 150). Everything works quite well, except that logs from different jobs are mixed together.

Our ray.init call looks like this:

import ray

ray.init(
    address=f'ray://{address}:{port}',                 # Ray Client (gRPC) address of the head node
    runtime_env={'working_dir': '/path/to/sources'},
    namespace=namespace,
)

Submitting a job looks like this:

remote_func.options(
    name=job_name,
    resources={'worker1': 1},  # choose worker type
).remote(params)

We also have this mixed-logging problem in Ray v1.7.0 and v1.8.0.

We contacted the Ray developers and they said this is not a bug but a feature. I have worked around it by filtering stdout and stderr.

See: How to filter stdout in python logging - Stack Overflow
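
For reference, a minimal sketch of that kind of filter. It assumes the lines forwarded to the client carry some recognizable marker, such as the task name passed via .options(name=...); the class name and the pattern are made up for illustration:

import re
import sys

class FilteredStream:
    """Wrap a stream and drop lines that do not match an allowed pattern."""

    def __init__(self, stream, allowed_pattern):
        self._stream = stream
        self._pattern = re.compile(allowed_pattern)

    def write(self, text):
        for line in text.splitlines(keepends=True):
            # Keep blank lines and lines that mention one of our own jobs;
            # drop everything else (output belonging to other clients' jobs).
            if not line.strip() or self._pattern.search(line):
                self._stream.write(line)

    def flush(self):
        self._stream.flush()

# Only pass through lines that mention jobs submitted by this client, e.g. job-A / job-B.
sys.stdout = FilteredStream(sys.stdout, r'job-(A|B)')
sys.stderr = FilteredStream(sys.stderr, r'job-(A|B)')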

According to [WIP] Fix log monitor pid is reported as None by rkooo567 · Pull Request #19974 · ray-project/ray · GitHub, please check whether the pid is None in your case. If so, I believe it is a bug.
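
If it helps, one way to check is to scan the output your client has captured for a literal pid=None in the forwarded line prefixes. This is only a sketch and assumes those prefixes contain a pid=... field; the script simply greps saved log files passed on the command line:

import re
import sys

# Report every line whose prefix contains a literal "pid=None".
PID_NONE = re.compile(r'\bpid=None\b')

def find_none_pids(path):
    with open(path) as f:
        for lineno, line in enumerate(f, 1):
            if PID_NONE.search(line):
                print(f'{path}:{lineno}: {line.rstrip()}')

if __name__ == '__main__':
    for log_path in sys.argv[1:]:
        find_none_pids(log_path)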

PR submitted: [Core] Update job id to hex in JOB_LOG_PATTERN (#20612) by xychu · Pull Request #20816 · ray-project/ray · GitHub

You are right, PID is None. Thanks for solving this.