Worker logs are sent to multiple clients

kubav · October 4, 2021, 2:44pm

Hello, we are using a Ray cluster to optimize resource usage in a Kubernetes cluster. We have lots of “client” pods scheduling jobs on one shared cluster. The cluster is configured to autoscale as much as possible, and usually every job gets its own worker. The problem is that Ray is mixing logs from all jobs together. All client pods run the same sources but with different configuration.

For example, client pod X sends jobs A and B to the cluster and client pod Y sends jobs C and D to the cluster. All run at the same time. X receives logs from jobs A and B, but also some of the logs created by jobs C and D, which were scheduled by client Y. Y likewise receives logs from all jobs. The jobs are scheduled on different worker pods.

Is there any configuration option we have missed to prevent this behavior? We are running Ray v1.5.2.

ijrsvt · October 5, 2021, 4:31am

Hey @kubav how are these client pods connecting in? Are they using ray.client (i.e. the gRPC Ray Client)?

kubav · October 5, 2021, 5:28am

Client is calling “ray.init” and jobs are just one function annotated with “@ray.remote”.

ijrsvt · October 5, 2021, 2:43pm

@kubav are you passing an address in ray.init()?

kubav · October 6, 2021, 7:32am

Yes, we have already deployed about 20 client pods. Each client pod sends hundreds of jobs a day. We use the autoscaler because jobs are not queued throughout the whole day (the worker pod count varies between 1 and 150). Everything works quite well, except that logs from different jobs are mixed together.

Init looks like:

ray.init(
  address=f'ray://{address}:{port}',
  runtime_env={'working_dir': '/path/to/sources'},
  namespace=namespace
)

Sending a job looks like this:

remote_func.options(
  name=job_name,
  resources={'worker1': 1},  # choose worker type
).remote(params)

Ethan_Chu · November 30, 2021, 11:09am

I also have this mixed logging problem in Ray v1.7.0 and v1.8.0.

kubav · November 30, 2021, 11:25am

We contacted the Ray developers and they said this is not a bug but a feature. I have worked around it by filtering stdout and stderr.

See: How to filter stdout in python logging - Stack Overflow
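The workaround above was not posted in detail, so the following is only a minimal sketch of one way to filter stdout and stderr on the client. It assumes that the forwarded worker output reaches the client process through sys.stdout / sys.stderr, and that each job marks its own log lines with its job name. The names LineFilterStream and install_filter are hypothetical and are not part of Ray.

```python
import io
import sys


class LineFilterStream:
    """Wrap a text stream and forward only the lines accepted by `keep`."""

    def __init__(self, wrapped, keep):
        self._wrapped = wrapped
        self._keep = keep
        self._buffer = ""

    def write(self, text):
        self._buffer += text
        # Forward complete lines only; a trailing partial line stays buffered.
        while "\n" in self._buffer:
            line, self._buffer = self._buffer.split("\n", 1)
            if self._keep(line):
                self._wrapped.write(line + "\n")
        return len(text)

    def flush(self):
        self._wrapped.flush()

    def __getattr__(self, name):
        # Delegate everything else (encoding, isatty, ...) to the real stream.
        return getattr(self._wrapped, name)


def install_filter(own_job_names):
    """Hypothetical helper: keep only lines that mention one of our job names.

    Assumption: every job prefixes its log lines with its own job name.
    """
    def keep(line):
        return any(name in line for name in own_job_names)

    sys.stdout = LineFilterStream(sys.stdout, keep)
    sys.stderr = LineFilterStream(sys.stderr, keep)
```

A client would call something like install_filter(['job-A', 'job-B']) before ray.init(...). Lines without a newline are held back until the line is complete, so a message split across several writes is still matched as a whole.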

Ethan_Chu · December 1, 2021, 6:29am

See [WIP] Fix log monitor pid is reported as None by rkooo567 · Pull Request #19974 · ray-project/ray · GitHub.
Please check whether the pid is None in your case.
If so, I believe it is a bug.
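One rough way to run that check is to scan captured client output for a pid=None prefix. This is only a sketch: it assumes the forwarded worker lines carry a prefix of the form "(pid=...)", possibly followed by further fields, and the sample lines below are made up for illustration. The function name has_missing_pid is hypothetical.

```python
import re

# Assumed prefix shape: "(pid=1234) ..." or "(pid=None, ...) ...".
MISSING_PID = re.compile(r"\(pid=None\b")


def has_missing_pid(line):
    """Return True if the log line carries a pid=None prefix."""
    return MISSING_PID.search(line) is not None


# Illustrative lines only, not real Ray output.
sample = [
    "(pid=1234) job A started",
    "(pid=None) job C started",
]
suspect = [line for line in sample if has_missing_pid(line)]
```

If suspect is non-empty for output captured on a client pod, the situation matches the one described in the pull request.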

Ethan_Chu · December 1, 2021, 10:08am

PR submitted: [Core] Update job id to hex in JOB_LOG_PATTERN (#20612) by xychu · Pull Request #20816 · ray-project/ray · GitHub

kubav · December 7, 2021, 7:39am

You are right, PID is None. Thanks for solving this.
