In my team’s Ray cluster, which runs many long-running tasks and actors, the /tmp/ray/logs directory has grown to the point where we are hitting disk utilization alerts on our EC2 instances. We could always increase the disk size, but that does not solve the real problem.
Is there a way to use a rotating file logger for the main log files? And is there a way to stop Ray workers from logging their work to separate files? These worker logs are the main issue, and in my experience they are much less useful for debugging than raylet.err and others.