How severe does this issue affect your experience of using Ray?
- Medium: It contributes to significant difficulty to complete my task, but I can work around it.
When a task/actor is executed, its stdout/stderr is redirected to the driver as well as to the worker logs. Would it be possible to also forward those worker logs directly to the driver, for example into the driver’s /tmp directory? This would allow easy access to per-worker logs that aren’t tangled into a single pipe.
Currently, I have a context manager that pipes all stdout/stderr of my driver and parses the output into files based on PID. While this works, there are some limitations (the blocking .get() call must happen within the context manager). It would be great if the above were possible, to avoid this.
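For readers following along, here is a rough sketch of the kind of workaround described above, under assumptions (this is not the poster’s actual code): a context manager that swaps the driver’s sys.stdout/sys.stderr for a tee object that parses Ray’s "(name, pid=NNN)" prefix and appends each line to a per-PID file. The prefix regex, directory layout, and helper names (_PidSplitter, split_worker_logs) are hypothetical.

import contextlib
import io
import os
import re
import sys

# Lines streamed back by log_to_driver are assumed to start with "(name, pid=NNN)".
PID_PREFIX = re.compile(r"^\([^,)]+,?\s*pid=(?P<pid>\d+)")

class _PidSplitter(io.TextIOBase):
    """Tee: pass text through to the real stream and append lines to per-PID files."""

    def __init__(self, passthrough, log_dir):
        self.passthrough = passthrough
        self.log_dir = log_dir
        self._buffer = ""

    def write(self, text):
        self.passthrough.write(text)          # keep the normal driver output
        self._buffer += text
        *lines, self._buffer = self._buffer.split("\n")
        for line in lines:
            match = PID_PREFIX.match(line)
            pid = match.group("pid") if match else "driver"
            with open(os.path.join(self.log_dir, f"worker-{pid}.log"), "a") as f:
                f.write(line + "\n")
        return len(text)

    def flush(self):
        self.passthrough.flush()

@contextlib.contextmanager
def split_worker_logs(log_dir="/tmp/worker_logs"):
    """Split driver stdout/stderr into per-PID files while the block is active."""
    os.makedirs(log_dir, exist_ok=True)
    out, err = sys.stdout, sys.stderr
    sys.stdout, sys.stderr = _PidSplitter(out, log_dir), _PidSplitter(err, log_dir)
    try:
        yield
    finally:
        sys.stdout, sys.stderr = out, err

As noted above, the blocking ray.get() has to run inside the with split_worker_logs(): block, because the output is only split while the redirection is active.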
Not sure if I understand the question
When a task/actor is executed, its stdout/stderr is redirected to the driver as well as to the worker logs. Would it be possible to also forward those worker logs directly to the driver, for example into the driver’s /tmp directory?
What’s the difference between what Ray does now and “also forward those worker logs directly to the driver”?
Is it about the prefixed pid, etc.?
Hi, thank you for the quick response! The difference with current behavior would be to have the logs in files directly, with different workers writing to different files.
I’m not sure if I really understand your question. Let me try to clarify a little bit.
- If you are familiar with Ray’s logging structure, Ray writes worker logs to files on the disk of the nodes where the workers (tasks and actors) run.
- At the same time, if you enable log_to_driver, worker logs are redirected to the driver output (see the short sketch below).
Does it make sense? Can you rephrase your questions?
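For reference, a minimal sketch of the log_to_driver toggle described above, assuming the current Ray default (log_to_driver=True):

import ray

# Default: worker stdout/stderr is streamed back and printed by the driver,
# in addition to being written to log files on each node.
ray.init(log_to_driver=True)

# With log_to_driver=False, worker output stays only in the node-local log
# files and is not echoed to the driver's console.
# ray.init(log_to_driver=False)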
Yep, that makes sense. My understanding is that the current behavior is that worker logs are:
- Written to log FILES on the worker nodes.
- Redirected to stderr/stdout on the driver.
My question is: Is there a straightforward way to have the log FILES that are written on the worker nodes also written to the driver? I wouldn’t just want the logs to be redirected to driver output, but written into files on the driver itself. Each worker node’s logs would correspond to a different file on the driver.
have the log FILES that are written on the worker nodes also written to the driver
You mean to write the worker log files to the head node? The driver is just the entrypoint script.
Each worker node’s logs would correspond to a different file on the driver.
Each worker node may have a lot of worker log files. Do you just mean sending those log files to the head node? Or do you want to merge the worker log files into one log file first and then send it to the head node?
This discussion may benefit from me describing the root issue, rather than focusing on the specific technical details.
The general issue I’m running into is the following: On my AWS instance, I connect to a Ray server, and I send ~20 tasks to be done remotely on the server using ray.remote. I would like to see the outputs from those tasks, but currently run into the following blockers:
- The stderr/stdout forwarded from the workers to my instance are jumbled together and difficult to parse.
- There is no easy way to know which worker’s log file I should check to find the outputs.
- These log files don’t persist after the task is done, as far as I can tell.
I would like to easily find the logs associated with each of my individual tasks, and for those log files to persist when the task completes.
Do you have any suggestions?
cc: @sangcho to follow up since he is working on the improvements to Ray logging.
Hi! Circling back on this. Any news?
Sorry. Just saw the thread.
Btw, one clarification: we don’t technically redirect logs to the driver. What we do is read the log files and stream them to the driver over RPC, and then the driver just prints them again.
The stderr/stdout forwarded from the workers to my instance are jumbled together and difficult to parse.
Is it because the output is always formatted in a random way like (name, pid=XYZ)?
There is no easy way to know which worker’s log file I should check to find the outputs.
This is actually a problem we are planning to tackle in this version… For this one, I have a couple of questions.
- How do you plan to associate the worker logs to the job? via job_id?
- Do you use any logging tools & agents? For example, something like this: Log Persistence — Ray 3.0.0.dev0
- Can you tell me a bit more about why just storing the driver output (which contains all worker logs technically) is not enough?
Thanks for getting back to me! To try and address your questions,
Is it because the output is always formatted in a random way like (name, pid=XYZ)?
Sort of? I want to use the logs to track the progress of my tasks. It’s hard to track the progress of 20 jobs when the logs are all funneled into a single stream. Clearer identification of where each log comes from would help, but that alone doesn’t solve the underlying issue.
To create a dummy example, say I had a function that printed 0 to 100, taking some random amount of time in between each print. I launch 20 tasks that each run this function separately.
import random
import time
import ray

@ray.remote
def print100():
    for i in range(100):
        print(i)
        time.sleep(random.random())

ray.init()
# Launch 20 tasks; their prints are streamed back to the driver output.
for _ in range(20):
    print100.remote()
I’d like to track the progress of each task. If the stderr/stdout of each task were redirected to a separate file, this would be easy: I could just look at each of the files and see which numbers have been printed so far. However, when all of the prints are directed into a single stream, this is no longer easy to parse by eye.
For your latter series of questions:
- I don’t know. Currently, calling .remote() does not return anything that can be used to associate worker logs/files with it. It would be great if that were the case (a possible workaround is sketched below).
- I’m unfamiliar with the link that you sent, but can give it a look.
- I think that is addressed in my example above, but let me know if that’s not the case.
Thanks!
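One possible workaround for the association problem raised in the list above, sketched under assumptions rather than taken from the thread: each task can look up its own task id via Ray’s runtime context (ray.get_runtime_context().get_task_id(), available in Ray 2.x) and write its progress to a per-task log file. Note that the file is created on the node where the task runs, so on a multi-node cluster it would still have to be fetched or written to shared storage.

import random
import time
import ray

ray.init()

@ray.remote
def print100():
    # Each task writes its own progress file, named by its task id.
    # NOTE: the file lands on the node executing the task, not on the driver.
    task_id = ray.get_runtime_context().get_task_id()
    with open(f"/tmp/task-{task_id}.log", "w", buffering=1) as f:
        for i in range(100):
            print(i, file=f)
            time.sleep(random.random())

# Blocking on the results keeps the driver alive until every task finishes.
ray.get([print100.remote() for _ in range(20)])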
@felixy12 It seems that you just want to see the logs of each of your Ray tasks.
Have you tried looking at those logs via the Ray dashboard? You can click the log link in the dashboard.