Reporting progress from another process

How severely does this issue affect your experience of using Ray?

  • Medium: It causes significant difficulty in completing my task, but I can work around it.

I’m writing an ML application in which I want to perform hyperparameter optimization over a collection of models that may alternate between PyTorch and TensorFlow. Unfortunately, TensorFlow does not have a mechanism for freeing its GPU memory (that I know of; if there is one I’ve missed, please tell me! It would make my life much easier).

I work around this problem by running all GPU tasks (such as model training) in a new process launched with Python multiprocessing. When that process exits, the GPU memory is properly cleaned up. See the discussions here: (python - Clearing Tensorflow GPU memory after model execution - Stack Overflow, GPU resources not released when session is closed · Issue #1727 · tensorflow/tensorflow · GitHub)
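
Roughly, the pattern looks like this (the worker body below is just a placeholder for the actual TensorFlow training code):

```python
import multiprocessing as mp

def _gpu_worker(queue, config):
    # All GPU allocation (e.g. importing TensorFlow and training the model)
    # happens inside this child process. Placeholder result for illustration.
    queue.put({"accuracy": 0.9, "config": config})

def run_gpu_task(config):
    ctx = mp.get_context("spawn")      # 'spawn' also works on Windows
    queue = ctx.Queue()
    proc = ctx.Process(target=_gpu_worker, args=(queue, config))
    proc.start()
    result = queue.get()               # block until the child sends its result back
    proc.join()                        # GPU memory is released once the child exits
    return result

if __name__ == "__main__":
    print(run_gpu_task({"lr": 1e-3}))
```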

Now, the issue is that I can’t use tune.report within the launched process. Tune looks for a _session object internally, and when it doesn’t find one it knows the call came from outside the original trial process. This is likely because I launch the process using the ‘spawn’ start method instead of ‘fork’, since I’m interested in having this work on Windows as well (the ‘fork’ method is not available on Windows).

What is the proper way to do this reporting from another ‘spawned’ process? Can I pass a tune reporter as an argument to the spawned process?

Are you performing communication between the spawned process and the process that’s launching it?

Another idea I had is to execute the process as a Ray Task! Here’s an example of how you can ensure that the GPU memory is cleared after executing the task.
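
Something along these lines should do it, assuming a GPU is available for the task (the training body here is just a placeholder):

```python
import ray

ray.init()

# max_calls=1 tells Ray to start a fresh worker process for each invocation of
# this task, so any GPU memory the framework allocates is released when the
# task finishes and the worker exits.
@ray.remote(num_gpus=1, max_calls=1)
def train_model(config):
    # import tensorflow as tf   # import inside the task so TF only lives in the worker
    # ... build and train the model here ...
    return {"accuracy": 0.9, "config": config}  # placeholder result

result = ray.get(train_model.remote({"lr": 1e-3}))
print(result)
```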

Yes, when launching the process, I create a queue which is responsible for sending results back to the primary thread.

I have considered writing another queue to send the ‘report’ results in a similar way, but it’s a bit of work and would tie parts of my library to Ray Tune, which I find undesirable. (I’m interested in my library being general purpose.)

Can I wrap a trainable function with ray.remote and max_calls=1 and have tune properly execute it? I’m probably going to just give it a try and see what happens.

@matthewdeng

I tried decorating my function trainable with @ray.remote(max_calls=1) as suggested.

I needed to pass the trainable function as trainable_function.remote, since the decorated function no longer has a plain call method. This worked; however, at the end of the first trial, I got the error:

ValueError: Invalid return or yield value. Either return/yield a single number or a dictionary object in your trainable function.

I think this is because the result of a task is an ObjectRef from which the actual result still needs to be fetched.
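
In other words, I think Tune ends up seeing something like this:

```python
import ray

ray.init()

@ray.remote(max_calls=1)
def trainable_function(config):
    return {"score": 1.0}

ref = trainable_function.remote({"lr": 1e-3})
print(type(ref))      # an ObjectRef, not a dict; presumably this is what Tune sees
print(ray.get(ref))   # the actual return value has to be fetched with ray.get()
```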

Hey just to clarify - is the trainable_function the “primary thread” that’s launching the other processes? If so, you should be able to read from the queue in trainable_function and pass the results read from the queue to tune.report().
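
For example, a rough sketch using the tune.report/tune.run API from this thread (the worker body is a stand-in for your real GPU training loop):

```python
from multiprocessing import get_context

from ray import tune

def _gpu_worker(queue, config):
    # Stand-in for the real GPU training loop running in the spawned process.
    for step in range(3):
        queue.put({"loss": 1.0 / (step + 1)})
    queue.put(None)  # sentinel: training is finished

def trainable_function(config):
    ctx = get_context("spawn")
    queue = ctx.Queue()
    proc = ctx.Process(target=_gpu_worker, args=(queue, config))
    proc.start()
    while True:
        item = queue.get()
        if item is None:
            break
        tune.report(**item)  # forward the child's metrics to Tune
    proc.join()

if __name__ == "__main__":
    tune.run(trainable_function, config={"lr": tune.grid_search([1e-3, 1e-2])})
```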

Yes it’s the ‘primary thread’ which launches the other process.

Yeah, I may wind up doing that in the end, but again, it’s a lot of work to make sure it behaves correctly, and it would create a permanent dependency on ray.tune for my library.

I would really prefer it if there were a way to do the GPU work within the trainable_function, and then have Ray launch a new process for the next sample from the search space. The way it works now, it seems like Ray launches a single process for each ‘worker’, and those processes persist until the search finishes.

Hey @krafczyk, thanks for the clarification here.

I would really prefer it if there were a way to do the GPU work within the trainable_function, and then have Ray launch a new process for the next sample from the search space.

Actually, I believe this should work. Tune creates a remote Actor process for each Trial and runs the trainable_function inside it. At the end of the Trial, the process is terminated and the GPU memory should be cleaned up.
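
So, as a rough sketch (placeholder metric, and assuming a GPU is available per Trial):

```python
from ray import tune

def trainable_function(config):
    # Each Trial runs this function inside its own Actor process, which is
    # terminated when the Trial ends, so GPU state created here should not
    # leak into the next Trial. Placeholder metric instead of real training.
    tune.report(score=config["lr"] * 100)

if __name__ == "__main__":
    tune.run(
        trainable_function,
        config={"lr": tune.grid_search([1e-4, 1e-3])},
        resources_per_trial={"gpu": 1},
    )
```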

The way it works now, it seems like Ray launches a single process for each ‘worker’, and those processes persist until the search finishes.

Do you have a reproduction for this? This sounds possible if you’re running the GPU training directly in Ray Tasks (GPU Support — Ray 2.8.0), but should not be the case for Actors (which Tune uses).

@matthewdeng How are Ray actors spawned? Do actors share Python objects?

The way I’m managing GPU utilization is with a ‘context’ object that is set at the module level. What’s happening is that I set this object within the trainable function, but for the second (and subsequent) trials the module-level Python object for that context is already defined, so my library thinks a context is already set. For now, I rely on only acquiring a ‘context’ if I know no other method will need a different context for the duration of the process’s execution. I typically enforce this by spawning a new process to run that code (which acquires the context it needs). This way, when the process exits, the original process’s session doesn’t have the module-level context object defined, and I can go and acquire a new context if needed.

What this implies to me is that Python object state is somehow persisting between trials, which is why I’m asking these questions about how actors and tasks work.

Ah, if you pass this context object directly into the definition of the Trainable, then it will actually be serialized and deserialized in the Actor process that’s running the Trainable. Mutating the context there will not be reflected in the Trainer’s copy.

Can you take a look at Antipattern: Accessing Global Variable in Tasks/Actors — Ray 1.12.1 and see if this describes the problem you’re facing and if the solution of using an Actor to store global state (the context) would work?
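
A minimal sketch of that approach, with names that are just for illustration:

```python
import ray

ray.init()

# Keep the shared "context" in a named Actor instead of a module-level global,
# as described on the antipattern page linked above.
@ray.remote
class ContextHolder:
    def __init__(self):
        self._context = None

    def set_context(self, context):
        self._context = context

    def get_context(self):
        return self._context

holder = ContextHolder.options(name="gpu_context").remote()
ray.get(holder.set_context.remote({"device": "gpu:0"}))

# Any trial or task can look the actor up by name and see the same state.
same_holder = ray.get_actor("gpu_context")
print(ray.get(same_holder.get_context.remote()))
```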

For folks coming across this thread: I was launching Ray with the local_mode=True option, which apparently changes how Ray works in a number of important ways. When I removed local_mode, Ray now launches separate processes as I expect.

Thanks for the update @krafczyk - just for completeness’ sake, the local_mode option is used almost exclusively for testing purposes, and even then it doesn’t implement the full Ray API. It’s not intended for any kind of actual workload. See also Starting Ray — Ray 2.8.0

This feature is maintained solely to help with debugging, so it’s possible you may encounter some issues. If you do, please file an issue.