How to specify max_calls for functional API

mistycheney · March 23, 2021, 5:56am

I tried this:

tune.run( train_func, config={...} )

@ray.remote(max_calls=1)
def train():
    do_something()

def train_func(config): 
    train.remote()

but it always crashes with “SystemExit was raised from the worker”.
Can someone shed light on what the reason could be? Thanks.

rliaw · March 23, 2021, 6:45am

You shouldn’t set this for tune.run; Ray Tune doesn’t expect users to use the .remote interface.

mistycheney · March 23, 2021, 1:42pm

Then how can I deal with the “GPU memory not released by previous worker” problem that max_calls=1 is designed to address? In my current Tune experiment, where each trial uses 1 GPU, half of the trials always end with a CUDA out-of-memory error. I suspect these trials start too soon, before the memory claimed by the previous worker has been released.

rliaw · March 26, 2021, 6:55am

Maybe try the tune.Trainable or tune.report API instead? See “Training (tune.Trainable, tune.report)” in the Ray v2.0.0.dev0 docs.
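
For illustration, here is a minimal sketch of the function-based API suggested above, with the training code called directly inside the trial instead of through @ray.remote and .remote(). It assumes the Ray 1.x-era interface from the time of this thread, where tune.report accepts keyword metrics; newer releases changed the reporting API. The names toy_loss, run_experiment, and the "lr" and "steps" config keys are hypothetical. The call to tune.utils.wait_for_gpu() is an optional addition that was not mentioned in the thread: as far as I know, it polls until the GPU assigned to the trial is mostly idle, and it requires the GPUtil package. Whether it resolves the out-of-memory errors described here is untested.

```python
def toy_loss(lr, step):
    """Hypothetical stand-in for a real training step; lower is better."""
    return (lr - 0.1) ** 2 + 1.0 / (step + 1)


def train_func(config):
    # Imported inside the function so toy_loss stays usable without Ray.
    from ray import tune

    # Optional, and an assumption on my part: wait until the GPU assigned to
    # this trial has been freed by the previous one. Requires GPUtil.
    tune.utils.wait_for_gpu()

    for step in range(config["steps"]):
        # Training code runs directly in the trial process; there is no
        # @ray.remote function and no .remote() call here.
        tune.report(loss=toy_loss(config["lr"], step))


def run_experiment():
    from ray import tune

    # One GPU per trial, matching the setup described in the thread.
    return tune.run(
        train_func,
        config={"lr": tune.grid_search([0.01, 0.1]), "steps": 3},
        resources_per_trial={"gpu": 1},
    )
```

Calling run_experiment() on a machine with Ray and at least one GPU would launch the two grid-search trials. Tune already runs each trial in its own worker process, so it handles the process isolation that max_calls=1 provides for plain Ray tasks.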
