How to use time_total_s as a stop condition?

sreaung · May 25, 2022, 9:13pm

How severe does this issue affect your experience of using Ray?

High: It blocks me to complete my task.

Hi,

Thank you for an easy-to-use Ray Tune. I am new to Ray Tune and I am trying to use it to tune the parameter values of an iterative, non-learnable method. The method has a lot of parameters, so it will take a while to optimize all of them. However, the machine on which I would like to run Ray Tune automatically terminates any program after it has run for 72 hours. I would like Ray Tune to terminate properly a few minutes before it hits 72 hours. I have done the following:

result=tune.run(
    tune.with_parameters(partial(optimize, args, relative_data_paths)),
    name=args.tuning_exp_name,
    resources_per_trial={"cpu": args.n_cpus, "gpu": args.n_gpus},
    config=config,
    stop={'time_total_s':args.stop_time_total_h*3600},
    num_samples=args.n_samples, # number of trials
    scheduler=ASHAScheduler(),
    metric='score',
    mode=args.tuning_mode,
    fail_fast=True, # To stop the entire Tune run as soon as any trial errors
    log_to_file=True # save stdout and stderr to trial_logdir/stdout and trial_logdir/stderr
    )

where args holds the command-line inputs and args.stop_time_total_h is given in hours. To test whether the optimization stops after a certain time, I ran it with args.stop_time_total_h=0.05, which is 3 minutes. It seemed that Ray Tune ran all the trials regardless of stop={'time_total_s':args.stop_time_total_h*3600}.

Could anyone tell me whether I did something wrong?

kai · May 26, 2022, 9:27am

Hi @sreaung,

Are you reporting intermediate results to Ray Tune using tune.report()?

The way the tuning loop is implemented, the stopping conditions are only considered when a new result is received. Each result contains time_total_s automatically, and Tune stops the trial if the conditions are met. But if you don’t report anything until the very end, Tune has no information to act on and cannot stop the trial early.
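To make the point concrete, here is a toy model in plain Python. It is not Tune's actual implementation, only a sketch of the behaviour described above: the stop condition is evaluated only at the moments a result arrives. The function name simulate_trial and its arguments are made up for illustration, and time is simulated rather than measured.

```python
def simulate_trial(n_steps, step_s, stop_time_total_s, report_every_step):
    """Return how many steps a toy trial runs before it stops.

    Hypothetical helper, not part of Ray Tune. Each step costs `step_s`
    simulated seconds. The stop condition can only be checked when a
    result is reported.
    """
    time_total_s = 0
    for step in range(1, n_steps + 1):
        time_total_s += step_s  # simulated cost of one iteration
        is_last_step = step == n_steps
        if report_every_step or is_last_step:
            # A result is "received" here, so the condition can be checked.
            if time_total_s >= stop_time_total_s:
                return step
    return n_steps


# 100 steps of 10 s each, with a 180 s (3 minute) stop condition.
steps_with_reports = simulate_trial(100, 10, 180, report_every_step=True)
steps_without_reports = simulate_trial(100, 10, 180, report_every_step=False)
print(steps_with_reports, steps_without_reports)
```

With a report after every step, the toy trial stops as soon as the accumulated time reaches the limit. With a single report at the end, it runs all of its steps, because there was never an earlier point at which the condition could be checked.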

Another reason why you would want to do that is that otherwise you won’t have any results to analyze within Ray Tune or the experiment checkpoint - after all, Tune received no metrics.

As a side note, the stop condition you specified applies per trial. So if a trial only starts, say, 40 hours in, it can run for up to another 72 hours, which makes a total experiment runtime of up to 112 hours. I think you might be looking for the time_budget_s parameter, i.e. tune.run(time_budget_s=xxx), which will stop the whole experiment after xxx seconds.
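The arithmetic behind this can be sketched as follows. The helper names and the 10-minute safety margin are assumptions chosen for illustration (the original question only says "a few minutes"); only time_budget_s itself comes from tune.run.

```python
def experiment_budget_s(wall_limit_h, margin_min):
    """Seconds to pass as time_budget_s so the run ends before the wall limit.

    Hypothetical helper: `wall_limit_h` is the machine's hard limit in hours,
    `margin_min` is an assumed safety margin in minutes.
    """
    budget_s = wall_limit_h * 3600 - margin_min * 60
    if budget_s <= 0:
        raise ValueError("margin must be smaller than the wall-clock limit")
    return budget_s


def worst_case_runtime_h(last_trial_start_h, per_trial_stop_h):
    """Upper bound on experiment runtime when the limit is only per trial."""
    return last_trial_start_h + per_trial_stop_h


# A per-trial limit of 72 h does not bound the experiment:
# a trial that starts 40 h in may still run for another 72 h.
print(worst_case_runtime_h(40, 72))  # hours

# A whole-experiment budget that ends 10 minutes before a 72 h limit.
budget_s = experiment_budget_s(72, 10)
print(budget_s)  # seconds

# The value would then be passed along the lines of:
#   tune.run(..., time_budget_s=budget_s)
```

The per-trial stop criterion can stay in place alongside the budget if individual trials should also be capped; the budget is what keeps the experiment as a whole under the machine's limit.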

sreaung · May 26, 2022, 4:25pm

Thanks so much, Kai! Your explanation is very helpful. I did report metric values using tune.report(), but now I know why the program did not terminate. As you mentioned, I should use tune.run(time_budget_s=xxx). Thanks so much again for your kind help and for the wonderful library!
