Question: Using a stopper with the `stop_all` method as described in the docs will stop all currently running trials and not queue any new ones, i.e., stop the experiment. How can I stop the experiment but still finish currently running trials?
Use case 1:
I monitor my tune experiments live. Sometimes this makes me realize that my tune config isn’t ideal (for example, the range of one of the hyperparameters is too limited, or I think of something else). So, I want to change my tune settings. However, I don’t want to kill/stop trials that are already running for two reasons:
- The trials might have been running for more than an hour, so I waste a lot of time if I just kill them
- If I kill them before they stop because of a plateau or similar TrialStopper, then the performance associated with their hyperparameters will be misleading in any analysis of the hyperparameter space. Therefore I would then have to clean up all non-completed trials manually.
Use case 2:
This feature would also allow me to devise a workaround for issues with time limitations of ray workers that are submitted to a batch system (see this related question of mine).
- Perfect solution: An option on the ray dashboard or a command to connect to the ray head that triggers this kind of “soft” stopping of the experiment
- Extending the stopper class: If I could write a `soft_stop_all` method or have a `finish_trials_upon_experiment_stop` parameter/attribute, I could easily build something on my own (for example trivially by checking if a certain file exists and then performing a soft stop).
- `n_trials`: Could I change `n_trials` during the experiment? Then I could just set it to the exact number of trials that have already run or are running to have this effect.
- Make workers not accept jobs: Another hacky way would be to set all workers somehow not to accept new jobs. Then the main tune script could not enqueue any new trials after the current ones have finished and I can safely kill and restart it.
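To make the file-based idea from the stopper option more concrete, here is a minimal sketch of the trigger part only. It is duck-typed so it runs without Ray. I am assuming the stopper interface consists of `__call__(trial_id, result)` and `stop_all()`, and with Ray installed the class would presumably subclass `ray.tune.Stopper`. The class name `FileTriggeredStopper` and the sentinel file path are hypothetical.

```python
import os


class FileTriggeredStopper:
    """Sketch of a Tune-style stopper triggered by a sentinel file.

    Hypothetical name; assumes the stopper interface is
    __call__(trial_id, result) plus stop_all().
    """

    def __init__(self, flag_path):
        # flag_path: hypothetical sentinel file, created by hand (e.g. with `touch`)
        self.flag_path = flag_path

    def __call__(self, trial_id, result):
        # Never stop an individual trial from here.
        return False

    def stop_all(self):
        # Request an experiment-wide stop once the sentinel file exists.
        return os.path.exists(self.flag_path)
```

As far as I understand, registering this as the experiment's stopper would still give the hard stop described in the question, because `stop_all` also ends the running trials. What is missing is a way to let those trials finish.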