How severely does this issue affect your experience of using Ray?
- Medium: It contributes to significant difficulty to complete my task, but I can work around it.
 
I made this extremely simple setup to illustrate the question:
```python
from ray import tune
from ray.tune.schedulers import ASHAScheduler

n = 9
func_vals = {
    # score for a = 0 at each "reporting point" of "training"
    0: [1, 2, 3, 4, 10, 4, 3, 2, 1],
    # score for a = 1 at each "reporting point" of "training"
    1: [1, 2, 3, 4, 3, 2, 1, 0, 1],
}

def objective(x, a):
    # a should be 0 or 1
    if x >= 9:
        return 0
    else:
        return func_vals[a][x]

space = {
    'a': tune.choice([0, 1])
}

def trainable(config):
    # config (dict): A dict of hyperparameters.
    for x in range(n):
        score = objective(x, config["a"])
        tune.report(score=score)  # This sends the score to Tune.

tune_scheduler = ASHAScheduler(
    max_t=10,
    grace_period=5,
    reduction_factor=2,
)

analysis = tune.run(
    trainable,
    metric='score',
    mode='max',
    config=space,
    num_samples=4,
    scheduler=tune_scheduler,
)

max_score = analysis.best_result['score']
best_a = analysis.best_result['config']['a']
print(f'best score was {max_score} with a = {best_a}')
```
With hyperparameter a=0, the best achieved score is 10 at reporting point 5, whereas with a=1 the best is 4 at reporting point 4.
This is intended to simulate the behavior of a training loop where the "score" could be, say, val_auc, which may improve up to a point and then decline.
In the code above, Ray Tune returns a=1 as the best hyperparameter, with max_score=3.
I was very surprised to see this. Is there an explanation of the mechanism behind this?
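If it helps narrow things down, something like the following should dump each trial's full history of reported scores, so it is visible where trials were stopped early. This is only a sketch, assuming the `ExperimentAnalysis.trial_dataframes` property is available on the object returned by `tune.run` in my Ray version:

```python
# Hedged sketch: print each trial's reported scores to see where the
# scheduler stopped it early. Assumes analysis.trial_dataframes returns
# a dict mapping trial log dirs to per-trial DataFrames.
for trial_path, df in analysis.trial_dataframes.items():
    # one row per tune.report() call
    print(trial_path, df["score"].tolist())
```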
Two things that would be helpful to understand are:
- How does early termination work?
- How does Ray Tune evaluate the quality of a trial – is it based on the last reported score or the best score so far? (See the sketch right after this list.)
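For the second question, this is the comparison I have in mind, assuming `get_best_trial()` accepts a `scope` argument ("last" vs. "all") as described in the `ExperimentAnalysis` docs; that assumption may not match my installed version:

```python
# Hedged sketch: compare judging trials by their last reported score
# versus by the best score seen at any reporting point.
best_by_last = analysis.get_best_trial("score", "max", scope="last")
best_by_all = analysis.get_best_trial("score", "max", scope="all")

print("judged by last reported score:", best_by_last.config)
print("judged by best score so far:  ", best_by_all.config)
```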
 
Thanks for any clarifications.