How severely does this issue affect your experience of using Ray?
- Medium: It contributes significant difficulty to completing my task, but I can work around it.
I made this extremely simple setup to illustrate the question:
from ray import tune
from ray.tune.schedulers import ASHAScheduler
n = 9
func_vals = {
    # score for a = 0 at each "reporting point" of "training"
    0: [1, 2, 3, 4, 10, 4, 3, 2, 1],
    # score for a = 1 at each "reporting point" of "training"
    1: [1, 2, 3, 4, 3, 2, 1, 0, 1],
}

def objective(x, a):
    # a should be 0 or 1
    if x >= 9:
        return 0
    else:
        return func_vals[a][x]

space = {
    'a': tune.choice([0, 1])
}

def trainable(config):
    # config (dict): A dict of hyperparameters.
    for x in range(n):
        score = objective(x, config["a"])
        tune.report(score=score)  # This sends the score to Tune.

tune_scheduler = ASHAScheduler(
    max_t=10,
    grace_period=5,
    reduction_factor=2,
)

analysis = tune.run(
    trainable,
    metric='score',
    mode='max',
    config=space,
    num_samples=4,
    scheduler=tune_scheduler,
)

max_score = analysis.best_result['score']
best_a = analysis.best_result['config']['a']
print(f'best score was {max_score} with a = {best_a}')
When the hyperparameter a=0, its best achieved score is 10, at reporting point 5, whereas with a=1 the best is 4, at reporting point 4. This is intended to simulate the behavior of a training loop where the “score” could be, say, val_auc, which may improve up to a point and then decline.
In the code above, Ray Tune returns the best hyperparameter as a=1, with max_score=3.
I was very surprised to see this. Is there an explanation of the mechanism behind it?
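To dig into this, I also dumped what each trial actually reported, to see which trials were stopped early and what their last vs. best scores were. This is just a quick sketch against the analysis object from the run above, assuming the trial_dataframes and get_all_configs accessors of ExperimentAnalysis in the Ray version I'm on:

# Dump every score each trial reported, plus its last and best value,
# to see which trials ASHA stopped early.
configs = analysis.get_all_configs()  # maps trial logdir -> config dict
for logdir, df in analysis.trial_dataframes.items():
    scores = df["score"].tolist()  # all scores this trial reported
    print(f"a={configs[logdir]['a']}: {len(scores)} reports, "
          f"last={scores[-1]}, best={max(scores)}")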
Two things that would be helpful to understand are:
- How does early termination work?
- How does Ray Tune evaluate the quality of a trial – is it based on the last reported score, or the best so far? (I tried to probe this with the sketch after this list.)
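For the second bullet, my (possibly wrong) reading is that get_best_trial takes a scope argument, where scope="last" ranks trials by their final reported value and scope="all" by their best reported value. A rough sketch of how I compared the two, again using the analysis object from above:

# Compare "quality = last reported score" vs. "quality = best score so far",
# assuming my reading of the scope argument is correct.
best_last = analysis.get_best_trial(metric="score", mode="max", scope="last")
best_all = analysis.get_best_trial(metric="score", mode="max", scope="all")
print("best trial by last report:  a =", best_last.config["a"])
print("best trial by best-so-far:  a =", best_all.config["a"])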
Thanks for any clarifications.