Ray vs. Optuna Performance

I am attempting to run the same experiment in both plain Optuna and via Ray Tune, using Optuna samplers within Ray.

Currently, runtimes in Ray are approximately 10x those of plain Optuna. The Ray results also show very little variance (sometimes none, despite setting a random seed) compared with Optuna.
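(One possible lead on the collapsed variance: if each worker re-creates its sampler from the same fixed seed, every trial can draw identical suggestions. A minimal stdlib-only sketch of that failure mode, with purely illustrative names:)

```python
import random

# Two "samplers" built from the same seed produce identical draws.
# If every Ray worker reconstructs its Optuna sampler with the same
# fixed seed, each trial can end up proposing the same point, which
# would show up as near-zero variance in the results.
sampler_worker_1 = random.Random(42)
sampler_worker_2 = random.Random(42)

draws_1 = [sampler_worker_1.uniform(0, 1) for _ in range(3)]
draws_2 = [sampler_worker_2.uniform(0, 1) for _ in range(3)]

assert draws_1 == draws_2  # identical sequences across "workers"
```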

The graph below shows the variation in results (the y-axis is the objective value we are looking to optimise) for the two setups, trialling different sampling methods.

The bulk of the code is shared between the Optuna and Ray-Optuna approaches; the only differences are:

  • Definition of the search space. Optuna takes the format:

def optuna_search_space(trial, model_type, config):
    if model_type == 'xxx':
        search_space = {'model_type': trial.suggest_categorical('model_type', [model_type]),
                        'a': trial.suggest_float('a', min, max),
                        'b': trial.suggest_float('b', min, max)}
    elif model_type == 'yyy':
        ...  # analogous branches for other model types

    return search_space
  • Whilst Ray takes the format:

def ray_search_space(model_type, config):
    if model_type == 'xxx':
        search_space = {'model_type': 'xxx',
                        'a': tune.uniform(min, max),
                        'b': tune.uniform(min, max)}
    elif model_type == 'yyy':
        ...  # analogous branches for other model types

    return search_space
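(A structural difference worth noting: Optuna samples values inside the objective via the trial object, whereas the Ray Tune space is a static description that the search algorithm draws from outside the objective. A stdlib-only sketch of the two shapes, all names illustrative:)

```python
import random

rng = random.Random(0)  # stands in for the sampler

# Optuna-style: sampling happens *inside* the objective, per trial.
def optuna_style_space():
    return {'a': rng.uniform(0.0, 1.0)}

# Ray-style: the space is a static *description*; the search
# algorithm draws from it on the caller's side.
ray_style_space = {'a': ('uniform', 0.0, 1.0)}

def draw(space):
    _, lo, hi = space['a']
    return {'a': rng.uniform(lo, hi)}

assert 0.0 <= optuna_style_space()['a'] <= 1.0
assert 0.0 <= draw(ray_style_space)['a'] <= 1.0
```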

How the run is being distributed:

  • Optuna:

def optuna_objective(trial, model_type, config, model):
    search_space = optuna_search_space(trial, model_type, config)
    obj = obj_calc(model, search_space)

    return obj

def optuna_run(client, model_type, config, sampler, nbr_trials, model):
    objective = partial(optuna_objective, model_type=model_type, config=config, model=model)

    study = optuna.create_study(direction='maximize', sampler=sampler, storage=DaskStorage())

    scattered_objective = client.scatter(objective)

    # one future per trial; gather so the run completes before reading results
    futures = [
        client.submit(study.optimize, scattered_objective, n_trials=1, n_jobs=nbr_trials, pure=False)
        for _ in range(nbr_trials)
    ]
    client.gather(futures)

    return study.trials_dataframe()
  • Ray:

def ray_objective(search_space, model_ref):

    model = ray.get(model_ref)
    obj = obj_calc(model, search_space)

    return {'obj': obj}

def ray_run(model_type, config, sampler, nbr_trials, workers_per_node, model):

    model_ref = ray.put(model)
    search_space = ray_search_space(model_type, config)

    searcher = OptunaSearch(sampler=sampler, metric='obj', mode='max')
    searcher = ConcurrencyLimiter(searcher, max_concurrent=workers_per_node)

    tuner = tune.Tuner(partial(ray_objective, model_ref=model_ref),
                       param_space=search_space,
                       tune_config=tune.TuneConfig(search_alg=searcher, num_samples=nbr_trials))
    results = tuner.fit()

    return results.get_dataframe()
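(For reference, the fan-out pattern used on the Dask side above — one submitted unit of work per trial, then a gather — sketched with the stdlib `concurrent.futures` standing in for the Dask client; `fake_trial` is illustrative, not from the original code:)

```python
from concurrent.futures import ThreadPoolExecutor

def fake_trial(i):
    # Stand-in for one study.optimize(..., n_trials=1) call.
    return i * i

# Submit one future per trial, then gather; with Dask this is
# client.submit(...) followed by client.gather(futures).
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(fake_trial, i) for i in range(4)]
    results = [f.result() for f in futures]

assert results == [0, 1, 4, 9]
```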

Is there anything in these differences that could be responsible for the very different runtimes and spread of results? I assumed that using the same sampler for both would result in similar performance.