Not receiving the same results from a manual Grid Search

I have a manual grid search in which, on every iteration, the code appends the RMSE, MAE and MAPE to a file along with the combination of hyperparameters the model ran with. It runs all hyperparameter combinations, and after the loop finishes, a function combines the three losses: it first keeps the 10 runs with the lowest RMSE, then narrows those to the 4 with the lowest MAE, and finally takes the one with the lowest MAPE. The hyperparameters of that run are the ones my deployed model uses.
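The three-stage selection described above can be sketched as follows. This assumes each run is stored as a dict with rmse, mae and mape keys; select_best is a hypothetical name, not the original function.

```python
from typing import Dict, List


def select_best(runs: List[Dict], n_rmse: int = 10, n_mae: int = 4) -> Dict:
    """Pick one run: n_rmse lowest RMSE -> n_mae lowest MAE -> lowest MAPE."""
    # Stage 1: keep the n_rmse runs with the smallest RMSE
    top_rmse = sorted(runs, key=lambda r: r["rmse"])[:n_rmse]
    # Stage 2: of those, keep the n_mae runs with the smallest MAE
    top_mae = sorted(top_rmse, key=lambda r: r["mae"])[:n_mae]
    # Stage 3: return the run with the smallest MAPE
    return min(top_mae, key=lambda r: r["mape"])
```

Python's sort is stable and min returns the first minimum it encounters, so exact ties are broken by the order in which runs appear in the file. If the manual loop and Ray write rows in different orders, tied runs can produce a different winner. Likewise, a small metric difference at any stage can change which runs survive the cutoff.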

Here is the code for the manual grid search:

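from pytorch_forecasting.metrics import QuantileLoss

# Same grids as the Ray Tune search space below
quantile_list = [[.1, .1, .3, .4, .6, .7, .8], [.1, .5, .7, .8, .9, .09, .9],
                 [.5, .6, .7, .8, .9, .9, .9], [.1, .2, .3, .4, .5, .6, .7]]
lr_list = [.013, .0147, .017, .019]
hs_list = [16, 32]

for quantile in quantile_list:
    QUANTILES = quantile
    LOSS = QuantileLoss(quantiles=QUANTILES)
    for lr in lr_list:
        LEARNING_RATE = lr
        for hs in hs_list:
            HIDDEN_SIZE = hs
            # ... train with run_model and append rmse, mae, mape and the
            # hyperparameters to the results file (not shown)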
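One way to make the manual loop and the Ray run go through exactly the same code path is to build the same config dicts that Ray Tune would pass and hand each one to the training function in a plain loop. This is a sketch, assuming quantile_list holds the same four lists as the Ray search space; grid_configs is a hypothetical helper.

```python
from itertools import product
from typing import Dict, Iterator, List


def grid_configs(quantile_list: List[List[float]], lr_list: List[float],
                 hs_list: List[int]) -> Iterator[Dict]:
    """Yield one Ray-style config dict per combination, in nested-loop order.

    product() varies the last list fastest, so quantiles is the outer loop,
    learning rate the middle loop and hidden size the inner loop, matching
    the manual for-loops.
    """
    for quantiles, lr, hs in product(quantile_list, lr_list, hs_list):
        yield {"learning_rate": lr, "hidden_size": hs, "quantiles": quantiles}


quantile_list = [[.1, .1, .3, .4, .6, .7, .8], [.1, .5, .7, .8, .9, .09, .9],
                 [.5, .6, .7, .8, .9, .9, .9], [.1, .2, .3, .4, .5, .6, .7]]
configs = list(grid_configs(quantile_list, [.013, .0147, .017, .019], [16, 32]))
```

Looping `for config in configs:` and passing each config to the training function removes any difference in how the two searches hand over hyperparameters, for example module-level globals such as LEARNING_RATE in the manual loop versus Ray's config dict.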

I also have code that uses Ray Tune (2.8.0) as a wrapper. It runs all the combinations from the manual loop above and calls the same run_model function:

import os

import ray
from ray import train, tune
from ray.tune.error import TuneError
from pytorch_forecasting import TimeSeriesDataSet
import lightning.pytorch as pl

BATCH_SIZE = 128


def run_functions(config):
    try:
        pl.seed_everything(42)
        trial_id = train.get_context().get_trial_id()
        # Same training function as the manual loop (full call not shown)
        run_model_20(data_id)
        train.report({'dummy_metric': 1})
    except Exception as e:
        print(f"An error occurred during the run: {e}")
        raise e


try:
    # node_ip_address is defined elsewhere (not shown)
    ray.init(address='auto', _node_ip_address=node_ip_address, log_to_driver=False)
except ConnectionError as e:
    print(f"Could not connect to Ray cluster: {e}")

# data is loaded elsewhere (not shown)
data_id = ray.put(data)

hyperparameter_space = {
    "learning_rate": tune.grid_search([.013, .0147, .017, .019]),
    "hidden_size": tune.grid_search([16, 32]),
    "quantiles": tune.grid_search([[.1, .1, .3, .4, .6, .7, .8], [.1, .5, .7, .8, .9, .09, .9],
                                   [.5, .6, .7, .8, .9, .9, .9], [.1, .2, .3, .4, .5, .6, .7]]),
}

# Resources per trial
num_cpus = ray.available_resources().get('CPU', 1)
num_gpus = ray.available_resources().get('GPU', 0)
resources_per_trial = {"cpu": 1, "gpu": num_gpus}

# Setup result directory
ray_results_dir = os.path.abspath("./ray_results")
os.makedirs(ray_results_dir, exist_ok=True)

try:
    # Run the experiment
    analysis = tune.run(
        run_functions,
        config=hyperparameter_space,
        resources_per_trial=resources_per_trial,
        storage_path=ray_results_dir,
        # scheduler=scheduler,
        verbose=1,  # Increased verbosity
    )
    # best_config = analysis.get_best_config("dummy_metric", "max")
    # print("Best config: ", best_config)
except TuneError as e:
    print(f"Trial did not complete: {e}")

I am using the same parameters, the same random seed and the same function. However, the losses are not the same and I get different best parameters, so my model results don't match.

Logically, since it's a grid search running the same function, I would expect the same results. Why am I getting different best parameters?

Can you try moving this inside of run_functions?
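Seeding at the start of each trial makes a trial's random draws independent of whatever ran earlier in the same process. Below is a minimal sketch that uses Python's random module as a stand-in for the generators pl.seed_everything seeds (Python, NumPy and PyTorch). noisy_loss, trial_with_seed and trial_without_seed are hypothetical names.

```python
import random
from typing import Dict


def noisy_loss(config: Dict) -> float:
    # Stand-in for training: the result depends on the global random state
    return config["learning_rate"] + random.random()


def trial_without_seed(config: Dict) -> float:
    # Inherits whatever random state earlier trials left behind
    return noisy_loss(config)


def trial_with_seed(config: Dict, seed: int = 42) -> float:
    random.seed(seed)  # reset the RNG at the start of every trial
    return noisy_loss(config)


configs = [{"learning_rate": lr} for lr in (.013, .0147, .017)]

# Same trials, run in two different orders within one process
forward = {c["learning_rate"]: trial_with_seed(c) for c in configs}
backward = {c["learning_rate"]: trial_with_seed(c) for c in reversed(configs)}
print(forward == backward)  # True: per-trial seeding makes order irrelevant
```

Without the reseed, each trial's result depends on how many random numbers earlier trials consumed, so the same config can score differently depending on its position in the search.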

Thank you for your response! Any help would be greatly appreciated, as I've spent hours trying to replicate the results. I only added pl.seed_everything to run_functions as an afterthought, after the results weren't matching. I already call pl.seed_everything(42) inside run_model, so it should match the manual grid search exactly…
Please help if you can!

If you need more information, please let me know and I'd be glad to provide it.

Grid search itself shouldn’t introduce any difference.

To verify this, you could try something like:

Test 1: Run the function directly

run_functions({"learning_rate": .013, "hidden_size": 16, "quantiles": [.1, .1, .3, .4, .6, .7, .8]})

Test 2: Use Ray Tune with a single config

from ray import tune

hyperparameter_space = {
    "learning_rate": tune.grid_search([.013]),
    "hidden_size": tune.grid_search([16]),
    "quantiles": tune.grid_search([[.1, .1, .3, .4, .6, .7, .8]]),
}

This should help narrow down whether the results are the same and whether the random seeding is actually taking effect.
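Because the manual file and Ray write runs in different orders, comparing the two result sets is easiest when each run is matched by its hyperparameters rather than by row position. The sketch below assumes both sets of results have been loaded as lists of dicts with learning_rate, hidden_size, quantiles, rmse, mae and mape keys. config_key and compare_runs are hypothetical helpers.

```python
import math
from typing import Dict, List, Tuple


def config_key(run: Dict) -> Tuple:
    # Hyperparameters identify a run; lists become tuples so they are hashable
    return (run["learning_rate"], run["hidden_size"], tuple(run["quantiles"]))


def compare_runs(manual: List[Dict], ray_runs: List[Dict],
                 metrics=("rmse", "mae", "mape"), rel_tol=1e-9) -> List[Tuple]:
    """Return (config, metric, manual_value, ray_value) for every mismatch."""
    ray_by_key = {config_key(r): r for r in ray_runs}
    mismatches = []
    for run in manual:
        other = ray_by_key.get(config_key(run))
        if other is None:
            # Configuration present in the manual results but not in Ray's
            mismatches.append((config_key(run), "missing", None, None))
            continue
        for m in metrics:
            if not math.isclose(run[m], other[m], rel_tol=rel_tol):
                mismatches.append((config_key(run), m, run[m], other[m]))
    return mismatches
```

An empty list means every configuration reproduced. Otherwise, the mismatching configurations show where the two searches start to diverge; rel_tol can be loosened if tiny floating-point differences are acceptable.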

Good idea, let me try it and I'll let you know what happens.

When I run just one set of parameters, I get exactly the same scores. The difference only appears when it runs multiple combinations.
What can be done to get the same results using Ray?
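When a single config reproduces exactly but a batch does not, one thing worth ruling out is state that carries over from one run to the next. In the manual loop every run shares one Python process, while Ray Tune runs trials as Ray actors in worker processes (which may or may not be reused between trials). Reseeding does not undo data that an earlier run modified in place. Below is a toy sketch of that failure mode, with hypothetical functions and made-up numbers.

```python
import copy
from typing import List


def run_model_mutating(data: List[float], scale: float) -> float:
    # Illustrative bug: normalises the caller's list in place
    for i, x in enumerate(data):
        data[i] = x / scale
    return sum(data)


def run_model_isolated(data: List[float], scale: float) -> float:
    data = copy.deepcopy(data)  # each run works on its own copy
    for i, x in enumerate(data):
        data[i] = x / scale
    return sum(data)


data = [2.0, 4.0, 8.0]

# One config on its own behaves as expected
alone = run_model_mutating(list(data), 2.0)

# The same config after another run on the same object gives a new result
shared = list(data)
run_model_mutating(shared, 4.0)
after_other = run_model_mutating(shared, 2.0)

# With a private copy per run, the earlier run has no effect
shared = list(data)
run_model_isolated(shared, 4.0)
isolated = run_model_isolated(shared, 2.0)
```

Here alone and isolated agree while after_other does not, which mirrors the symptom of "matches alone, differs in a batch".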

Any ideas would be appreciated.

Could you share a minimal reproduction? It's hard for me to say, since running multiple sets of parameters shouldn't change the result of any individual run.