I have a manual grid search. Inside every loop iteration, the code appends the RMSE, MAE, and MAPE to a file, along with the combination of hyper-parameters the model ran with. After the loop has finished all combinations, a function combines the three losses: it first filters all runs down to the 10 with the lowest RMSE, then down to the 4 with the lowest MAE, and finally takes the run with the lowest MAPE. The hyper-parameter combination of that run is what I use for the deployed model.
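For context, the selection step works roughly like this sketch (a minimal sketch only; the file name and column names are assumptions, not my actual code):

import pandas as pd

# Minimal sketch of the selection described above; file name and column names are assumptions.
runs = pd.read_csv("grid_search_results.csv")       # one row of metrics per hyper-parameter combination

top_rmse = runs.nsmallest(10, "rmse")               # the 10 runs with the lowest RMSE
top_mae = top_rmse.nsmallest(4, "mae")              # of those, the 4 with the lowest MAE
best_run = top_mae.nsmallest(1, "mape").iloc[0]     # finally, the single run with the lowest MAPE

best_params = best_run[["learning_rate", "hidden_size", "quantiles"]]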
Here is the code of the manual loop:
lr_list = [.013, .0147, .017, .019]
hs_list = [16, 32]
BATCH_SIZE = 128
OUTPUT_SIZE = 7

for quantile in quantile_list:
    QUANTILES = quantile
    LOSS = QuantileLoss(quantiles=QUANTILES)
    for lr in lr_list:
        LEARNING_RATE = lr
        for hs in hs_list:
            HIDDEN_SIZE = hs
            run_model(data)
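Each run appends its metrics plus the hyper-parameters to the results file, roughly like this helper (a sketch only; the helper name, file name, and column order are assumptions):

import csv

# Sketch of the per-run logging; the helper, file name, and column order are assumptions.
def log_run(path, lr, hidden_size, quantiles, rmse, mae, mape):
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([lr, hidden_size, quantiles, rmse, mae, mape])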
I also have code that uses Ray Tune (2.8.0) as a wrapper. It runs all the combinations from the manual loop above and references the same run_model function:
import os
import ray
from ray import tune
from ray.tune.error import TuneError
from pytorch_forecasting import TimeSeriesDataSet
import lightning.pytorch as pl

BATCH_SIZE = 128
OUTPUT_SIZE = 7
pl.seed_everything(42)

def run_functions(config):
    try:
        trial_id = ray.train.get_context().get_trial_id()
        run_model_20(data_id)
        ray.train.report({'dummy_metric': 1})
    except Exception as e:
        print(f"An error occurred during the run: {e}")
        raise e

try:
    ray.init(address='auto', _node_ip_address=node_ip_address, log_to_driver=False)
except ConnectionError as e:
    print(f"Could not connect to Ray cluster: {e}")
    exit(1)

data_id = ray.put(data)

hyperparameter_space = {
    "learning_rate": tune.grid_search([.013, .0147, .017, .019]),
    "hidden_size": tune.grid_search([16, 32]),
    "quantiles": tune.grid_search([
        [.1, .1, .3, .4, .6, .7, .8],
        [.1, .5, .7, .8, .9, .09, .9],
        [.5, .6, .7, .8, .9, .9, .9],
        [.1, .2, .3, .4, .5, .6, .7],
    ]),
}

# Resources per trial
num_cpus = ray.available_resources().get('CPU', 1)
num_gpus = ray.available_resources().get('GPU', 0)
resources_per_trial = {"cpu": 1, "gpu": num_gpus}

# Setup result directory
ray_results_dir = os.path.abspath("./ray_results")
os.makedirs(ray_results_dir, exist_ok=True)

try:
    # Run the experiment
    analysis = tune.run(
        run_functions,
        config=hyperparameter_space,
        num_samples=1,
        resources_per_trial=resources_per_trial,
        # scheduler=scheduler,
        local_dir=ray_results_dir,
        verbose=1  # Increased verbosity
    )
    trial_id = ray.train.get_context().get_trial_id()
    # best_config = analysis.get_best_config("dummy_metric", "max")
    # print("Best config: ", best_config)
except TuneError as e:
    print(f"Trial did not complete: {e}")
finally:
    ray.shutdown()
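For reference, the trainable above only reports a dummy metric. The sketch below shows how the same three losses could be reported per trial so they end up in the Tune results table (it assumes run_model_20 returns the three values, which my code above does not show):

from ray import train

# Sketch only: report the real losses instead of the dummy metric.
# Assumes run_model_20 returns (rmse, mae, mape), which is an assumption, not my actual code.
def run_functions(config):
    rmse, mae, mape = run_model_20(data_id)
    train.report({"rmse": rmse, "mae": mae, "mape": mape})

# With real metrics reported, each trial's metrics and config land in one table,
# so the same top-10-RMSE / top-4-MAE / lowest-MAPE filter could be applied, e.g.:
# results_df = analysis.dataframe()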
I am using the same parameters, the same random seed, and the same function. However, the loss values are not the same and I get different best hyper-parameters, so my model results do not match.
Since this is a grid search over the same values using the same function, I would expect it to produce identical results. Why am I getting different best parameters?