Using evaluation reward from RLlib runs in the population-based tuner

According to the API documentation of ray.tune, population-based training passes the metric as a string. I am trying to do hyperparameter tuning on an RL model that initializes the environment randomly, so the reward metrics on the training data are not a good measure of the effect of the hyperparameters: the dataset is large, and different chunks of it could lead to different maximum rewards. I would need to compare performance on the evaluation dataset instead. If PopulationBasedTraining is the trainable, how can I split the data into training and evaluation sets and pass the episode reward mean of the evaluation dataset as the metric for the tuner?

Do you simply want to have a set seed for evaluation and a random seed for training? You can split up the dataset basically however you like; for offline training, for example, with Ray Data's ray.data.read_csv() and dataset.train_test_split() methods. Otherwise, your evaluation config will have to use a set seed. Have a look at the docs for some examples of what you can do here. All of this is independent of the tuning algorithm, e.g. population-based training.
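Here is a minimal sketch of how this could be wired up with the Ray 2.x Tuner API: PBT acts as the scheduler (the trainable is still the RLlib algorithm), evaluation rollouts run on a fixed seed, and the scheduler is keyed on the nested evaluation metric rather than the training reward. The env name, PPO, the seed, and all hyperparameter values are placeholders, not something from this thread:

```python
from ray import tune
from ray.tune.schedulers import PopulationBasedTraining

# PBT is a scheduler, not a trainable; the trainable is the RLlib algorithm ("PPO" here).
# Keying the scheduler on "evaluation/episode_reward_mean" makes trials get compared
# on evaluation rollouts instead of the (randomly initialized) training rollouts.
pbt = PopulationBasedTraining(
    time_attr="training_iteration",
    metric="evaluation/episode_reward_mean",
    mode="max",
    perturbation_interval=5,
    hyperparam_mutations={"lr": tune.loguniform(1e-5, 1e-3)},
)

param_space = {
    "env": "CartPole-v1",  # placeholder; swap in your own environment
    "lr": tune.loguniform(1e-5, 1e-3),
    # Run 10 evaluation episodes every training iteration.
    "evaluation_interval": 1,
    "evaluation_duration": 10,
    "evaluation_duration_unit": "episodes",
    # Fixed seed for the evaluation env copies, so evaluation rewards are comparable
    # across trials. Your custom env has to read and apply this seed from env_config;
    # built-in Gym envs like CartPole ignore it.
    "evaluation_config": {"env_config": {"seed": 42}},
}

tuner = tune.Tuner(
    "PPO",
    param_space=param_space,
    tune_config=tune.TuneConfig(scheduler=pbt, num_samples=4),
)
results = tuner.fit()
```

For offline training, the train/eval split itself could instead come from Ray Data, e.g. ray.data.read_csv("...") followed by Dataset.train_test_split(test_size=0.2), with the held-out portion used by the evaluation workers; the tuner setup above stays the same either way.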