Using evaluation reward from RLlib runs in the population-based tuner

According to the API documentation of ray.tune, population-based training passes the metric as a string. I am trying to do hyperparameter tuning on an RL model that initializes the environment randomly, so the reward metrics on the training data are not a good measure of the effect of the hyperparameters: the dataset is large, and different chunks of it could lead to different maximum rewards. I would need to compare performance on the evaluation dataset instead. If PopulationBasedTraining is the trainable, how can I split the data into training and evaluation sets and pass the episode reward mean of the evaluation dataset as the metric for the tuner?

Do you simply want to have a set seed for evaluation and a random seed for training? You can split up the dataset basically however you like; for offline training, for example, with Ray Data's ray.data.read_csv() and dataset.train_test_split() methods. Otherwise, your evaluation config will have to use a set seed. Have a look at the docs for some examples of what you can do here. All of this is independent of the tuning algorithm, e.g. population-based training.
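Here is a minimal sketch of how this could be wired up with the Ray 2.x Tuner API: PBT acts as the scheduler (the trainable is still the RLlib algorithm), evaluation rollouts run on a fixed seed, and the scheduler is keyed on the nested evaluation metric rather than the training reward. The env name, PPO, the seed, and all hyperparameter values are placeholders, not something from this thread:

```python
from ray import tune
from ray.tune.schedulers import PopulationBasedTraining

# PBT is a scheduler, not a trainable; the trainable is the RLlib algorithm ("PPO" here).
# Keying the scheduler on "evaluation/episode_reward_mean" makes trials get compared
# on evaluation rollouts instead of the (randomly initialized) training rollouts.
pbt = PopulationBasedTraining(
    time_attr="training_iteration",
    metric="evaluation/episode_reward_mean",
    mode="max",
    perturbation_interval=5,
    hyperparam_mutations={"lr": tune.loguniform(1e-5, 1e-3)},
)

param_space = {
    "env": "CartPole-v1",  # placeholder; swap in your own environment
    "lr": tune.loguniform(1e-5, 1e-3),
    # Run 10 evaluation episodes every training iteration.
    "evaluation_interval": 1,
    "evaluation_duration": 10,
    "evaluation_duration_unit": "episodes",
    # Fixed seed for the evaluation env copies, so evaluation rewards are comparable
    # across trials. Your custom env has to read and apply this seed from env_config;
    # built-in Gym envs like CartPole ignore it.
    "evaluation_config": {"env_config": {"seed": 42}},
}

tuner = tune.Tuner(
    "PPO",
    param_space=param_space,
    tune_config=tune.TuneConfig(scheduler=pbt, num_samples=4),
)
results = tuner.fit()
```

For offline training, the train/eval split itself could instead come from Ray Data, e.g. ray.data.read_csv("...") followed by Dataset.train_test_split(test_size=0.2), with the held-out portion used by the evaluation workers; the tuner setup above stays the same either way.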