How can I generate exactly the same results in RLlib?

I want to reproduce exactly the same results across my experiments by using the same seed for both the environment and the learning algorithm. I am using the PPO algorithm. I have tested my environment outside of the training loop, and I'm sure that it generates the same observations and actions every time (given a certain seed), so I am confident that the environment seeding is implemented correctly. However, in the experiments, some training runs (with exactly the same config and seed) produce matching results and others do not. How can I make sure everything is set up to produce exactly the same results?

Hi @saeid93 ,

you can use the seed hyperparameter in the Trainer configuration:

# This argument, in conjunction with worker_index, sets the random seed of
# each worker, so that identically configured trials will have identical
# results. This makes experiments reproducible.
"seed": None,

Set the seed to any positive integer and you should get reproducible results. I would also try removing any seeding done inside the environment itself.
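To check whether two runs really are identical, it can help to compare their per-iteration metrics and find the first point where they differ. The snippet below is only a rough sketch: `fake_training_run` is a hypothetical stand-in that uses Python's `random` module and is not RLlib code, `first_divergence` is a made-up helper name, and the `num_workers` value is just illustrative. Only the `"seed"` key comes from the config shown above.

```python
import random

# Config fragment: "seed" is the setting discussed above.
# The other value is illustrative and not taken from the original post.
config = {
    "seed": 42,        # any fixed positive integer
    "num_workers": 2,  # assumption, shown only for context
}


def fake_training_run(config, iterations=5):
    """Stand-in for a training loop (NOT RLlib): one metric per iteration."""
    rng = random.Random(config["seed"])
    return [rng.random() for _ in range(iterations)]


def first_divergence(run_a, run_b):
    """Return the index of the first differing metric, or None if identical."""
    for i, (a, b) in enumerate(zip(run_a, run_b)):
        if a != b:
            return i
    if len(run_a) != len(run_b):
        # One run is a strict prefix of the other.
        return min(len(run_a), len(run_b))
    return None


run_1 = fake_training_run(config)
run_2 = fake_training_run(config)
print(first_divergence(run_1, run_2))  # None means the two runs match exactly
```

In a real experiment you would replace the stand-in with the metrics logged by your own runs (for example the reward reported at each training iteration). Knowing the iteration at which two runs first diverge can narrow down where the nondeterminism comes from.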

Hope this helps