Based on your description, I think this makes sense.
When you have more than one worker, they all collect and report data samples in parallel. This introduces non-determinism in the ordering of samples in the training batch across separate runs.
When you only have 1 worker, it becomes deterministic. When the mini-batch size is the same as the training batch size, you are using all of the data for each gradient update, so across runs you are always updating with the same data. Even though the ordering of samples may differ, the gradient update is deterministic since that ordering does not matter in PPO.
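For concreteness, here is a minimal sketch of those two deterministic setups, assuming the older dict-style RLlib PPO config (the key names and batch sizes are illustrative and may differ across RLlib versions):

```python
# Sketch only: older dict-style RLlib config, placeholder values.
config = {
    "seed": 0,            # pin the other sources of randomness as well

    # Option A: a single rollout worker, so samples always arrive in the same order.
    "num_workers": 1,

    # Option B: keep multiple workers, but make each SGD minibatch span the whole
    # train batch; every update then uses all of the collected data, so sample
    # ordering no longer matters.
    # "num_workers": 4,
    # "train_batch_size": 4000,
    # "sgd_minibatch_size": 4000,
}
```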
Another option, which I don't think is currently implemented, would be to sort the training batch samples by worker index before training (rough sketch below).
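Something along these lines, assuming `worker_set` is the RLlib `WorkerSet` from the surrounding code, could work; this is a sketch rather than tested code, and depending on your RLlib version `SampleBatch.concat_samples` may instead be the module-level `concat_samples` helper:

```python
import ray
from ray.rllib.policy.sample_batch import SampleBatch

# Launch sampling on all workers in parallel and remember which worker owns each ref.
workers = worker_set.remote_workers()
ref_to_index = {worker.sample.remote(): i for i, worker in enumerate(workers)}

# Collect batches in whatever order the workers happen to finish, keeping the index.
tagged, pending = [], list(ref_to_index)
while pending:
    done, pending = ray.wait(pending, num_returns=1)
    tagged.append((ref_to_index[done[0]], ray.get(done[0])))

# Sort by worker index before concatenating so the train batch order is reproducible.
tagged.sort(key=lambda pair: pair[0])
train_batch = SampleBatch.concat_samples([batch for _, batch in tagged])
```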
Or retrieve samples from each worker in order. This would slow down sample throughput quite significantly if you have a lot of workers.
You would make that change here:
to something like:
```python
# Collect from each worker in turn so the batches always come back in worker order.
sample_batches = [ray.get(worker.sample.remote()) for worker in worker_set.remote_workers()]
```
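Because `ray.get` sits inside the comprehension, each worker's `sample.remote()` call is only issued after the previous worker's batch has been retrieved, so the workers effectively sample one after another rather than in parallel; that is where the throughput hit mentioned above comes from.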