Reproducibility of training Results on PPO algorithm

Mohini · September 22, 2021, 12:30pm

Hey everyone, I am trying to use the PPO algorithm (available in ray[rllib]).
With num_workers = 4, I get reproducible results on my local CPU machine. However, on a GPU machine with num_workers = 20, the results are not reproducible.
Can someone help with this?

mannyv · September 22, 2021, 4:35pm

Hi @Mohini,

Are you using tf or torch? If torch, your issue may be related to this bug I just filed this morning: [Bug] [RLLIB] Race condition in stats_fn when using multi-gpu · Issue #18812 · ray-project/ray · GitHub

Mohini · September 23, 2021, 6:27am

Hey @mannyv,

Thanks much for the redirection. I am using tf in my current setup. Also, I am not using multiple GPUs (num_gpus = 0); only num_workers is set.
num_workers = 4 (local CPU machine): gives reproducible results.
num_workers = 4 (GPU machine): doesn't give reproducible results.

mannyv · September 23, 2021, 12:43pm

@Mohini OK well at least we can rule that out. Do you have a reproduction script available?
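
For anyone following along, a reproduction script for this kind of report can stay very small. The sketch below is only an illustration: the config keys (framework, num_workers, num_gpus, seed) are the ones mentioned in this thread, while the environment name, the seed value, and the `first_divergence` helper are assumptions made up for the example.

```python
# Config keys are the ones named in this thread; the values marked
# "assumption" are placeholders, not Mohini's actual settings.
config = {
    "env": "CartPole-v0",  # assumption: any small environment will do
    "framework": "tf",
    "num_workers": 4,
    "num_gpus": 0,
    "seed": 42,  # assumption: any fixed int (not None)
}


def first_divergence(run_a, run_b):
    """Return the index of the first iteration whose metric differs, or None.

    run_a and run_b are per-iteration metric values collected from two
    training runs started with the same config (hypothetical helper).
    """
    for i, (a, b) in enumerate(zip(run_a, run_b)):
        if a != b:
            return i
    if len(run_a) != len(run_b):
        # One run is a strict prefix of the other.
        return min(len(run_a), len(run_b))
    return None
```

Collecting a metric such as episode_reward_mean for a handful of iterations from two identically seeded runs and passing both lists to `first_divergence` shows whether the runs differ and at which iteration they first do, which is usually enough detail for a bug report.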

sven1977 · September 24, 2021, 10:13am

Hey @Mohini and @mannyv, very interesting topic.
Actually, we were looking into the same issue, which we think might be related to this code in rllib/utils/debug.py::update_global_seed_if_necessary(), which is used when you set the seed config key to an int value (not None):

    # Torch.
    if framework == "torch":
        torch, _ = try_import_torch()
        torch.manual_seed(seed)
        # See https://github.com/pytorch/pytorch/issues/47672.
        cuda_version = torch.version.cuda
        if cuda_version is not None and float(torch.version.cuda) >= 10.2:
            os.environ["CUBLAS_WORKSPACE_CONFIG"] = "4096:8"
        else:
            from distutils.version import LooseVersion

            if LooseVersion(torch.__version__) >= LooseVersion("1.8.0"):
                # Not all Operations support this.
                torch.use_deterministic_algorithms(True)
            else:
                torch.set_deterministic(True)
        # This is only for Convolution no problem.
        torch.backends.cudnn.deterministic = True

So on a GPU machine with CUDA >= 10.2, we only set CUBLAS_WORKSPACE_CONFIG and never call torch.use_deterministic_algorithms(True), because that call sits in the else branch. Not sure whether this is correct.
