Reproducibility of training results with the PPO algorithm

Hey everyone… I am trying to use the PPO algorithm (available in ray[rllib]).
With num_workers = 4, I get reproducible results on my local CPU machine. However, on a GPU machine with num_workers = 20, the results are not reproducible.
Can someone help with this?


Hi @Mohini,

Are you using tf or torch? If torch, your issue may be related to this bug I just filed this morning: [Bug] [RLLIB] Race condition in stats_fn when using multi-gpu · Issue #18812 · ray-project/ray · GitHub


Hey @mannyv,

Thanks a lot for the pointer. I am using tf in my current setup. Also, I am not using multiple GPUs (num_gpus = 0); only num_workers is utilized.
num_workers = 4 (local CPU machine, gives reproducible results).
num_workers = 4 (GPU machine, does not give reproducible results).
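For reference, a minimal config sketch that exercises the relevant keys (`seed`, `num_workers`, `num_gpus` are real RLlib config keys; the environment and the other values here are illustrative assumptions, not the actual setup from this thread):

```python
# Illustrative RLlib PPO config sketch; the env and hyperparameter
# values are assumptions, not the actual setup from this thread.
config = {
    "env": "CartPole-v1",   # assumed environment
    "framework": "tf",      # tf, as in the setup above
    "seed": 42,             # an explicit int seed is needed for reproducibility
    "num_workers": 4,       # rollout workers
    "num_gpus": 0,          # no GPUs for the learner
}

# Without an explicit integer seed, runs cannot be reproducible.
assert isinstance(config["seed"], int)
print(sorted(config))
```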

@Mohini OK well at least we can rule that out. Do you have a reproduction script available?

Hey @Mohini and @mannyv , very interesting topic :slight_smile:
We were actually looking into the same issue, which we think might be related to this code in rllib/utils/ (used when you set the seed config key to some int value rather than None).

    # Torch.
    if framework == "torch":
        torch, _ = try_import_torch()
        # See
        cuda_version = torch.version.cuda
        if cuda_version is not None and float(torch.version.cuda) >= 10.2:
            os.environ["CUBLAS_WORKSPACE_CONFIG"] = "4096:8"
        else:
            from distutils.version import LooseVersion

            if LooseVersion(torch.__version__) >= LooseVersion("1.8.0"):
                # Not all Operations support this.
                torch.use_deterministic_algorithms(True)
            else:
                torch.set_deterministic(True)
        # This is only for Convolution no problem.
        torch.backends.cudnn.deterministic = True

So in the case of a GPU with CUDA >= 10.2, we never call torch.use_deterministic_algorithms(True); only the CUBLAS workspace env var and the cuDNN flag are set. Not sure whether this is correct.
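To make the branching concrete, here is a small pure-Python stand-in of that snippet's control flow (no torch needed; `_major_minor` is a simplified replacement for the `LooseVersion` comparison, and the returned strings are just labels for which knobs would be touched):

```python
import os

def _major_minor(version):
    # Simplified stand-in for the LooseVersion comparison in the snippet.
    return tuple(int(p) for p in version.split(".")[:2])

def plan_determinism(cuda_version, torch_version):
    """Mirror the branching of the RLlib snippet and return which
    determinism knobs would be touched (a sketch, not the real code)."""
    actions = []
    if cuda_version is not None and float(cuda_version) >= 10.2:
        # GPU path for CUDA >= 10.2: only the cuBLAS workspace env var.
        os.environ["CUBLAS_WORKSPACE_CONFIG"] = "4096:8"
        actions.append("CUBLAS_WORKSPACE_CONFIG")
    elif _major_minor(torch_version) >= (1, 8):
        actions.append("torch.use_deterministic_algorithms(True)")
    else:
        actions.append("torch.set_deterministic(True)")
    # The cuDNN flag is set unconditionally at the end.
    actions.append("torch.backends.cudnn.deterministic = True")
    return actions

# On a GPU machine with CUDA 11.1, use_deterministic_algorithms is never reached:
print(plan_determinism("11.1", "1.9.0"))
# → ['CUBLAS_WORKSPACE_CONFIG', 'torch.backends.cudnn.deterministic = True']
```

This makes the question above visible: the `use_deterministic_algorithms` branch is only reachable when `cuda_version` is None or below 10.2, so on a typical modern GPU setup it is skipped entirely.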