Reproducibility of ray.tune with seeds

Hello everyone,

I am running PPO on a custom MultiAgentEnv and have problems reproducing training outcomes.

When running with the same config and seed, I get the same rewards in the first training_iteration. In the following iterations the rewards start to differ. This is independent of how many episodes the training_iteration includes.

I tried to make it reproducible with other config parameters. With num_workers=1, the rewards match at every training_iteration. They also match if sgd_minibatch_size equals train_batch_size.

I also tried seeding the action_space, as suggested in another discussion, but it did not help.
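For reference, the action-space seeding I tried looked roughly like this (a minimal sketch on a stand-in gym env rather than my custom MultiAgentEnv):

# Illustrative only -- seeding the spaces like this did NOT make training reproducible.
import gym

env = gym.make("CartPole-v1")      # stand-in for my custom MultiAgentEnv
env.action_space.seed(42)          # gym spaces expose a .seed() method
env.observation_space.seed(42)
env.seed(42)                       # older gym API; newer gym uses env.reset(seed=42)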

Has anyone had a similar situation and found an answer to this problem?

Based on your description I think this makes sense.

When you have more than one worker, they all collect and report data samples at the same time in parallel. This introduces non-determinism in the ordering of samples in the training batch across separate runs.

When you only have 1 worker, it becomes deterministic. When the mini-batch size is the same as the training batch size, you are using all the data for each gradient update, so across runs you are always updating with the same data; even though the ordering of samples may differ, the gradient update is deterministic since that ordering does not matter in PPO.
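In config terms, the two workarounds you found look roughly like this (just a sketch; exact key names depend on your RLlib version):

# Sketch of a PPO config where either option should make training deterministic across runs.
deterministic_ppo_config = {
    "env": "my_multi_agent_env",   # placeholder for your registered env
    "seed": 42,                    # RLlib's built-in seeding
    # Option A: a single rollout worker, so sample ordering is fixed.
    "num_workers": 1,
    # Option B: one minibatch covering the whole train batch, so ordering
    # within the batch no longer affects the gradient update.
    "train_batch_size": 4000,
    "sgd_minibatch_size": 4000,
}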

Another option, which I don’t think is currently implemented, would be to sort the training batch samples by worker index before training.

Or retrieve samples from each worker in order. This would slow down sample throughput quite significantly if you have a lot of workers.

You would make that change here:

to something like:

sample_batches = [ray.get(worker.sample.remote()) for worker in worker_set.remote_workers()]
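A slightly fuller, untested sketch of that idea, including the concatenation step (assuming SampleBatch.concat_samples as the helper):

# Untested sketch: gather from each rollout worker in index order, then
# concatenate, so the train batch ordering is reproducible across runs.
from ray.rllib.policy.sample_batch import SampleBatch  # assumed import path

sample_batches = [
    ray.get(worker.sample.remote())
    for worker in worker_set.remote_workers()
]
train_batch = SampleBatch.concat_samples(sample_batches)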

Ok that makes sense… Thank you for the detailed answer!

Documentation about reproducibility:
https://docs.ray.io/en/latest/tune/faq.html#how-can-i-reproduce-experiments

@philmax @mannyv I have solved the reproducibility problem by writing this function:

def set_reproducibillity(seed=None):
    if seed is None:
        seed = 42
    tf.random.set_seed(seed)
    tf.keras.utils.set_random_seed(seed)
    tf.config.experimental.enable_op_determinism()  # tested with tensorflow==2.9.1
    np.random.seed(seed)
    random.seed(seed)

This function must be called in two places:

  1. the main body
  2. the trainable function

The full example code is below:
import random
import numpy as np
import ray
import tensorflow as tf
from ray import tune
from ray.tune.integration.keras import TuneReportCallback
from ray.tune.schedulers import ASHAScheduler
from tensorflow.keras.datasets import mnist


def set_reproducibillity(seed=None):
    if seed is None:
        seed = 42
    tf.random.set_seed(seed)
    tf.keras.utils.set_random_seed(seed)
    tf.config.experimental.enable_op_determinism()
    np.random.seed(seed)
    random.seed(seed)


def train_mnist(config):

    if config["reproducibility_active"]:
        set_reproducibillity()
    batch_size = config["batch"]
    num_classes = 10
    epochs = 200

    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    # define model
    inputs = tf.keras.layers.Input(shape=(28, 28))
    x = tf.keras.layers.Flatten()(inputs)
    x = tf.keras.layers.LayerNormalization()(x)
    for i in range(config["layers"]):
        x = tf.keras.layers.Dense(units=config["hidden"], activation=config["activation"])(x)
        x = tf.keras.layers.Dropout(config["dropout"])(x)
    outputs = tf.keras.layers.Dense(units=num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs=inputs, outputs=outputs, name="mnist_model")
    model.compile(
        loss="sparse_categorical_crossentropy",
        optimizer=tf.keras.optimizers.Adam(learning_rate=config["learning_rate"]),
        metrics=["accuracy"])

    model.fit(
        x_train,
        y_train,
        batch_size=batch_size,
        epochs=epochs,
        verbose=0,
        validation_data=(x_test, y_test),
        callbacks=[TuneReportCallback({
            "mean_accuracy": "val_accuracy"  ##optional values ['loss', 'accuracy', 'val_loss', 'val_accuracy']
        })])


if __name__ == "__main__":

    print('Is CUDA available for container:', tf.config.list_physical_devices('GPU'))
    ray.init()
    config = {
        "reproducibility_active": True,
        "learning_rate": tune.choice([1e-5, 1e-4, 1e-3, 1e-2]),
        "hidden": tune.choice([16, 32, 64, 128]),
        "dropout": tune.choice([0.01, 0.02, 0.05, 0.1, 0.2]),  # tune.uniform(0.01, 0.2)
        "activation": tune.choice(["relu", "elu"]),
        "layers": tune.choice([1, 2, 3]),
        "batch": tune.choice([4, 8, 16, 32, 64, 128]),
    }

    if config["reproducibility_active"]:
        set_reproducibillity()

    sched_asha = ASHAScheduler(time_attr="training_iteration",
                               max_t=100,
                               grace_period=10,
                               # mode='max',  # do not set here if it is also set in tune.run
                               reduction_factor=3,
                               # brackets=1
                               )

    analysis = tune.run(
        train_mnist,
        name="exp",
        scheduler=sched_asha,
        # Checkpoint settings
        keep_checkpoints_num=3,
        checkpoint_freq=3,
        checkpoint_at_end=True,
        # Optimization
        metric="mean_accuracy",
        mode="max",
        stop={  # a trial is stopped as soon as any of these criteria is reached
            "mean_accuracy": 0.96,
            "training_iteration": 10,
            'time_this_iter_s': 50,
            # 'timesteps_total': 1000,
            # 'episodes_total': 1000,
            # 'time_total_s': 1000,
        },
        time_budget_s=200,  # Global time budget in seconds after which all trials are stopped.
        num_samples=10,  # number of sampled configurations from the hyperparameter space
        reuse_actors=True,

        local_dir='../ray_results',  # default value is ~/ray_results
        resources_per_trial={
            "cpu": 1,
            "gpu": 0
        },
        config=config,
        verbose=3,  # values 0 to 3
    )
    print("Best hyperparameters found were: ", analysis.best_config)


Thanks for pointing this out. I tested it and it works (just setting the seed in the config, as recommended by Ray, did not work).

I will try to build a simple example for my custom RLlib use case to see if it helps there as well.
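Roughly what I have in mind (an untested sketch; the env class and the seed offset are just placeholders):

# Untested sketch: seed each rollout worker's env deterministically from its
# worker index (env_config is RLlib's EnvContext, which exposes .worker_index),
# in addition to setting "seed" in the algorithm config.
import random
import numpy as np
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class MyReproducibleEnv(MultiAgentEnv):      # placeholder for my custom env
    def __init__(self, env_config):
        super().__init__()
        seed = 42 + env_config.worker_index  # fixed but distinct seed per worker
        random.seed(seed)
        np.random.seed(seed)
        # ... rest of the usual env setup ...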

@philmax I posted my example because many of the original Ray examples are not clear to me. It’s nice that it works for your example too.
Now I am trying to build an example that combines ray.tune + a TF model + ASHA + Optuna + MLflow + lakefs; I hope I will manage.