Reproducible training - setting seeds for all workers / environments

Hi!
How can I best set the seed of my environments while training an RL agent?
I found the following answers on Stack Overflow: Obtaining different set of configs across multiple calls in ray tune - Stack Overflow and python 3.x - How do I make ray.tune.run reproducible? - Stack Overflow.
From these I understood that…

Function-level API cannot be made reproducible (ray v1.1.0, may be subject to change).

That’s why it seems I need to use the class API for tune.Trainable instead of the function API. But I am not sure how to transform my current training function into a tune.Trainable subclass:

from ray import tune
from ray.rllib.agents.ddpg import TD3Trainer


def my_train_fn(config, reporter):
    agent = TD3Trainer(config=config, env="guidance-v0")

    for i in range(NUM_EPISODES):
        result = agent.train()
        if i % 1 == 0:
            checkpoint = agent.save(checkpoint_dir="./data/checkpoints")
            # print(pretty_print(result))
            print("checkpoint saved at", checkpoint)
    agent.stop()


tune.run(my_train_fn, resources_per_trial=resources, config=config)
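For context, here is my rough guess at what the Trainable subclass version could look like (a sketch based on the setup()/step()/save_checkpoint() methods from the Tune docs; I’m not sure this is the correct or reproducible way to do it):

from ray import tune
from ray.rllib.agents.ddpg import TD3Trainer


class MyTrainable(tune.Trainable):
    def setup(self, config):
        # Build the TD3 trainer once per trial.
        self.agent = TD3Trainer(config=config, env="guidance-v0")

    def step(self):
        # One step() call == one training iteration.
        return self.agent.train()

    def save_checkpoint(self, checkpoint_dir):
        return self.agent.save(checkpoint_dir)

    def load_checkpoint(self, checkpoint_path):
        self.agent.restore(checkpoint_path)

    def cleanup(self):
        self.agent.stop()


tune.run(MyTrainable,
         stop={"training_iteration": NUM_EPISODES},
         checkpoint_freq=1,
         resources_per_trial=resources,
         config=config)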

Any ideas?

Thanks

You could do this via the env maker, like so:

import gym

class MyEnvClass(gym.Env):
    def __init__(self, config):
        self.property1 = config.get("property1")  # <- use some keys from the config
        worker_idx = config.worker_index  # <- but you can also use the worker index
        num_workers = config.num_workers  # <- or the total number of workers
        vector_env_index = config.vector_index  # <- or the vector env index
        ...
        # to set the seed
        self.seed(worker_idx + num_workers + vector_env_index)

from ray.tune import register_env

register_env("my_seeded_env", lambda config: MyEnvClass(config))

Thanks for raising this @Lauritowal . I’ll add this to the custom_env example script so we have an example to point to in the future! :slight_smile:


Hi!
I tried it with my custom env, however the config EnvContext is always empty ({ }) in my environment…

ray.init()

register_env("myenv", lambda config: MyEnv(
    phase=0,
    rllib_config=config
))

default_config = td3.TD3_DEFAULT_CONFIG.copy()
custom_config = {
    "env": "myenv",
    "lr": 0.0001,
    "num_gpus": 0,
    "framework": "torch",
    "callbacks": CustomCallbacks,
    "log_level": "WARN",
    "evaluation_interval": 20,
    "evaluation_num_episodes": 10,
    "num_workers": 3,
    "num_envs_per_worker": 3,
    "seed": 3
}
config = {**default_config, **custom_config}

resources = TD3Trainer.default_resource_request(config).to_json()

# start training
now = datetime.datetime.now().strftime("date_%d-%m-%Y_time_%H-%M-%S")
tune.run(my_train_fn,
         name=f"test_{now}",
         resources_per_trial=resources,
         config=config)

And in my environment I do:

rllib_seed = seed + rllib_config.worker_index + rllib_config.num_workers + rllib_config.vector_index

As I said above, rllib_config is always an empty dict.

Any idea why?

Thank you
Walter

It’s empty because the “env_config” key in your custom_config is not set.
Can you try doing this?

custom_config = {
    "env": "myenv",
    "env_config": {...},  # <- some config for your env
    ...

RLlib will take the “env_config” dict and create an EnvContext object from it, which is also just a dict, plus the properties: worker_index, num_workers, remote, and vector_index.
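Roughly like this (untested sketch; the “phase” and “seed” keys are just placeholders for whatever your env needs):

import gym
from ray.tune import register_env

class MyEnv(gym.Env):
    def __init__(self, config):
        # Values from the "env_config" dict show up as normal dict keys ...
        self.phase = config.get("phase", 0)
        # ... while worker_index / num_workers / vector_index are properties
        # RLlib fills in on the EnvContext itself.
        self.seed(config.get("seed", 0)
                  + config.worker_index
                  + config.num_workers
                  + config.vector_index)
        ...

register_env("myenv", lambda config: MyEnv(config))

custom_config = {
    "env": "myenv",
    "env_config": {"phase": 0, "seed": 3},  # <- becomes the EnvContext above
    "num_workers": 3,
    "num_envs_per_worker": 3,
}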
