- High: It blocks me from completing my task.
Hi. I want to train agents in multiple environments simultaneously. For example, I want to train 8 PPO agents in 8 different environments (same architecture, same hyperparameters).
So I tried to implement a distributed training function by combining Ray remote tasks with the RLlib trainer. But when I ran this code, the agents were not trained in parallel. Is there some lock inside the RLlib trainer? How can I train multiple trainers simultaneously?
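To make the intent clearer, here is a simplified sketch of the pattern I expect to work, with one Ray task per trainer (train_one is just a placeholder and omits the RLlib config):

import ray

ray.init()  # start a local Ray instance (or connect to an existing cluster)

@ray.remote
def train_one(env_name):
    # Placeholder: build an RLlib trainer for this env and call train() in a loop.
    return env_name

# One task per environment; I expect these tasks to run in parallel.
refs = [train_one.remote(name) for name in ["CartPole-v0", "MountainCar-v0"]]
print(ray.get(refs))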
And here is my full code:
import gym
import ray
from ray.tune.registry import register_env
from ray.rllib.agents.ppo import PPOTrainer, PPOConfig
from ray.tune.logger import pretty_print
env_names = ['CartPole-v0', 'MountainCar-v0', 'Taxi-v3', 'SpaceInvaders-v0',
             'LunarLander-v2', 'Humanoid-v2', 'FrozenLake-v0', 'HandManipulateBlock-v0']
def env_creator(env_config):
    # Build and seed a single gym environment from the per-trainer env_config.
    env_name = env_config["env"]
    seed = env_config["seed"]
    env = gym.make(env_name)
    env.seed(seed)
    return env
for env_name in env_names:
    register_env(env_name, env_creator)
@ray.remote(num_cpus=4, num_gpus=1)
def distributed_trainer(env_name):
    # Each remote task builds and trains its own PPO trainer on one environment.
    config = (
        PPOConfig()
        .training(
            gamma=0.99,
            lr=0.0005,
            train_batch_size=1000,
            model={
                "fcnet_hiddens": [128, 128],
                "fcnet_activation": "tanh",
            },
            use_gae=True,
            lambda_=0.95,
            vf_loss_coeff=0.2,
            entropy_coeff=0.001,
            num_sgd_iter=5,
            sgd_minibatch_size=32,
            shuffle_sequences=True,
        )
        .resources(
            num_gpus=1,
            num_cpus_per_worker=2,
        )
        .framework(framework='torch')
        .environment(
            env=env_name,
            render_env=True,
            env_config={"env": env_name, "seed": 1},
        )
        .rollouts(
            num_rollout_workers=2,
            num_envs_per_worker=2,
            create_env_on_local_worker=False,
            rollout_fragment_length=250,
            horizon=500,
            soft_horizon=False,
            no_done_at_end=False,
        )
        .evaluation(
            evaluation_interval=10,
            evaluation_duration=100,
            evaluation_duration_unit='episodes',  # must be "episodes" or "timesteps"
            evaluation_num_workers=3,
            evaluation_parallel_to_training=True,
            # evaluation_config=,
            # custom_evaluation_function=,
        )
    )
    print(env_name)
    trainer = PPOTrainer(env=env_name, config=config)
    for epoch in range(500):
        result = trainer.train()
        # print(pretty_print(result))
        print(f"env: {env_name}, epoch: {epoch}")
        if epoch % 10 == 0:
            checkpoint = trainer.save()
            print("checkpoint saved at", checkpoint)
    return 0
# Launch one remote training task per environment and wait for all of them to finish.
distributed_trainer_refs = [distributed_trainer.remote(env_name) for env_name in env_names]
results = ray.get(distributed_trainer_refs)