[RLlib] varying the number of agents in multi-agent environments

We are interested in varying the number of agents in multi-agent environments over time. Is there a way to do this without stopping the trainer and the policy client?

For example, we register the environment with the trainer like this:
register_env("multi_agent_cartpole", lambda _: MultiAgentCartPole({"num_agents": 4}))

Here, when starting the trainer, I specified 4 agents. Let's say that, after a few training steps, I want to create a new agent and delete an existing one. How can we do this dynamically without stopping the trainer and the workers?

Thanks
Sai

Basically you want to do curriculum learning with Ray. I guess the simplest way is to change the environment after some number of training iterations, e.g. from an on_train_result callback:

import ray
from ray import tune

def on_train_result(info):
    # Called after each training iteration; pick a task (curriculum
    # stage) based on the mean episode reward so far.
    result = info["result"]
    if result["episode_reward_mean"] > 200:
        task = 2
    elif result["episode_reward_mean"] > 100:
        task = 1
    else:
        task = 0
    # Push the new task to every env copy on every rollout worker.
    trainer = info["trainer"]
    trainer.workers.foreach_worker(
        lambda ev: ev.foreach_env(
            lambda env: env.set_task(task)))

ray.init()
tune.run(
    "PPO",
    config={
        "env": YourEnv,  # your env class; must implement set_task()
        "callbacks": {
            "on_train_result": on_train_result,
        },
    },
)

Otherwise you could use an environment wrapper to update the task; see the sketch below. But this depends on the effort you want to put into your project.
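A rough sketch of that idea, assuming a Gym-style benchmark env; the TaskWrapper name and the start_task argument are made up, and how set_task actually reconfigures the env is left open:

import gym
from ray.tune.registry import register_env

class TaskWrapper(gym.Wrapper):
    # Grafts set_task/get_task onto an env that doesn't provide them.
    def __init__(self, env, start_task=0):
        super().__init__(env)
        self.task = start_task

    def set_task(self, task):
        # A real wrapper would re-parameterize the wrapped env here.
        self.task = task

    def get_task(self):
        return self.task

register_env("wrapped_cartpole",
             lambda _: TaskWrapper(gym.make("CartPole-v0")))

The foreach_env callback from above would then hit the wrapper's set_task instead of the raw env's.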

Hey @Sertingolix, there is also a simple curriculum API, explained in this example script, that allows you to change your env in-flight (set it to a new “task”):

ray.rllib.examples.curriculum_learning.py

Your env will have to implement set_task and get_task, and you need to specify an env_task_fn in your config (it takes the train results and the env object, so you can check whether you should set the env's task to a new value).
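A minimal sketch of how the pieces fit together, assuming the TaskSettableEnv interface used by that example script; the env class, its dummy dynamics, and the reward threshold are all made up:

import gym
from ray import tune
from ray.rllib.env.apis.task_settable_env import TaskSettableEnv, TaskType

class MyTaskEnv(TaskSettableEnv):
    # Hypothetical single-agent env whose difficulty is an int "task".
    def __init__(self, config):
        self.task = config.get("start_task", 0)
        self.observation_space = gym.spaces.Discrete(2)
        self.action_space = gym.spaces.Discrete(2)

    def reset(self):
        return 0

    def step(self, action):
        # Dummy dynamics; a real env would depend on self.task.
        return 0, 1.0, True, {}

    def get_task(self) -> TaskType:
        return self.task

    def set_task(self, task: TaskType) -> None:
        self.task = task  # reconfigure the env for the new task here

def env_task_fn(train_results, task_settable_env, env_ctx) -> TaskType:
    # Advance to the next task once mean reward passes a threshold.
    cur = task_settable_env.get_task()
    return cur + 1 if train_results["episode_reward_mean"] > 200 else cur

tune.run(
    "PPO",
    config={
        "env": MyTaskEnv,
        "env_task_fn": env_task_fn,
    },
)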

That's right. Personally, I use the curriculum API too. For custom envs this works perfectly. For benchmark envs, I think one should wrap them, for comparison reasons.