For a simple curriculum setup, you can take a look at this example script here that shows how to use RLlib’s TaskSettableEnv API (it’s compatible with gym Env, so your env can subclass it) and an env_task_fn that picks the next “task” (curriculum stage).
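To make the pattern concrete, here is a minimal sketch of that setup. The class below only mimics the TaskSettableEnv interface (set_task/get_task/sample_tasks) so it runs standalone; in real code you would subclass ray.rllib.env.apis.task_settable_env.TaskSettableEnv, and the task-advancement rule (bump difficulty once mean reward exceeds 200) is just an illustrative assumption:

```python
import random


class CurriculumEnvSketch:
    """Stand-in env exposing the TaskSettableEnv task methods
    (a real env would also implement reset/step from gym.Env)."""

    def __init__(self):
        self.cur_task = 1  # task 1 = easiest difficulty level (assumption)

    def sample_tasks(self, n_tasks):
        # Return `n_tasks` random task identifiers (difficulty levels 1-5).
        return [random.randint(1, 5) for _ in range(n_tasks)]

    def get_task(self):
        return self.cur_task

    def set_task(self, task):
        # RLlib calls this when env_task_fn returns a new task.
        self.cur_task = task


def curriculum_fn(train_results, task_settable_env, env_ctx):
    """Shaped like the `env_task_fn` you pass in the config: maps the
    latest training results to the task the env should be set to."""
    current = task_settable_env.get_task()
    # Hypothetical rule: move one level up once mean reward > 200.
    if train_results["episode_reward_mean"] > 200.0 and current < 5:
        return current + 1
    return current


env = CurriculumEnvSketch()
env.set_task(curriculum_fn({"episode_reward_mean": 250.0}, env, None))
print(env.get_task())  # difficulty bumped from 1 to 2
```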
For a more complex setup like the one you suggested, where one policy picks the task and the other learns along the curriculum path, you could do:
Define two policies via the “multiagent” config: a) the main policy being trained, and b) the policy that picks the task.
b) would be the policy you “query” inside a custom callback (e.g. on_train_result(trainer, result) ← via the trainer object, you can get to the task-picking policy by doing trainer.get_policy([ID of task picking policy defined in "multiagent" config])).
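The two steps above can be sketched as follows. The policy IDs ("main", "task_picker"), the reward-to-task rule, and the Fake* stubs are illustrative assumptions so the snippet runs standalone; in real RLlib code the hook is the `on_train_result` method of a DefaultCallbacks subclass, and `get_policy()` would return the actual trained policy:

```python
# "multiagent" config fragment defining the two policies
# (observation/action spaces and per-policy configs elided with None/{}):
config = {
    "multiagent": {
        "policies": {
            "main": (None, None, None, {}),         # learns along the curriculum
            "task_picker": (None, None, None, {}),  # picks the next task
        },
        "policy_mapping_fn": lambda agent_id: (
            "task_picker" if agent_id == "picker" else "main"
        ),
    },
}


class FakeTaskPickerPolicy:
    """Stand-in for the real task-picking policy object."""

    def compute_single_action(self, obs):
        # Hypothetical rule: map the mean-reward "observation" to a
        # difficulty level in [1, 5].
        return min(5, int(obs // 100) + 1)


class FakeTrainer:
    """Stand-in exposing the same get_policy() lookup a real Trainer has."""

    def get_policy(self, policy_id):
        assert policy_id in config["multiagent"]["policies"]
        return FakeTaskPickerPolicy()


def on_train_result(trainer, result):
    # Query the task-picking policy with the latest train results
    # to decide the next curriculum task.
    policy = trainer.get_policy("task_picker")
    return policy.compute_single_action(result["episode_reward_mean"])


print(on_train_result(FakeTrainer(), {"episode_reward_mean": 230.0}))  # → 3
```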
For a hint on how to set up multiagent, see here:
For a hint on how to define your own on_train_result function, see here: