How to implement curriculum learning as in Narvekar and Stone (2018)

Hi all, I’ve been reading up on curriculum learning and I want to implement the algorithm described in Narvekar and Stone’s 2018 paper “Learning Curriculum Policies for Reinforcement Learning”. The basic idea is that the environment changes over time, e.g. every N episodes. The choice of environment is made by a separate RL agent, the curriculum agent. This curriculum agent’s observation is the current policy of the learning agent, and its actions determine which environment the learning agent is trained on.

To implement this, two separate RL training processes need to run simultaneously. There are many ways to set this up, but I’m not sure what the best approach would be in RLlib. Does anyone here know of a good approach I could take?


Hey @RickDW, great question 🙂

For a simple curriculum setup, you can take a look at RLlib’s curriculum_learning.py example script, which shows how to use RLlib’s TaskSettableEnv API (your gym Env can subclass this class) and an env_task_fn that picks the new “task” (curriculum).
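In case a concrete sketch helps, the shape of it is roughly as follows. The env internals and the reward threshold are made up; the TaskSettableEnv methods and the env_task_fn signature follow the example script (import paths per RLlib ~1.x):

```python
import gym
import numpy as np
from ray.rllib.env.apis.task_settable_env import TaskSettableEnv


class MyCurriculumEnv(TaskSettableEnv):
    """Toy env whose difficulty is controlled by an integer `task`."""

    def __init__(self, config):
        self.cur_task = config.get("start_task", 0)
        self.observation_space = gym.spaces.Box(-1.0, 1.0, (4,))
        self.action_space = gym.spaces.Discrete(2)

    # --- TaskSettableEnv API ---
    def sample_tasks(self, n_tasks):
        return list(np.random.randint(0, 5, size=(n_tasks,)))

    def get_task(self):
        return self.cur_task

    def set_task(self, task):
        # Reconfigure the env for the new difficulty level here.
        self.cur_task = task

    # --- regular gym.Env API (stubbed out for brevity) ---
    def reset(self):
        return self.observation_space.sample()

    def step(self, action):
        return self.observation_space.sample(), 0.0, True, {}


def curriculum_fn(train_results, task_settable_env, env_ctx):
    # Hand-written schedule: bump the task once the mean reward passes
    # a (made-up) threshold. Called by RLlib after each train() step.
    cur = task_settable_env.get_task()
    if train_results["episode_reward_mean"] > 100.0:
        cur += 1
    return cur


config = {
    "env": MyCurriculumEnv,
    "env_task_fn": curriculum_fn,
}
```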

For a more complex setup like the one you suggested, where one policy picks the task and the other learns along the curriculum path, you could do the following:

  • Define two policies via the “multiagent” config: a) the main (learning) policy, and b) the policy that picks the task (see the config sketch below).
  • b) would be the policy you “query” inside a custom callback (e.g. on_train_result(trainer, result) ← via the trainer object, you can get to the task-picking policy by doing trainer.get_policy([ID of task-picking policy defined in "multiagent" config])); see the callback sketch further down.

For a hint on how to set up multiagent, see here:
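A minimal sketch of that part. Policy IDs, agent IDs, and spaces are placeholders; the tuple format is (policy_cls_or_None, obs_space, act_space, config), where None means “use the trainer’s default policy class”:

```python
from gym.spaces import Box, Discrete

config = {
    "multiagent": {
        "policies": {
            # (policy_cls, obs_space, act_space, config);
            # None = use the trainer's default policy class.
            "learner": (None, Box(-1.0, 1.0, (4,)), Discrete(2), {}),
            "curriculum": (None, Box(-1.0, 1.0, (16,)), Discrete(5), {}),
        },
        # Route each env agent ID to one of the two policies.
        "policy_mapping_fn": lambda agent_id: (
            "curriculum" if agent_id == "curriculum_agent" else "learner"
        ),
    },
}
```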

For a hint on how to define your own on_train_result function, see here:
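And a sketch of the callback, reusing the policy IDs from the config above. How to turn the learner’s policy into the curriculum agent’s observation is your design decision, so some_featurization_of is a hypothetical helper; the DefaultCallbacks signature is per RLlib ~1.x:

```python
import numpy as np
from ray.rllib.agents.callbacks import DefaultCallbacks


class CurriculumCallbacks(DefaultCallbacks):
    """After each training iteration, query the task-picking policy
    and push the chosen task into every env copy."""

    def on_train_result(self, *, trainer, result, **kwargs):
        curric = trainer.get_policy("curriculum")
        learner = trainer.get_policy("learner")

        # Build the curriculum agent's observation from the learner's
        # current policy (some_featurization_of is hypothetical).
        obs = some_featurization_of(learner.get_weights())

        # compute_actions() takes a batch; use a batch of one.
        actions, _, _ = curric.compute_actions(np.array([obs]))
        task = int(actions[0])

        # Broadcast the chosen task to all env copies on all workers.
        trainer.workers.foreach_worker(
            lambda w: w.foreach_env(lambda env: env.set_task(task))
        )


# Then in the trainer config:
# config["callbacks"] = CurriculumCallbacks
```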


Hey Sven, thank you so much for your help. I’m pretty sure this is all I need to implement it.


@sven1977 I have a quick follow-up question about resetting a policy. To implement the curriculum learning algorithm with the multiagent API, I would sometimes have to reset the policy of the learning agent (not the curriculum agent), so that the curriculum agent can start a new episode. I’ve been digging through the documentation and RLlib’s source code, but I haven’t been able to find a reset-policy method. As far as I can tell, the solution is to either create a new policy object or give the existing policy object a new model. Given RLlib’s size and complexity, I’m worried I’ve missed some subtle detail, so I’m wondering whether there’s anything else that needs to happen when I reset a policy this way.
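Concretely, the kind of reset I had in mind is a snapshot-and-restore of the weights (the "learner" policy ID is from my multiagent config):

```python
# Right after building the trainer, snapshot the initial weights:
initial_weights = trainer.get_policy("learner").get_weights()

# Later, whenever the curriculum agent starts a new episode, restore
# them on the local worker and all remote workers:
trainer.workers.foreach_worker(
    lambda w: w.get_policy("learner").set_weights(initial_weights)
)
```

As far as I can tell this only resets the network weights; I’m not sure whether optimizer state (e.g. Adam moments) or exploration schedules would also need resetting, which is exactly the kind of subtle detail I’m worried about.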