How to implement curriculum learning as in Narvekar and Stone (2018)

Hey @RickDW , great question :slight_smile:

For a simple curriculum setup, take a look at this example script here. It shows how to use RLlib’s TaskSettableEnv API (you can use a gym Env with this class) together with an env_task_fn that picks the new “task” (the next step in the curriculum).
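
To make the simple setup a bit more concrete, here is a minimal sketch of the two pieces. It is plain Python so it runs without Ray: CurriculumEnv is a hypothetical stand-in that only mimics the get_task/set_task methods of TaskSettableEnv, and the threshold and task range are made-up values. The (train_results, task_settable_env, env_ctx) signature of the task function is what I'd expect RLlib to call, but please check it against your RLlib version.

```python
# Assumed, illustrative values (not from RLlib).
REWARD_THRESHOLD = 200.0
MAX_TASK = 5


class CurriculumEnv:
    """Hypothetical stand-in for an env following the TaskSettableEnv interface.

    In real RLlib code this class would subclass TaskSettableEnv (and
    implement reset/step like a gym Env); it is plain Python here so the
    sketch runs on its own.
    """

    def __init__(self, config=None):
        config = config or {}
        self.task = config.get("start_task", 1)

    def get_task(self):
        # Return the task (difficulty level) the env currently uses.
        return self.task

    def set_task(self, task):
        # Switch the env to a new task.
        self.task = task


def curriculum_fn(train_results, task_settable_env, env_ctx):
    """Pick the next task: move up one level once the mean reward is high enough."""
    current = task_settable_env.get_task()
    reward = train_results.get("episode_reward_mean", 0.0)
    if reward >= REWARD_THRESHOLD and current < MAX_TASK:
        return current + 1
    return current
```

In the trainer config you would then pass the env class as "env" and curriculum_fn as "env_task_fn".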

For a more complex setup like you suggested, where one policy picks the task, and the other learns along the curriculum path, you could do:

  • Define two policies via the “multiagent” config to train a) the main policy, and b) the policy that picks the task.
  • b) would be the policy you “query” inside a custom callback, e.g. on_train_result(trainer, result). Via the trainer object, you can get to the task-picking policy by calling trainer.get_policy([ID of task-picking policy defined in the "multiagent" config]).
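
A rough sketch of what such a callback could look like is below. Treat it as an illustration under assumptions, not the definitive way to do it: TASK_POLICY_ID and build_task_observation are hypothetical names, the class would subclass RLlib's DefaultCallbacks in real code (it is a plain class here so the snippet runs without Ray), and I'm assuming the policy's compute_single_action returns an (action, state_outs, info) tuple and that the envs expose set_task as in TaskSettableEnv. How the task-picking policy itself gets its training data is left out.

```python
# Hypothetical ID; it must match the key used for the task-picking policy
# under "policies" in the "multiagent" config.
TASK_POLICY_ID = "task_picker"


def build_task_observation(result):
    """Hypothetical helper: turn the training results into an observation
    for the task-picking policy (here just the mean episode reward)."""
    return [result.get("episode_reward_mean", 0.0)]


class TaskPickerCallbacks:
    """In real code: subclass RLlib's DefaultCallbacks and pass this class
    via the "callbacks" config key."""

    def on_train_result(self, *, trainer, result, **kwargs):
        # Get the task-picking policy from the trainer by its ID.
        task_policy = trainer.get_policy(TASK_POLICY_ID)

        # Query it for the next task (assumes a discrete action = task index).
        obs = build_task_observation(result)
        action, _, _ = task_policy.compute_single_action(obs)
        task = int(action)

        # Push the chosen task to every env on every rollout worker.
        trainer.workers.foreach_worker(
            lambda worker: worker.foreach_env(lambda env: env.set_task(task))
        )

        # Record the chosen task in the results for logging.
        result["current_task"] = task
```

The main policy keeps training as usual; the callback only changes which task the envs present after each training iteration.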

For a hint on how to set up multiagent, see here:

For a hint on how to define your own on_train_result function, see here:
