For a simple curriculum setup, you can take a look at this example script here that shows how to use RLlib’s TaskSettableEnv API (it’s compatible with gym Env, so your env can subclass it) and an env_task_fn that picks the next “task” (curriculum stage).
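To make the pattern concrete, here is a minimal sketch of that setup. The class below only mimics the TaskSettableEnv interface (set_task/get_task/sample_tasks) so it runs standalone; in real code you would subclass ray.rllib.env.apis.task_settable_env.TaskSettableEnv, and the task-advancement rule (bump difficulty once mean reward exceeds 200) is just an illustrative assumption:

```python
import random


class CurriculumEnvSketch:
    """Stand-in env exposing the TaskSettableEnv task methods
    (a real env would also implement reset/step from gym.Env)."""

    def __init__(self):
        self.cur_task = 1  # task 1 = easiest difficulty level (assumption)

    def sample_tasks(self, n_tasks):
        # Return `n_tasks` random task identifiers (difficulty levels 1-5).
        return [random.randint(1, 5) for _ in range(n_tasks)]

    def get_task(self):
        return self.cur_task

    def set_task(self, task):
        # RLlib calls this when env_task_fn returns a new task.
        self.cur_task = task


def curriculum_fn(train_results, task_settable_env, env_ctx):
    """Shaped like the `env_task_fn` you pass in the config: maps the
    latest training results to the task the env should be set to."""
    current = task_settable_env.get_task()
    # Hypothetical rule: move one level up once mean reward > 200.
    if train_results["episode_reward_mean"] > 200.0 and current < 5:
        return current + 1
    return current


env = CurriculumEnvSketch()
env.set_task(curriculum_fn({"episode_reward_mean": 250.0}, env, None))
print(env.get_task())  # difficulty bumped from 1 to 2
```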
For a more complex setup like the one you suggested, where one policy picks the task and the other learns along the curriculum path, you could do:
Define two policies via the “multiagent” config: a) the main policy being trained, and b) the policy that picks the task.
b) would be the policy you “query” inside a custom callback (e.g. on_train_result(trainer, result) ← via the trainer object, you can get to the task-picking policy by doing trainer.get_policy([ID of task picking policy defined in "multiagent" config])).
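The two steps above can be sketched as follows. The policy IDs ("main", "task_picker"), the reward-to-task rule, and the Fake* stubs are illustrative assumptions so the snippet runs standalone; in real RLlib code the hook is the `on_train_result` method of a DefaultCallbacks subclass, and `get_policy()` would return the actual trained policy:

```python
# "multiagent" config fragment defining the two policies
# (observation/action spaces and per-policy configs elided with None/{}):
config = {
    "multiagent": {
        "policies": {
            "main": (None, None, None, {}),         # learns along the curriculum
            "task_picker": (None, None, None, {}),  # picks the next task
        },
        "policy_mapping_fn": lambda agent_id: (
            "task_picker" if agent_id == "picker" else "main"
        ),
    },
}


class FakeTaskPickerPolicy:
    """Stand-in for the real task-picking policy object."""

    def compute_single_action(self, obs):
        # Hypothetical rule: map the mean-reward "observation" to a
        # difficulty level in [1, 5].
        return min(5, int(obs // 100) + 1)


class FakeTrainer:
    """Stand-in exposing the same get_policy() lookup a real Trainer has."""

    def get_policy(self, policy_id):
        assert policy_id in config["multiagent"]["policies"]
        return FakeTaskPickerPolicy()


def on_train_result(trainer, result):
    # Query the task-picking policy with the latest train results
    # to decide the next curriculum task.
    policy = trainer.get_policy("task_picker")
    return policy.compute_single_action(result["episode_reward_mean"])


print(on_train_result(FakeTrainer(), {"episode_reward_mean": 230.0}))  # → 3
```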
For a hint on how to set up multiagent, see here:
For a hint on how to define your own on_train_result function, see here: