Custom curriculum. The evaluation always stays stuck at the first task.

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

Hello everyone,

I’m using a custom curriculum env as in

My curriculum_fn is something like this:

def curriculum_fn(
    train_results: dict, task_settable_env: "TaskSettableEnv", env_ctx: "EnvContext"
) -> "TaskType":
    # If all episodes in the evaluation ended before max_episode_len,
    # we move on to the next task.
    new_task = task_settable_env.get_task()
    max_level = task_settable_env.max_level
    try:
        if all(
            ep_len < env_ctx["max_episode_len"]
            for ep_len in train_results["evaluation"]["hist_stats"]["episode_lengths"]
        ):
            if (new_task + 1) < max_level:
                new_task += 1
            print(
                f"Worker #{env_ctx.worker_index} vec-idx={env_ctx.vector_index}"
                f"\nR={train_results['episode_reward_mean']}"
                f"\nSetting env to task={new_task}"
            )
    except KeyError:
        # No "evaluation" key yet, i.e. this iteration had no evaluation phase.
        print("not in evaluation phase")

    return new_task
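
For context, I plug this curriculum_fn into the trainer config roughly like this (just a sketch: the env class and the numbers are placeholders, and the env_task_fn key is the hook used in RLlib's curriculum_learning.py example):

config = {
    "env": MyCurriculumEnv,  # my TaskSettableEnv subclass (sketched further below)
    "env_config": {"max_episode_len": 100, "start_level": 1},
    "env_task_fn": curriculum_fn,  # called with the latest train_results
    # Evaluate every iteration so that train_results["evaluation"] exists
    # when curriculum_fn runs.
    "evaluation_interval": 1,
    "evaluation_num_workers": 1,
}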

What I want to do is increase the task level only if, during the evaluation phase, my agent managed to win before the episode's max length.
During the training phase the environment works differently: it stops automatically once certain conditions are met (conditions which did not occur in the evaluation phase).
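
To make this concrete, here is a heavily simplified, hypothetical sketch of the kind of env I mean (the toy dynamics and names are made up, and the TaskSettableEnv import path may differ between Ray versions):

from gym.spaces import Discrete
from ray.rllib.env.apis.task_settable_env import TaskSettableEnv


class MyCurriculumEnv(TaskSettableEnv):
    """Toy chain env: higher task levels need more steps to 'win'."""

    max_level = 5

    def __init__(self, config):
        self.max_episode_len = config.get("max_episode_len", 100)
        self.cur_level = config.get("start_level", 1)
        self.action_space = Discrete(2)
        self.observation_space = Discrete(self.max_episode_len + 1)
        self.pos = 0
        self.steps = 0

    # TaskSettableEnv API used by curriculum_fn.
    def get_task(self):
        return self.cur_level

    def set_task(self, task):
        self.cur_level = task

    def sample_tasks(self, n_tasks):
        return [self.cur_level] * n_tasks

    # Standard gym API.
    def reset(self):
        self.pos = 0
        self.steps = 0
        return self.pos

    def step(self, action):
        self.steps += 1
        self.pos += int(action == 1)
        won = self.pos >= 10 * self.cur_level  # level-dependent "win" condition
        done = won or self.steps >= self.max_episode_len
        return self.pos, float(won), done, {}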

The problem is that the task level only changes for the training environments, while the evaluation environments always stay stuck at the first task. I have two questions in this regard:

  1. Are the workers’ environments re-initialized after each training iteration, or are they just reset? If it’s the latter, it’s strange that my evaluation_workers are always stuck at level 1.

  2. Is there a way to set the task level as a “global variable” that is changed by the evaluation_workers but can also be retrieved by the “training workers” when the right conditions are met?

Hi,

This use case is quite new to me (usually you want evaluation workers to have the exact same environment context as the sample-collecting rollout workers, right?).

  1. Environments are spawned a couple of times, depending on the algorithm you are running, but generally before any gradients are calculated.
    While training is ongoing, environments are simply reset.
  2. You can use the evaluation_config. Have a look at the custom_eval.py example!
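
Roughly along these lines (a rough sketch; exact keys and behavior can differ between RLlib versions, and my_custom_eval_fn is just a placeholder):

config = {
    # ... env / training settings as in the question ...
    "evaluation_interval": 1,
    # Overrides applied only on the evaluation workers, e.g. a different
    # env_config or exploration behavior.
    "evaluation_config": {
        "env_config": {"start_level": 1},
        "explore": False,
    },
    # For full control over the evaluation loop (see custom_eval.py); a custom
    # eval function could also call set_task() on the evaluation workers' envs.
    # "custom_eval_function": my_custom_eval_fn,
}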