Custom curriculum. The evaluation always stays stuck at the first task.

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty to complete my task, but I can work around it.

Hello everyone,

I’m using a custom curriculum env as in

My curriculum_fn is something like this:

def curriculum_fn(
    train_results: dict, task_settable_env: "TaskSettableEnv", env_ctx: "EnvContext"
) -> "TaskType":
    # If all episodes in the evaluation ended before max_episode_len,
    # we move on to the next task.
    new_task = task_settable_env.get_task()
    max_level = task_settable_env.max_level
    try:
        if all(
            ep_len < env_ctx["max_episode_len"]
            for ep_len in train_results["evaluation"]["hist_stats"]["episode_lengths"]
        ):
            if (new_task + 1) < max_level:
                new_task += 1
            print(
                f"Worker #{env_ctx.worker_index} vec-idx={env_ctx.vector_index}"
                f"\nR={train_results['episode_reward_mean']}"
                f"\nSetting env to task={new_task}"
            )
    except KeyError:
        # No "evaluation" key yet, i.e. this iteration had no evaluation phase.
        print("not in evaluation phase")

    return new_task
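
For context, I plug this curriculum_fn into the trainer config roughly like this (just a sketch: the env class and the numbers are placeholders, and the env_task_fn key is the hook used in RLlib's curriculum_learning.py example):

config = {
    "env": MyCurriculumEnv,  # my TaskSettableEnv subclass (sketched further below)
    "env_config": {"max_episode_len": 100, "start_level": 1},
    "env_task_fn": curriculum_fn,  # called with the latest train_results
    # Evaluate every iteration so that train_results["evaluation"] exists
    # when curriculum_fn runs.
    "evaluation_interval": 1,
    "evaluation_num_workers": 1,
}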

What I want to do is increase the task level only if, during the evaluation phase, my agent managed to win before the episode's max length.
During the training phase the environment works differently: it stops automatically once certain conditions are met (conditions which did not occur in the evaluation phase).
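
To make this concrete, here is a heavily simplified, hypothetical sketch of the kind of env I mean (the toy dynamics and names are made up, and the TaskSettableEnv import path may differ between Ray versions):

from gym.spaces import Discrete
from ray.rllib.env.apis.task_settable_env import TaskSettableEnv


class MyCurriculumEnv(TaskSettableEnv):
    """Toy chain env: higher task levels need more steps to 'win'."""

    max_level = 5

    def __init__(self, config):
        self.max_episode_len = config.get("max_episode_len", 100)
        self.cur_level = config.get("start_level", 1)
        self.action_space = Discrete(2)
        self.observation_space = Discrete(self.max_episode_len + 1)
        self.pos = 0
        self.steps = 0

    # TaskSettableEnv API used by curriculum_fn.
    def get_task(self):
        return self.cur_level

    def set_task(self, task):
        self.cur_level = task

    def sample_tasks(self, n_tasks):
        return [self.cur_level] * n_tasks

    # Standard gym API.
    def reset(self):
        self.pos = 0
        self.steps = 0
        return self.pos

    def step(self, action):
        self.steps += 1
        self.pos += int(action == 1)
        won = self.pos >= 10 * self.cur_level  # level-dependent "win" condition
        done = won or self.steps >= self.max_episode_len
        return self.pos, float(won), done, {}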

The problem is that the task level only changes for the training environments, while the evaluation environments always stay stuck at the first task. I have two questions in this regard:

  1. Are the workers’ environments re-initialized after each training iteration, or are they just reset? If it’s the latter, it’s strange that my evaluation_workers are always stuck at level 1.

  2. Is there a way to set the task level as a “global variable” that is changed by the evaluation_workers but can also be retrieved by the “training workers” when the right conditions are met?

Hi,

This use case is quite new to me (usually you want evaluation workers to have the exact same environment context as the sample-collecting rollout workers, right?).

  1. Environments are spawned a couple of times, depending on the algorithm you are running, but generally before any gradients are calculated.
    While training is ongoing, environments are simply reset.
  2. You can use the evaluation_config. Have a look at the custom_eval.py example!
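
Roughly along these lines (a rough sketch; exact keys and behavior can differ between RLlib versions, and my_custom_eval_fn is just a placeholder):

config = {
    # ... env / training settings as in the question ...
    "evaluation_interval": 1,
    # Overrides applied only on the evaluation workers, e.g. a different
    # env_config or exploration behavior.
    "evaluation_config": {
        "env_config": {"start_level": 1},
        "explore": False,
    },
    # For full control over the evaluation loop (see custom_eval.py); a custom
    # eval function could also call set_task() on the evaluation workers' envs.
    # "custom_eval_function": my_custom_eval_fn,
}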