Custom_eval without workers

How severe does this issue affect your experience of using Ray?

  • Low: It annoys or frustrates me for a moment.

Hello everyone,
I have made a custom_eval function following the example in

but I’m having some difficulty adapting the code to my specific problem.

In practice, the only thing I want to do in my custom evaluation is to make two copies of the same environment (with deepcopy, since the reset() function has random elements) and compare my trained model against another, static policy. Furthermore, I would like to save a video of my environment. At the moment my implementation looks like this:

from envs import MyEnv
import gym
from ray.rllib.utils import try_import_torch
from copy import deepcopy
torch, nn = try_import_torch()


def CurriculumCustomEval(trainer, eval_workers):
    """custom evaluation function. In this function we execute 2 policies on the
    same copy of the environment to compare their results.
    Args:
        trainer (Trainer): trainer class to evaluate.
        eval_workers (WorkerSet): evaluation workers.
    Returns:
        metrics (dict): evaluation metrics dict.
    """
    
    metrics = {}
    # reset the env and make a clone of it
    video_dir = trainer.config["evaluation_config"]["record_env"]
    env = MyEnv()
    if video_dir:
        env = gym.wrappers.Monitor(
            env=env,
            directory=video_dir,
            video_callable=lambda x: True,
            force=True)
    obs1 = env.reset()
    obs2 = deepcopy(obs1)
    cloned_env = deepcopy(env)
    model = trainer.get_policy("default").model
    no_move_policy = {}
    for flight_id in cloned_env.flights.keys():
        # don't accelerate and don't change angle
        no_move_policy[flight_id] = 4
    done = {"__all__": False}
    counter = 0
    num_collisions = 0
    while not done["__all__"]:
        # perform step with dummy action
        rew, obs, done, info = cloned_env.step(no_move_policy)
        num_collisions += len(cloned_env.conflicts)
        counter += 1
    done = {"__all__": False}
    num_collisions2 = 0
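    # initial recurrent state of the trained model, one entry per flight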
    h = {flight_id: model.get_initial_state()
         for flight_id in env.flights.keys()}
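    # sequence length of 1: the RNN is fed one timestep at a time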
    seq_len = torch.tensor([1.])
    actions = {flight_id: None for flight_id in env.flights.keys()}

    with torch.no_grad():
        while not done["__all__"]:
            # add both the batch and the time dim to the observation returned by the env
            for flight_id in env.flights.keys():
                for elem in obs2[flight_id].keys():
                    obs2[flight_id][elem] = torch.from_numpy(
                        obs2[flight_id][elem]).float().unsqueeze(0).unsqueeze(0)
                for elem in range(len(h[flight_id])):
                    if len(h[flight_id][elem].shape) < 2:
                        h[flight_id][elem] = h[flight_id][elem].unsqueeze(0)
            for flight_id in env.flights.keys():
                actions[flight_id], h[flight_id] = model.forward_rnn(
                    obs2[flight_id], h[flight_id], seq_len)
                actions[flight_id] = torch.argmax(actions[flight_id])
            rew, obs2, done, info = env.step(actions)
            print(actions)
            num_collisions2 += len(env.conflicts)
    env._close_video_recorder()
    # From here on I just compute some metrics

    return metrics

As you can see, I don’t really need workers for this task, but I actually get an error:

“ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.”

I’m using tune.run(), so I have only a single workflow. Is there a way to make this work?
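
For context, this is roughly how I register the custom evaluation function with tune.run() (the algorithm name, stopping criterion, and the exact config values below are placeholders rather than my real setup):

from ray import tune
from envs import MyEnv

config = {
    "env": MyEnv,
    "framework": "torch",
    # run the custom evaluation every training iteration
    "evaluation_interval": 1,
    # I don't want dedicated evaluation workers
    "evaluation_num_workers": 0,
    "custom_eval_function": CurriculumCustomEval,
    "evaluation_config": {
        # directory read by CurriculumCustomEval to wrap the env in a Monitor
        "record_env": "videos",
    },
}

tune.run("PPO", config=config, stop={"training_iteration": 100})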