How severely does this issue affect your experience of using Ray?
- Low: It annoys or frustrates me for a moment.
Hello everyone,
I have written a custom_eval function following the example in
but I'm having some difficulty adapting the code to my specific problem.
In practice, the only thing I want to do in my custom evaluation is make two copies of the same environment (using deepcopy, since the reset() function has random elements) and compare my trained model against a second, static policy. Furthermore, I would like to save a video of the environment. At the moment my implementation looks like this:
from envs import MyEnv
import gym
from ray.rllib.utils import try_import_torch
from copy import deepcopy
torch, nn = try_import_torch()
def CurriculumCustomEval(trainer, eval_workers):
    """Custom evaluation function.

    In this function we execute 2 policies on the same copy of the
    environment to compare their results.

    Args:
        trainer (Trainer): trainer class to evaluate.
        eval_workers (WorkerSet): evaluation workers.

    Returns:
        metrics (dict): evaluation metrics dict.
    """
    metrics = {}
    # reset the env and make a clone of it
    video_dir = trainer.config["evaluation_config"]["record_env"]
    env = MyEnv()
    if video_dir:
        env = gym.wrappers.Monitor(
            env=env,
            directory=video_dir,
            video_callable=lambda x: True,
            force=True)
    obs1 = env.reset()
    obs2 = deepcopy(obs1)
    cloned_env = deepcopy(env)
    model = trainer.get_policy("default").model

    # static baseline: every flight keeps its current speed and heading
    no_move_policy = {}
    for flight_id in cloned_env.flights.keys():
        # don't accelerate and don't change angle
        no_move_policy[flight_id] = 4

    # roll out the static policy on the cloned environment
    done = {"__all__": False}
    counter = 0
    num_collisions = 0
    while not done["__all__"]:
        # perform step with the dummy action (the env returns the observation first)
        obs, rew, done, info = cloned_env.step(no_move_policy)
        num_collisions += len(cloned_env.conflicts)
        counter += 1
    # roll out the trained policy on the original (video-recorded) environment
    done = {"__all__": False}
    num_collisions2 = 0
    h = {flight_id: model.get_initial_state()
         for flight_id in env.flights.keys()}
    seq_len = torch.tensor([1.])
    actions = {flight_id: None for flight_id in env.flights.keys()}
    with torch.no_grad():
        while not done["__all__"]:
            # add both the batch and the time dim to the observation returned by the env
            for flight_id in env.flights.keys():
                for elem in obs2[flight_id].keys():
                    obs2[flight_id][elem] = torch.from_numpy(
                        obs2[flight_id][elem]).float().unsqueeze(0).unsqueeze(0)
                # the recurrent state needs a batch dim as well
                for elem in range(len(h[flight_id])):
                    if len(h[flight_id][elem].shape) < 2:
                        h[flight_id][elem] = h[flight_id][elem].unsqueeze(0)
            for flight_id in env.flights.keys():
                actions[flight_id], h[flight_id] = model.forward_rnn(
                    obs2[flight_id], h[flight_id], seq_len)
                actions[flight_id] = torch.argmax(actions[flight_id])
            # the env returns the observation first
            obs2, rew, done, info = env.step(actions)
            print(actions)
            num_collisions2 += len(env.conflicts)
    if video_dir:
        env._close_video_recorder()
    # from here on I just compute some metrics
    return metrics
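The part I omitted just writes the quantities computed above into the metrics dict, roughly along these lines (the key names here are placeholders, not my exact ones):

    metrics["num_collisions_static"] = num_collisions
    metrics["num_collisions_trained"] = num_collisions2
    metrics["episode_length"] = counter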
As you can see, I don't really need the evaluation workers for this task, but I actually get an error:
“ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.”
I'm using tune.run(), so I have only a single workflow. Is there a way to make this work?
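For reference, this is roughly how I wire the evaluation function into tune.run(). This is only a minimal sketch: the algorithm name, stopping criterion, and most of the values below are placeholders rather than my exact config.

from ray import tune

tune.run(
    "PPO",  # placeholder: my actual algorithm/trainer
    stop={"training_iteration": 100},  # placeholder stopping criterion
    config={
        "env": MyEnv,
        "framework": "torch",
        "num_workers": 1,
        # evaluation settings
        "evaluation_interval": 1,
        "evaluation_num_workers": 1,
        "custom_eval_function": CurriculumCustomEval,
        "evaluation_config": {
            # this is the value read as video_dir inside CurriculumCustomEval
            "record_env": "videos",
        },
    },
)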