How severely does this issue affect your experience of using Ray?
- Low: It annoys or frustrates me for a moment.
Hello everyone,
I have written a custom_eval function following the example in
but I'm having some difficulty adapting the code to my specific problem.
In practice, the only thing I want to do in my custom evaluation is make two copies of the same environment (using deepcopy, since the reset() function has random elements) and compare my trained model against a second, static policy. Furthermore, I would like to save a video of the environment. At the moment my implementation looks like this:
from envs import MyEnv
import gym
from ray.rllib.utils import try_import_torch
from copy import deepcopy
torch, nn = try_import_torch()
def CurriculumCustomEval(trainer, eval_workers):
    """Custom evaluation function.

    In this function we execute 2 policies on the same copy of the
    environment to compare their results.

    Args:
        trainer (Trainer): trainer class to evaluate.
        eval_workers (WorkerSet): evaluation workers.

    Returns:
        metrics (dict): evaluation metrics dict.
    """
    metrics = {}
    # reset the env and make a clone of it
    video_dir = trainer.config["evaluation_config"]["record_env"]
    env = MyEnv()
    if video_dir:
        env = gym.wrappers.Monitor(
            env=env,
            directory=video_dir,
            video_callable=lambda x: True,
            force=True)
    obs1 = env.reset()
    obs2 = deepcopy(obs1)
    cloned_env = deepcopy(env)
    model = trainer.get_policy("default").model

    # static baseline: every flight keeps its current speed and heading
    no_move_policy = {}
    for flight_id in cloned_env.flights.keys():
        # don't accelerate and don't change angle
        no_move_policy[flight_id] = 4

    # roll out the static policy on the cloned environment
    done = {"__all__": False}
    counter = 0
    num_collisions = 0
    while not done["__all__"]:
        # perform step with the dummy action (the env returns the observation first)
        obs, rew, done, info = cloned_env.step(no_move_policy)
        num_collisions += len(cloned_env.conflicts)
        counter += 1
    # roll out the trained policy on the original (video-recorded) environment
    done = {"__all__": False}
    num_collisions2 = 0
    h = {flight_id: model.get_initial_state()
         for flight_id in env.flights.keys()}
    seq_len = torch.tensor([1.])
    actions = {flight_id: None for flight_id in env.flights.keys()}
    with torch.no_grad():
        while not done["__all__"]:
            # add both the batch and the time dim to the observation returned by the env
            for flight_id in env.flights.keys():
                for elem in obs2[flight_id].keys():
                    obs2[flight_id][elem] = torch.from_numpy(
                        obs2[flight_id][elem]).float().unsqueeze(0).unsqueeze(0)
                # the recurrent state needs a batch dim as well
                for elem in range(len(h[flight_id])):
                    if len(h[flight_id][elem].shape) < 2:
                        h[flight_id][elem] = h[flight_id][elem].unsqueeze(0)
            for flight_id in env.flights.keys():
                actions[flight_id], h[flight_id] = model.forward_rnn(
                    obs2[flight_id], h[flight_id], seq_len)
                actions[flight_id] = torch.argmax(actions[flight_id])
            # the env returns the observation first
            obs2, rew, done, info = env.step(actions)
            print(actions)
            num_collisions2 += len(env.conflicts)
    if video_dir:
        env._close_video_recorder()
    # from here on I just compute some metrics
    return metrics
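The part I omitted just writes the quantities computed above into the metrics dict, roughly along these lines (the key names here are placeholders, not my exact ones):

    metrics["num_collisions_static"] = num_collisions
    metrics["num_collisions_trained"] = num_collisions2
    metrics["episode_length"] = counter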
As you can see, I don't really need the evaluation workers for this task, but I actually get an error:
“ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.”
I'm using tune.run(), so I have only a single workflow. Is there a way to make this work?
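For reference, this is roughly how I wire the evaluation function into tune.run(). This is only a minimal sketch: the algorithm name, stopping criterion, and most of the values below are placeholders rather than my exact config.

from ray import tune

tune.run(
    "PPO",  # placeholder: my actual algorithm/trainer
    stop={"training_iteration": 100},  # placeholder stopping criterion
    config={
        "env": MyEnv,
        "framework": "torch",
        "num_workers": 1,
        # evaluation settings
        "evaluation_interval": 1,
        "evaluation_num_workers": 1,
        "custom_eval_function": CurriculumCustomEval,
        "evaluation_config": {
            # this is the value read as video_dir inside CurriculumCustomEval
            "record_env": "videos",
        },
    },
)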