Hey everyone,
I’m trying to save the output episodes of my policy evaluation to files (e.g. JSON/JSONL) so I can look at the evolution after training. Is there any way to do this? Due to storage constraints I only want the evaluation episodes, not the training episodes; otherwise I would guess AlgorithmConfig.offline_data would do the job.
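For reference, this is roughly the offline_data setup I have in mind; as far as I understand it would write all sampled batches (training included) as JSON files, which is exactly what I want to avoid. The algorithm and output path are just examples:

from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")  # example env
    # Writes every sampled batch (training and evaluation) to JSON files.
    .offline_data(output="/tmp/rllib-episodes")
)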
A few years ago it was possible to achieve this by nesting an output dict into the evaluation config dict, but I am unsure how to do it now, since the way the framework is configured has changed drastically.
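As far as I remember, the old dict-based approach looked roughly like this (keys and path are just illustrative); nesting "output" under "evaluation_config" made only the evaluation workers write their samples:

# Old-style config dict (pre-AlgorithmConfig), from memory
config = {
    "evaluation_interval": 1,
    "evaluation_config": {
        "output": "/tmp/eval-episodes",  # example path
    },
}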
Thanks a lot in advance.
Best regards,
André
For people who stumble on this in the future:
I was unable to find any setting in the offline_data configuration that logs only the evaluation episodes without also writing the training episodes, and there seems to be an inherent problem with the TensorFlow implementation and offline-data logging during evaluation.
I updated my RLlib/Ray install to ray==2.24.0 (good job on the new RLlib interfaces, btw) and implemented a custom evaluation function:
# Source: https://discuss.ray.io/t/saving-evaluation-episodes-to-files/14780
from ray.rllib.offline.io_context import IOContext
from ray.rllib.offline.json_writer import JsonWriter
from ray.rllib.utils.metrics import ENV_RUNNER_RESULTS, EVALUATION_RESULTS


def custom_eval_function(algorithm, eval_workers):
    # Each remote eval worker samples *one* episode.
    episodes = eval_workers.foreach_worker(
        func=lambda w: w.sample(), local_worker=False
    )
    metrics = eval_workers.foreach_worker(
        func=lambda w: w.get_metrics(), local_worker=False
    )

    # Merge the per-worker metrics and reduce them for logging.
    algorithm.metrics.merge_and_log_n_dicts(
        metrics, key=(EVALUATION_RESULTS, ENV_RUNNER_RESULTS)
    )
    eval_results = algorithm.metrics.reduce(
        key=(EVALUATION_RESULTS, ENV_RUNNER_RESULTS)
    )

    # Iterate over the per-worker samples, keeping the worker index.
    for wi, w in enumerate(episodes):
        ioctx = IOContext(
            log_dir=algorithm._logdir,
            config=None,
            worker_index=wi,
        )
        json_writer = JsonWriter(
            path=algorithm._logdir,
            ioctx=ioctx,
            compress_columns=["obs", "new_obs", "infos"],
        )
        # Use the current training iteration as the file index so each
        # evaluation round ends up in its own output file.
        json_writer.file_index = algorithm.training_iteration
        # Write all episodes this worker gathered (just one for now).
        for e in w:
            json_writer.write(e.get_sample_batch())

    return eval_results
This can be used in the configuration as follows:
# [...]
.evaluation(
    # [...]
    custom_evaluation_function=custom_eval_function,
    # [...]
)
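For completeness, here is a minimal end-to-end sketch of how I wire this up; the algorithm (PPO), the environment (CartPole-v1) and the iteration count are just placeholders for my actual setup, and I assume the same new-stack env-runner setup that the function above relies on:

from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")  # placeholder env
    .evaluation(
        evaluation_interval=1,  # run (and write) an evaluation every iteration
        custom_evaluation_function=custom_eval_function,
    )
)
algo = config.build()
for _ in range(10):
    algo.train()  # evaluation episodes end up as JSON files in algo._logdir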
This way, I can now also log the evaluation episodes for my TensorFlow models. Maybe this helps someone.
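To sanity-check the written files, RLlib's JsonReader can read them back; the glob pattern below is just my assumption about the default output file names:

from ray.rllib.offline.json_reader import JsonReader

# Each call to next() returns one SampleBatch read from the output files.
reader = JsonReader("/path/to/logdir/output-*.json")
batch = reader.next()
print(batch.count)  # number of timesteps in this batch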
For me, a single episode per worker is enough, but if you extend it, feel free to post it here so others can use it too!
Cheers,
André