Parallel workers compute action

vishalrangras · June 6, 2021, 2:30pm

I am using the action-computation logic described in the documentation (screenshot below) for a custom evaluation of a policy. Basically, I want to evaluate various policies, trained with different algorithms and hyperparameters, on my domain-specific metrics. I don’t want to use a custom callback or run evaluation during or at the end of training. I want to evaluate the policies later using the compute action approach shown below.

I want to run parallel rollouts of the environment on multiple cores while evaluating a single policy for around 50 episodes. I am passing env_config when creating the environment but still get the exception “AttributeError: ‘dict’ object has no attribute ‘worker_index’”. I believe the env_context metadata is not being created the way it is when the policy is trained on multiple cores.
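For context, this error is what one would expect when an environment constructor reads `config.worker_index` but receives a plain dict. The sketch below reproduces that with stand-ins only: `FakeEnvContext` and `MyEnv` are hypothetical names, not RLlib classes. RLlib ships its own `EnvContext` (in `ray.rllib.env.env_context`), a dict subclass that carries `worker_index`; its exact constructor arguments should be checked against the installed Ray version.

```python
class FakeEnvContext(dict):
    """Hypothetical stand-in for RLlib's EnvContext: a dict plus worker metadata."""

    def __init__(self, env_config, worker_index, vector_index=0):
        super().__init__(env_config)
        self.worker_index = worker_index
        self.vector_index = vector_index


class MyEnv:
    """Hypothetical environment whose constructor reads config.worker_index."""

    def __init__(self, config):
        # Works with a context object, fails with a plain dict.
        self.seed = config.get("base_seed", 0) + config.worker_index


def make_env(env_config, worker_index=0):
    # Wrap the plain dict so the attribute lookup succeeds.
    return MyEnv(FakeEnvContext(env_config, worker_index))


# Passing a plain dict reproduces the AttributeError from the question.
try:
    MyEnv({"base_seed": 7})
    plain_dict_error = None
except AttributeError as exc:
    plain_dict_error = str(exc)

# Wrapping the dict first avoids it.
env = make_env({"base_seed": 7}, worker_index=3)
```

When the environment is built by hand outside a trainer, wrapping `env_config` this way (or with RLlib's own `EnvContext`) is one possible workaround.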

How can I use multiple cores to evaluate/rollout the policy in parallel environments using the following code snippet?

michaelzhiluo · June 7, 2021, 7:59pm

You can do custom evaluation with the evaluation_config: {} hyperparameter in trainer.py: ray.rllib.agents.trainer — Ray 0.7.1 documentation.

In addition, there are other hyperparameters, such as evaluation_num_episodes and evaluation_num_workers, which I think are particularly relevant in your case.

vishalrangras · June 8, 2021, 4:23am

Thanks for your reply. However, I get a “This page does not exist yet” error when I click on the doc link. Should this hyperparameter work without Ray Tune, as I described in my question? Also, how do I pass the evaluation_num_workers param? In the env_config dictionary?

I thought evaluation_num_workers and evaluation_num_episodes are only used when we want to evaluate the policy after a batch or at the end of the training process. As I mentioned, my requirement is to evaluate the policy at a later point in time by restoring the checkpoint manually. Will these hyperparameters work in such a situation? Please share whatever insights you can.

sven1977 · June 11, 2021, 10:04am

Hey @vishalrangras, correct: evaluation_num_workers and evaluation_num_episodes are only relevant if you have evaluation_interval set to some int (not None!) and you would like evaluation to happen automatically every evaluation_interval train iterations.

However, you can still call trainer._evaluate() (trainer.evaluate()) manually if you’d like, even after restoring an agent from a checkpoint.

evaluation_num_workers goes into the main RLlib config, the one you use for everything else as well. See: ray.rllib.agents.trainer.py for all default config options.
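A minimal sketch of how these keys might sit in the main config, assuming the 2021-era key names used in this thread. The environment name, the worker counts, and the `"explore": False` override are placeholder choices, not values from the original posts.

```python
# Main RLlib config dict; the evaluation_* keys live at the top level,
# not inside env_config.
config = {
    "env": "CartPole-v0",           # placeholder environment (assumption)
    "env_config": {},               # your own environment settings go here
    "num_workers": 0,               # no training workers needed for pure evaluation
    "evaluation_interval": 1,       # an int (not None), as noted above
    "evaluation_num_episodes": 50,  # episodes per evaluation run
    "evaluation_num_workers": 4,    # parallel evaluation workers
    "evaluation_config": {"explore": False},  # overrides applied during evaluation
}

# Assumed usage after restoring a checkpoint (requires Ray; PPO is only an example):
# from ray.rllib.agents.ppo import PPOTrainer
# trainer = PPOTrainer(config=config)
# trainer.restore(checkpoint_path)   # checkpoint_path is hypothetical
# results = trainer.evaluate()       # or trainer._evaluate(), depending on the Ray version
```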

vishalrangras · June 12, 2021, 4:18pm

Thanks @sven1977 for your reply, this seems helpful. One more thing I need help with: my environment class is designed so that, for evaluation, I step through the environment one step at a time, then call some helper methods on the environment class to compute the evaluation metrics I need, add them to a data structure, and use them later to plot graphs. I am not using the custom_callback_metrics feature of Ray for evaluation purposes; I use it only for training-related TensorBoard logs.

I believe trainer._evaluate() would work like trainer._train() and would roll out my policy for an entire episode or a few episodes. I believe it won’t give me fine-grained control where I can call my custom methods at every step to compute my evaluation metrics.

Is there a way that I can step into the environment one step at a time using env.step() while still using multiple cores for parallel rollouts?
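One possible pattern, sketched here with the standard library only and not with any RLlib API: run each episode in its own worker process, step the environment manually inside that worker, and call the custom metric helpers after every step. `ToyEnv`, `toy_policy`, and `custom_metric` are hypothetical stand-ins; in a real setup each worker would presumably build the real environment, restore the trainer from the checkpoint once, and call its compute action method where `toy_policy` is used.

```python
import random
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


class ToyEnv:
    """Hypothetical stand-in for the custom environment (not an RLlib class)."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        obs = self.rng.random()
        reward = 1.0 if action == 1 else 0.0
        done = self.t >= 10  # fixed-length episodes for the sketch
        return obs, reward, done, {}

    def custom_metric(self):
        # Placeholder for the domain-specific helper methods.
        return self.t


def toy_policy(obs):
    # Stand-in for the restored policy's action computation.
    return 1 if obs < 0.5 else 0


def rollout_episode(seed):
    """Run one episode step by step, recording a custom metric after each step."""
    env = ToyEnv(seed)
    obs, done = env.reset(), False
    total_reward, per_step_metrics = 0.0, []
    while not done:
        action = toy_policy(obs)
        obs, reward, done, _ = env.step(action)
        total_reward += reward
        per_step_metrics.append(env.custom_metric())
    return {"seed": seed, "reward": total_reward, "metrics": per_step_metrics}


def evaluate_parallel(num_episodes, num_workers, executor_cls=ProcessPoolExecutor):
    """Distribute whole episodes over workers; results come back in seed order."""
    with executor_cls(max_workers=num_workers) as pool:
        return list(pool.map(rollout_episode, range(num_episodes)))
```

Calling `evaluate_parallel(50, 4)` from under an `if __name__ == "__main__":` guard would spread 50 episodes over four processes. Passing `executor_cls=ThreadPoolExecutor` keeps everything in one process, which is handy for debugging but will not use multiple cores for CPU-bound stepping.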
