Parallel workers compute action

I am using the compute-action logic described in the documentation (screenshot below) for a custom evaluation of policies. Basically, I want to evaluate several policies, trained with different algorithms and hyperparameters, against my domain-specific metrics. I don’t want to use a custom callback and evaluate during or at the end of training; I want to evaluate the policies later using the compute-action approach shown below.

I want to roll out the environment in parallel on multiple cores while evaluating a single policy for around 50 episodes. I am passing env_config when creating the environment, but I still get the exception “AttributeError: ‘dict’ object has no attribute ‘worker_index’”. I believe the EnvContext metadata is not being created the way it is when training the policy on multiple cores.
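For context, that AttributeError is consistent with the environment receiving a plain dict: during training, RLlib wraps env_config in an EnvContext (a dict subclass that also carries worker_index, vector_index, etc., living at ray.rllib.env.env_context.EnvContext in Ray versions of this era), whereas an env constructed by hand gets only the raw dict. A toy stand-in, purely for illustration of the difference:

```python
class EnvContextSketch(dict):
    """Toy stand-in for RLlib's EnvContext: a dict that also
    exposes worker/vector metadata as attributes."""

    def __init__(self, env_config, worker_index=0, vector_index=0):
        super().__init__(env_config)
        self.worker_index = worker_index
        self.vector_index = vector_index


plain = {"grid_size": 10}                       # hypothetical env settings
wrapped = EnvContextSketch(plain, worker_index=2)

# A plain dict has no worker_index attribute -> the AttributeError above.
assert not hasattr(plain, "worker_index")
# The wrapped config still behaves like a dict AND carries the metadata.
assert wrapped["grid_size"] == 10 and wrapped.worker_index == 2
```

When instantiating the env manually, wrapping env_config like this (or using RLlib's real EnvContext) would give the env the worker_index attribute it expects.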

How can I use multiple cores to evaluate/roll out the policy in parallel environments with the following code snippet?
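The snippet itself is not reproduced here (it was a screenshot), but the documented compute-action evaluation loop has roughly the following shape. In this sketch a toy environment and policy stand in for the real Gym env and restored trainer; in real code the action would come from trainer.compute_action(obs):

```python
class ToyEnv:
    """Stand-in for a Gym-style environment (reset/step API)."""
    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0                      # initial observation

    def step(self, action):
        self.t += 1
        obs = float(self.t)
        reward = 1.0 if action == 1 else 0.0
        done = self.t >= 5              # short episodes for the sketch
        return obs, reward, done, {}


def toy_policy(obs):
    """Stand-in for trainer.compute_action(obs)."""
    return 1 if obs < 3 else 0


env = ToyEnv()
episode_rewards = []
for _ in range(3):                      # the post targets ~50 episodes
    obs, done, total = env.reset(), False, 0.0
    while not done:
        action = toy_policy(obs)        # real code: trainer.compute_action(obs)
        obs, reward, done, _ = env.step(action)
        total += reward
    episode_rewards.append(total)

print(episode_rewards)                  # -> [3.0, 3.0, 3.0]
```

As written, this loop runs in a single process, which is exactly the limitation the question is about.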

You can do custom evaluation with the evaluation_config: {} hyperparameter; see ray.rllib.agents.trainer — Ray 0.7.1 documentation.

In addition, there are other hyperparameters, such as evaluation_num_episodes and evaluation_num_workers, which I think are particularly relevant in your case.


Thanks for your reply. However, I get a “This page does not exist yet” error when I click on the doc link. Should this hyperparameter work without using Ray Tune, as I described in my question? Also, how do I pass the evaluation_num_workers param? In the env_config dictionary?

I thought evaluation_num_workers and evaluation_num_episodes are only used when we want to evaluate the policy after a batch or at the end of the training process. As I mentioned, my requirement is to evaluate the policy at a later point in time by restoring a checkpoint manually. Will these hyperparameters work in that situation? Please share whatever insights you can.

Hey @vishalrangras, correct: evaluation_num_workers and evaluation_num_episodes are only relevant if you have evaluation_interval set to some int (not None!) and you would like evaluation to happen automatically every evaluation_interval training iterations.

However, you can still call trainer._evaluate() (trainer.evaluate()) manually if you’d like, even after restoring an agent from a checkpoint.

evaluation_num_workers goes into the main RLlib config, the same one you use for everything else. See the default trainer config for all available options.
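A sketch of where those keys sit, assuming the config key names used by RLlib in this era (evaluation_interval, evaluation_num_episodes, evaluation_num_workers, evaluation_config); the trainer class, env name, and env_config contents are hypothetical:

```python
# The evaluation_* keys live in the main trainer config,
# alongside all the other training settings.
config = {
    "env_config": {"grid_size": 10},          # hypothetical env settings
    "num_workers": 4,                         # training rollout workers
    "evaluation_interval": 1,                 # int, not None, to enable auto-eval
    "evaluation_num_episodes": 50,
    "evaluation_num_workers": 4,              # parallel eval rollouts
    "evaluation_config": {"explore": False},  # overrides applied only during eval
}

# Restore-then-evaluate flow discussed above (requires RLlib; shown as comments):
# trainer = PPOTrainer(config=config, env="my_env")
# trainer.restore(checkpoint_path)
# results = trainer.evaluate()   # trainer._evaluate() in older versions
```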

Thanks @sven1977, this looks helpful. One more thing I need help with: my environment class is designed so that, for evaluation, I step through the environment one step at a time, call some helper methods on the environment class to compute the evaluation metrics I need, add them to a data structure, and use them later to plot graphs. I am not using the custom callback metrics feature of Ray for evaluation, only for training-related TensorBoard logs.

I believe trainer._evaluate() would work like trainer._train() and roll out my policy for an entire episode or a few episodes. I don’t think it would give me the fine-grained control to call my custom methods at every step to compute my evaluation metrics.

Is there a way to step through the environment one step at a time using env.step() while still using multiple cores for parallel rollouts?
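The thread ends without an answer, but one generic pattern, independent of RLlib, is to run one env-plus-policy loop per worker and have each worker record its own per-step metrics. The sketch below uses a thread pool and toy stand-ins for the env and policy purely for illustration; for true multi-core rollouts you would swap in ProcessPoolExecutor or Ray remote actors, and the policy call would be a restored trainer's compute_action:

```python
from concurrent.futures import ThreadPoolExecutor


def run_episode(worker_index):
    """One rollout with a per-step metric hook (toy env + policy)."""
    t, obs, done = 0, 0.0, False
    step_metrics = []                 # filled at every step, as the post wants
    while not done:
        action = 1 if obs < 3 else 0  # stand-in for trainer.compute_action(obs)
        t += 1
        obs, reward = float(t), (1.0 if action == 1 else 0.0)
        done = t >= 5                 # short episodes for the sketch
        # Per-step hook: call your env's helper methods here.
        step_metrics.append((worker_index, t, reward))
    return step_metrics


# Each worker steps its own env independently; threads are used here to keep
# the sketch self-contained, processes or Ray actors for real multi-core use.
with ThreadPoolExecutor(max_workers=4) as pool:
    all_metrics = list(pool.map(run_episode, range(4)))

print(sum(len(m) for m in all_metrics))   # 4 workers * 5 steps -> 20
```

The design point is that the fine-grained stepping stays inside run_episode, so each parallel worker keeps full per-step control while only the finished metric lists are gathered at the end.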