I am using the compute-action logic described in the documentation (screenshot below) for a custom evaluation of a policy. Basically, I want to evaluate various policies, trained with different algorithms and hyperparameters, on my domain-specific metrics. I don’t want to use a custom callback or run the evaluation during or at the end of training. I want to evaluate the policies later using the compute-action approach shown below.
I want to roll out the environment in parallel on multiple cores while evaluating a single policy for around 50 episodes. I am passing env_config when creating the environment, but I still get the exception “AttributeError: ‘dict’ object has no attribute ‘worker_index’”. I believe the env_context metadata is not being created the way it is when the policy is trained on multiple cores.
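To illustrate what I think is happening, here is a minimal sketch with no RLlib dependency. `FakeEnvContext`, `MyEnv`, `make_envs`, `try_plain_dict` and the `base_seed` key are hypothetical names made up for this example. The stand-in class only mimics the fact that RLlib's `EnvContext` is a dict subclass carrying attributes such as `worker_index`, so treat it as an assumption about my setup rather than the actual RLlib code.

```python
class FakeEnvContext(dict):
    """Hypothetical stand-in for RLlib's EnvContext: a dict plus worker metadata."""

    def __init__(self, env_config, worker_index, vector_index=0):
        super().__init__(env_config)
        self.worker_index = worker_index
        self.vector_index = vector_index


class MyEnv:
    """Hypothetical environment that reads worker_index from its config."""

    def __init__(self, env_config):
        # Attribute access works on an EnvContext-like object,
        # but fails on a plain dict.
        self.seed = env_config["base_seed"] + env_config.worker_index


def try_plain_dict(env_config):
    """Return the error message raised when the env gets a plain dict."""
    try:
        MyEnv(env_config)
    except AttributeError as exc:
        return str(exc)
    return None


def make_envs(env_config, num_workers):
    """Build one env per worker, each with its own worker_index (1-based)."""
    return [
        MyEnv(FakeEnvContext(env_config, worker_index=i))
        for i in range(1, num_workers + 1)
    ]


if __name__ == "__main__":
    config = {"base_seed": 100}
    print(try_plain_dict(config))  # the same kind of AttributeError as above
    print([env.seed for env in make_envs(config, 3)])
```

If this is indeed the cause, then in the real code I would presumably have to wrap my dict in `ray.rllib.env.env_context.EnvContext` with an explicit `worker_index` for each evaluation process, instead of passing the plain dict.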
How can I use multiple cores to evaluate (roll out) the policy in parallel environments using the following code snippet?