Controlling the exact number of episodes performed in a VectorEnv during a custom evaluation

I am trying to perform a custom evaluation with a VectorEnv following the code given here:

My vector env has num_envs cloned sub envs, and my evaluation set is composed of X initial states of the environment. For instance, in the first episode a robot should grasp a red cube, in the second one a green cube, etc.

Let’s imagine that I have only 1 rollout worker, num_envs=4 and X=20, so each sub env should perform 20/4=5 evaluation episodes.
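To make the arithmetic concrete, here is a minimal sketch of how the X initial states could be grouped into rounds of num_envs simultaneous episodes. The helper name plan_rounds is hypothetical and is not part of RLlib; it only illustrates the split I have in mind.

```python
from typing import List


def plan_rounds(num_states: int, num_envs: int) -> List[List[int]]:
    """Group state indices into rounds of at most `num_envs` episodes.

    Hypothetical helper: round r holds the state index that each sub env
    should be reset to for its r-th evaluation episode.
    """
    if num_envs <= 0:
        raise ValueError("num_envs must be positive")
    return [
        list(range(start, min(start + num_envs, num_states)))
        for start in range(0, num_states, num_envs)
    ]


rounds = plan_rounds(num_states=20, num_envs=4)
print(len(rounds))  # 5
print(rounds[0])    # [0, 1, 2, 3]
```

If X is not a multiple of num_envs, the last round is simply shorter, so some sub envs would have to stay idle during that round.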

How can I run exactly one episode on each sub env of a VectorEnv at once during a custom evaluation?

w.sample.remote() works well for a classic gym environment, even if there are several rollout workers.

I have dug deeply into the RLlib codebase to understand the implementation behind the sample method of the RolloutWorker, but no combination of the batch_mode and evaluation_duration_unit configs results in the desired behavior.

Is it possible to have more fine-grained control over the evaluation loop?

Note: I am trying to use a VectorEnv because an instance of an Isaac Sim env is very resource-intensive. It is not possible to have more than two envs (one for sampling and one for evaluation), and Isaac Sim is optimized to handle several cloned envs in one main env (see some of their OmniIsaacGymEnvs examples).


Hi @loicsacre,

What you want to achieve is a deterministic way to loop through the states of the env. I have a couple of candidate solutions, which I will describe at a high level.

  1. You can override the evaluation logic entirely and introduce your own evaluation loop. See this: ray/ at master · ray-project/ray · GitHub

  2. You can use a callback’s on_sub_environment_created() or on_episode_start() (I am not sure exactly which off the top of my head) to set the initial state of your env. A callback can also be stateful, which will help you keep track of the states that have already been used within a single process. However, if you use more than 1 rollout worker, you need to use the worker_idx to determine which worker you are on, so that each worker loops through states that are not visited by the other workers.
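As a rough illustration of the bookkeeping in option 2, here is a plain-Python sketch that shards the state indices across workers and hands them out one at a time. The names shard_states and InitialStateCursor are hypothetical and not RLlib APIs, and the 1-based worker index is an assumption. The RLlib wiring (for example, calling next_state() from a callback and passing the result to your env) is left out on purpose, since it depends on your env and RLlib version.

```python
from typing import Iterable, List, Optional


def shard_states(num_states: int, num_workers: int, worker_index: int) -> List[int]:
    """Return the state indices owned by one worker.

    Assumption: worker_index is 1-based, so worker 1 takes states
    0, num_workers, 2 * num_workers, ... and no state is shared.
    """
    if not 1 <= worker_index <= num_workers:
        raise ValueError("worker_index must be in [1, num_workers]")
    return list(range(worker_index - 1, num_states, num_workers))


class InitialStateCursor:
    """Stateful helper that hands out each assigned state exactly once."""

    def __init__(self, state_ids: Iterable[int]) -> None:
        self._state_ids = list(state_ids)
        self._next = 0

    def next_state(self) -> Optional[int]:
        """Return the next unused state index, or None when exhausted."""
        if self._next >= len(self._state_ids):
            return None
        state_id = self._state_ids[self._next]
        self._next += 1
        return state_id


# Worker 1 of 2, with 20 evaluation states and 4 sub envs per worker.
cursor = InitialStateCursor(shard_states(num_states=20, num_workers=2, worker_index=1))
first_round = [cursor.next_state() for _ in range(4)]
print(first_round)  # [0, 2, 4, 6]
```

Returning None once the shard is exhausted gives the caller an explicit signal to stop starting new episodes on that worker, rather than silently repeating states.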