Controlling the exact number of episodes performed in a VectorEnv during a custom evaluation

I am trying to perform a custom evaluation with a VectorEnv following the code given here:

My VectorEnv has num_envs cloned sub-environments, and my evaluation set consists of X initial states of the environment. For instance, in the first episode the robot should grasp a red cube, in the second one a green cube, etc.

Let’s imagine that I have only 1 rollout worker, num_envs=4 and X=20, so each sub-env should perform 20/4 = 5 evaluation episodes.
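The bookkeeping for this example is a plain round-robin split: sub-env i takes initial states i, i + num_envs, i + 2 * num_envs, and so on. A minimal Python sketch (the function name assign_initial_states is hypothetical, not part of RLlib):

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def assign_initial_states(states: Sequence[T], num_envs: int) -> List[List[T]]:
    """Split the evaluation set round-robin across sub-envs.

    Sub-env i gets states i, i + num_envs, i + 2 * num_envs, ...
    If len(states) is not a multiple of num_envs, the first
    len(states) % num_envs sub-envs get one extra state.
    """
    if num_envs <= 0:
        raise ValueError("num_envs must be positive")
    return [list(states[i::num_envs]) for i in range(num_envs)]


# Hypothetical evaluation set: X = 20 initial states labelled 0..19.
queues = assign_initial_states(list(range(20)), num_envs=4)
print([len(q) for q in queues])  # [5, 5, 5, 5]
print(queues[0])                  # [0, 4, 8, 12, 16]
```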

What would be the solution for running exactly one episode on each sub-env of a VectorEnv at once during a custom evaluation?
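To make the desired behavior concrete, here is a minimal, framework-independent Python sketch of such an evaluation loop. It assumes the vectorized env can reset a single sub-env into a chosen initial state. ToyVectorEnv, reset_at(index, initial_state) and vector_step(actions) -> (obs, rewards, dones) are hypothetical stand-ins: the method names echo RLlib's VectorEnv, but the signatures are simplified and do not match RLlib's actual API. Each sub-env runs its assigned initial states one episode at a time, all sub-envs are stepped together, and a sub-env that has emptied its queue stays idle with its transitions discarded.

```python
from typing import Callable, Dict, List, Optional, Tuple


class ToyVectorEnv:
    """Hypothetical stand-in for a vectorized simulator; NOT RLlib's VectorEnv.

    Episode length depends on the initial state (2 to 4 steps), so the
    sub-envs finish their episodes at different times.
    """

    def __init__(self, num_envs: int):
        self.num_envs = num_envs
        self.state: List[Optional[int]] = [None] * num_envs
        self.t = [0] * num_envs

    def reset_at(self, index: int, initial_state: int):
        # Put sub-env `index` into a specific initial state (e.g. "red cube").
        self.state[index] = initial_state
        self.t[index] = 0
        return (initial_state, 0)

    def vector_step(self, actions: List[Optional[int]]):
        # All sub-envs advance together, as clones in one simulator would.
        # This toy ignores the actions; reward is 1.0 per step.
        obs, rewards, dones = [], [], []
        for i in range(self.num_envs):
            self.t[i] += 1
            s = self.state[i]
            obs.append((s, self.t[i]))
            rewards.append(1.0)
            dones.append(s is not None and self.t[i] >= 2 + s % 3)
        return obs, rewards, dones


def evaluate(env, queues: List[List[int]], policy: Callable) -> Dict[int, Tuple[int, float]]:
    """Run every initial state in queues[i] on sub-env i, one episode at a time.

    Returns {initial_state: (sub_env_index, episode_return)}.
    """
    pending = [list(q) for q in queues]
    active: List[Optional[int]] = [None] * env.num_envs  # state being evaluated, or None if idle
    returns = [0.0] * env.num_envs
    obs: List = [None] * env.num_envs
    results: Dict[int, Tuple[int, float]] = {}

    def start_next(i: int) -> None:
        # Begin the next assigned episode on sub-env i, or mark it idle.
        if pending[i]:
            active[i] = pending[i].pop(0)
            returns[i] = 0.0
            obs[i] = env.reset_at(i, active[i])
        else:
            active[i] = None

    for i in range(env.num_envs):
        start_next(i)

    while any(s is not None for s in active):
        # Idle sub-envs still get stepped (dummy action), but their data is discarded.
        actions = [policy(obs[i]) if active[i] is not None else None
                   for i in range(env.num_envs)]
        next_obs, rewards, dones = env.vector_step(actions)
        for i in range(env.num_envs):
            if active[i] is None:
                continue
            returns[i] += rewards[i]
            obs[i] = next_obs[i]
            if dones[i]:
                results[active[i]] = (i, returns[i])
                start_next(i)
    return results


num_envs, X = 4, 20
queues = [list(range(X))[i::num_envs] for i in range(num_envs)]
results = evaluate(ToyVectorEnv(num_envs), queues, policy=lambda o: 0)
per_env = [sum(1 for env_i, _ in results.values() if env_i == i) for i in range(num_envs)]
print(len(results), per_env)  # 20 [5, 5, 5, 5]
```

Stepping idle sub-envs with a dummy action is wasteful, but it mirrors a simulator in which all clones must advance together; if the simulator allows it, idle clones could be paused instead.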

w.sample.remote() works well for a classic gym environment, even if there are several rollout workers.

I have investigated the RLlib codebase in depth to understand the implementation behind the sample method of the RolloutWorker, but none of the combinations of the batch_mode and evaluation_duration_unit config settings results in the desired behavior.
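A simplified model suggests why a budget counted across the whole VectorEnv does not give per-sub-env control: when sub-envs auto-reset and their episodes have different lengths (e.g. some cubes are grasped faster than others), the faster sub-envs contribute more of the counted episodes. This Python sketch is a toy illustration under those assumptions, not RLlib's actual sampling code; the function name and the episode lengths are made up.

```python
from typing import List


def episodes_per_sub_env(episode_lengths: List[int], total_episodes: int) -> List[int]:
    """Toy model: step all sub-envs in lockstep, auto-reset each sub-env as
    soon as its episode ends, and stop once `total_episodes` episodes have
    completed across the whole vector. Returns completed episodes per sub-env.
    """
    num_envs = len(episode_lengths)
    t = [0] * num_envs
    done_counts = [0] * num_envs
    while sum(done_counts) < total_episodes:
        for i in range(num_envs):
            t[i] += 1
            if t[i] == episode_lengths[i]:
                done_counts[i] += 1
                t[i] = 0  # auto-reset this sub-env
                if sum(done_counts) == total_episodes:
                    break  # global budget reached mid-step
    return done_counts


# Hypothetical episode lengths (in steps) for 4 sub-envs, budget of 20 episodes.
print(episodes_per_sub_env([2, 3, 5, 10], total_episodes=20))  # [10, 6, 3, 1]
```

With equal lengths, e.g. episodes_per_sub_env([4, 4, 4, 4], 20), the same function returns [5, 5, 5, 5], so an even split only happens by coincidence.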

Is it possible to have more fine-grained control over the evaluation loop?

Note: I am trying to use a VectorEnv because each Isaac Sim env instance is very resource-intensive. It is not possible to run more than two env instances (one for sampling and one for evaluation), and Isaac Sim is optimized to handle several cloned envs within one main env (see some of their OmniIsaacGymEnvs examples).