Controlling the exact number of episodes performed in a VectorEnv during a custom evaluation

loicsacre · November 10, 2022, 2:33pm

I am trying to perform a custom evaluation with a VectorEnv following the code given here:

ray-project/ray/blob/master/rllib/examples/custom_eval.py#L131-L135

    for i in range(5):
        print("Custom evaluation round", i)
        # Calling .sample() runs exactly one episode per worker due to how the
        # eval workers are configured.
        ray.get([w.sample.remote() for w in eval_workers.remote_workers()])

My vector env has num_envs cloned sub-envs, and my evaluation set is composed of X initial states of the environment. For instance, in the first episode a robot should grasp a red cube, in the second one a green cube, etc.

Let’s imagine that I have only 1 rollout worker, num_envs=4 and X=20, so each sub-env should perform 20/4=5 evaluation episodes.

What would be the solution for performing exactly one episode on each sub-env of a VectorEnv at once during a custom evaluation?
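To make the arithmetic concrete, here is a minimal sketch in plain Python (no RLlib calls). It assumes the evaluation states can be indexed 0..X-1; `schedule_rounds` is a hypothetical helper name, not something from RLlib. Each "round" holds one initial state per sub-env, so one round corresponds to running exactly one episode on every sub-env at once.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def schedule_rounds(states: Sequence[T], num_envs: int) -> List[List[T]]:
    """Split the evaluation states into rounds of at most num_envs states.

    Hypothetical helper: round r, slot k is the initial state that sub-env k
    should be reset to for its r-th evaluation episode.
    """
    if num_envs <= 0:
        raise ValueError("num_envs must be positive")
    return [list(states[i:i + num_envs]) for i in range(0, len(states), num_envs)]


# X = 20 initial states, num_envs = 4 sub-envs -> 5 rounds of 4 episodes each.
rounds = schedule_rounds(list(range(20)), num_envs=4)
print(len(rounds))
print(rounds[0])
```

If X is not a multiple of num_envs, the last round is simply shorter, and the remaining sub-envs would have to idle or have their episodes discarded.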

w.sample.remote() works well for a classic gym environment, even if there are several rollout workers.

I have dug deeply into the RLlib codebase to understand the implementation behind the sample method of RolloutWorker, but none of the combinations of the batch_mode and evaluation_duration_unit configs results in the desired behavior.

Is it possible to have more fine-grained control over the evaluation loop?

Note: I am trying to use a VectorEnv because an instance of an Isaac Sim env is very resource-intensive. It is not possible to have more than two envs (one for sampling and one for evaluation), and Isaac Sim is optimized to handle several cloned envs in one main env (see some of their OmniIsaacGymEnvs examples).

kourosh · January 5, 2023, 4:40pm

Hi @loicsacre,

What you want to achieve is a deterministic way to loop through the states of the env. I have a couple of candidate solutions, which I’ll describe at a high level.

You can override the evaluation logic entirely and introduce your own evaluation loop. See this: ray/custom_eval.py at master · ray-project/ray · GitHub
You can use a callback’s on_sub_environment_created() or on_episode_start() (I am not sure exactly which, off the top of my head) to set the initial state of your env. A callback can also be stateful, which will help you keep track of the states that have already been used in a single process. However, if you use more than 1 rollout worker, you need to use the worker_idx to determine which worker you are on, so that each worker loops only through states that are not visited by other workers.
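The bookkeeping behind the second suggestion can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not RLlib code: `StateCursor` and `drain` are hypothetical names, the worker index is assumed to be 1-based for remote workers (with 0 meaning a single local worker), and the states are assumed to be an indexable sequence. A stateful callback could hold one such cursor and call `next_state()` whenever an episode starts.

```python
from typing import List, Optional, Sequence


class StateCursor:
    """Hands out the evaluation states owned by one worker, one at a time."""

    def __init__(self, states: Sequence[int], worker_idx: int, num_workers: int):
        # Assumption: remote workers are numbered 1..num_workers; index 0
        # (a single local worker) is mapped onto the first slot.
        slot = max(worker_idx - 1, 0)
        # Strided slice: worker k owns states k, k + num_workers, ...
        # so no two workers ever share a state.
        self._mine: List[int] = list(states[slot::num_workers])
        self._next = 0

    def next_state(self) -> Optional[int]:
        """Return the next unvisited state, or None once the share is used up."""
        if self._next >= len(self._mine):
            return None
        state = self._mine[self._next]
        self._next += 1
        return state


def drain(cursor: StateCursor) -> List[int]:
    """Collect every state a cursor hands out (stands in for an eval loop)."""
    out = []
    while (state := cursor.next_state()) is not None:
        out.append(state)
    return out


states = list(range(20))
worker1 = drain(StateCursor(states, worker_idx=1, num_workers=2))
worker2 = drain(StateCursor(states, worker_idx=2, num_workers=2))
print(worker1)
print(worker2)
```

With a VectorEnv, the same idea would need one more level of partitioning: each worker's share is further split across its sub-envs, for example keyed on the sub-environment index.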
