[Tune] [RLlib] Episodes vs iterations vs trials vs experiments

In the context of RLlib and Tune, where can I find an explanation of how these concepts relate to each other?

Hey @geekyneuro, great question, and sorry for the late response. The question was uncategorized; it’ll help us find questions much faster if you categorize them (e.g. as “RLlib”).

Episode: In an RL environment, an episode starts when you call env.reset() and ends when the step() method returns the done=True flag. In between, you call env.step([some action]) once per timestep, so an episode lasts n timesteps.
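
To make the episode boundary concrete, here is a minimal sketch of one episode loop. CountdownEnv and run_episode are hypothetical names made up for this illustration (they are not part of RLlib or gym); the env only mimics the reset()/step() convention with a done flag described above.

```python
class CountdownEnv:
    """Hypothetical toy env using the reset()/step() convention with a done flag."""

    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        # Start a new episode and return the first observation.
        self.t = 0
        return self.t

    def step(self, action):
        # Advance one timestep; the episode ends after max_steps steps.
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        done = self.t >= self.max_steps
        return self.t, reward, done, {}


def run_episode(env, policy):
    """Run exactly one episode and return (num_timesteps, total_reward)."""
    obs = env.reset()
    done = False
    steps, total = 0, 0.0
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        steps += 1
        total += reward
    return steps, total


steps, total = run_episode(CountdownEnv(max_steps=5), policy=lambda obs: 1)
print(steps, total)
```

Everything between reset() and the first done=True belongs to the same episode; calling reset() again starts the next one.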

Iteration: A single training iteration of an RLlib Trainer, i.e. one call to Trainer.train(). An iteration may contain one or more episodes (whose data is collected for the train batch or for a replay buffer) and one or more SGD update steps, depending on the particular Trainer being used.
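
As a rough illustration of how iterations relate to episodes, the sketch below uses a hypothetical ToyTrainer whose train() collects a fixed number of timesteps per call. This is not RLlib's implementation: ToyEnv, ToyTrainer, train_batch_size as used here, and the result keys are assumptions chosen for the example.

```python
class ToyEnv:
    """Hypothetical env whose episodes always last episode_len timesteps."""

    def __init__(self, episode_len=4):
        self.episode_len = episode_len
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_len
        return self.t, 1.0, done, {}


class ToyTrainer:
    """Hypothetical stand-in for a Trainer; it only counts, it does not learn."""

    def __init__(self, env, train_batch_size=10):
        self.env = env
        self.train_batch_size = train_batch_size
        self.iteration = 0
        self.env.reset()

    def train(self):
        # One call to train() is one training iteration.
        timesteps, episodes = 0, 0
        while timesteps < self.train_batch_size:
            _, _, done, _ = self.env.step(0)
            timesteps += 1
            if done:
                episodes += 1
                self.env.reset()
        # A real Trainer would run its update step(s) on the collected batch here.
        self.iteration += 1
        return {
            "training_iteration": self.iteration,
            "timesteps_this_iter": timesteps,
            "episodes_this_iter": episodes,
        }


trainer = ToyTrainer(ToyEnv(episode_len=4), train_batch_size=10)
first = trainer.train()
second = trainer.train()
print(first)
print(second)
```

In this toy setup the number of episodes finished per iteration is not constant, because an episode that is still running when the batch is full simply continues in the next call to train().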

Trial: When you use RLlib in combination with Tune and do, for example, a grid search over two learning rates via tune.grid_search([0.0001, 0.0005]), Tune runs two “trials”, one per learning rate.
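
The expansion from one config into several trials can be sketched in plain Python. The expand_grid helper below is hypothetical and only illustrates the idea; it is not how Tune is implemented, and it assumes grid searches are written as {"grid_search": [...]} entries at the top level of the config.

```python
import itertools


def expand_grid(config):
    """Return one concrete config per trial for top-level grid_search entries."""
    grid_keys = [
        key for key, value in config.items()
        if isinstance(value, dict) and "grid_search" in value
    ]
    fixed = {key: value for key, value in config.items() if key not in grid_keys}
    value_lists = [config[key]["grid_search"] for key in grid_keys]

    trials = []
    # The Cartesian product yields every combination of grid values.
    for combo in itertools.product(*value_lists):
        trial_config = dict(fixed)
        trial_config.update(zip(grid_keys, combo))
        trials.append(trial_config)
    return trials


config = {
    "env": "CartPole-v0",
    "lr": {"grid_search": [0.0001, 0.0005]},
}
trials = expand_grid(config)
for i, trial_config in enumerate(trials):
    print(f"trial {i}: {trial_config}")
```

With a second grid search over, say, three batch sizes, the same config would expand into 2 x 3 = 6 trials.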

Experiment: An RLlib config, defined e.g. in a YAML file, which may contain grid searches that produce n trials. You can store more than one experiment in a single YAML file under different top-level experiment names (e.g. see ray.release.rllib_tests.learning_tests.major_algos_learning_tests.yaml).
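
To show how experiments sit above trials, here is a small sketch that mirrors such a file as a Python dict with two top-level experiment names. The names "cartpole-ppo" and "cartpole-dqn", the chosen values, and the count_trials helper are made up for illustration, and the count only considers grid searches (no other Tune settings).

```python
from math import prod


def count_trials(spec):
    """Count trials for one experiment: the product of all grid_search sizes."""
    sizes = []

    def walk(node):
        if isinstance(node, dict):
            if "grid_search" in node:
                sizes.append(len(node["grid_search"]))
            else:
                for value in node.values():
                    walk(value)

    walk(spec)
    return prod(sizes)  # prod([]) is 1: no grid search means a single trial


# Two experiments, keyed by hypothetical top-level experiment names.
experiments = {
    "cartpole-ppo": {
        "run": "PPO",
        "env": "CartPole-v0",
        "config": {"lr": {"grid_search": [0.0001, 0.0005]}},
    },
    "cartpole-dqn": {
        "run": "DQN",
        "env": "CartPole-v0",
        "config": {"lr": 0.0005},
    },
}

for name, spec in experiments.items():
    print(name, count_trials(spec))
```

So the hierarchy is: one file can hold several experiments, each experiment spawns one or more trials, each trial runs many training iterations, and each iteration covers some number of episodes.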