I am new to RLlib, so I may be missing something obvious. I am doing offline RL with a dataset read purely from JSON files. Is there a more concise way to specify training over the entire dataset for x epochs? At the moment I have to count the number of transitions in the dataset and multiply by the number of epochs, i.e.:
```python
for i in range(0, df_len * epochs):
    eval_res = algo.train().get("evaluation")
```
Since train() does not take any such arguments, you can't specify the number of epochs that way.
If you don't want to manage the epochs yourself, you can use Ray Tune for this; that is the recommended way.
Tune is not exclusively meant for hyperparameter tuning; it also manages the resources of experiments and manages training runs much like you are doing by hand.
You can make it stop on a number of iterations, a reward threshold, or frankly any metric that algo.step() returns.
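For illustration, here is a minimal sketch using tune.run. The "CQL" algorithm, the dataset path, and the stop threshold are assumptions you would swap for your own setup:

```python
from ray import tune

# Minimal sketch: let Tune drive the training loop and the stop condition.
# "CQL", the input path, and the stop value below are placeholders.
analysis = tune.run(
    "CQL",
    config={
        "input": "/path/to/offline_data/*.json",  # hypothetical dataset path
        "framework": "torch",
    },
    # Stop on any metric in the result dict, e.g. the iteration count
    # or a mean evaluation reward.
    stop={"training_iteration": 100},
)
```

If you want roughly x epochs, you could set training_iteration to about df_len * epochs / train_batch_size, since each training iteration reads on the order of train_batch_size transitions from the offline reader.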