Offline RL passing reward data from .json into environment

@kris, great that you found some examples of how to proceed. As a rule of thumb, the configuration for the evaluation workers is identical to the one used in training; only `in_evaluation` is set to `True`, and the number of evaluation workers is configured separately.
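To make this concrete, here is a minimal sketch of how that split looks in an RLlib-style config dict. The key names (`evaluation_num_workers`, `evaluation_config`, `in_evaluation`, `explore`) follow RLlib's common config; the concrete values are illustrative assumptions, not a recommended setup:

```python
import copy

# Base training config (values are placeholders).
train_config = {
    "framework": "torch",
    "gamma": 0.99,
}

# Overrides applied only on the evaluation workers; everything
# not listed here mirrors the training config.
eval_overrides = {
    "in_evaluation": True,  # marks workers as evaluation workers
    "explore": False,       # typical override: deterministic rollouts
}

config = copy.deepcopy(train_config)
config["evaluation_num_workers"] = 1          # evaluation-specific worker count
config["evaluation_config"] = eval_overrides  # the only place the configs differ
```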

Regarding evaluation, there was another issue on this board here. Usually you need an environment to roll out the policy online. In that case, SAC was suggested due to its similar setup.

In the other case, where you want to estimate the policy's performance on an offline dataset, you need to provide `action_logp` keys in the dataset, as mentioned here.
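As a rough illustration of what that requirement means for the .json file, here is a sketch of a single dataset record in SampleBatch-style column format. The field names (`obs`, `actions`, `rewards`, `dones`, `action_logp`) follow RLlib's SampleBatch keys; all numeric values are made up:

```python
import json
import math

# One JSON line of an offline dataset. "action_logp" holds the
# log-probability of each logged action under the behavior policy,
# which off-policy estimators (e.g. importance sampling) require.
record = {
    "type": "SampleBatch",
    "obs": [[0.1, -0.2], [0.15, -0.1]],
    "actions": [0, 1],
    "rewards": [1.0, 0.5],
    "dones": [False, True],
    "action_logp": [math.log(0.8), math.log(0.6)],
}

line = json.dumps(record)      # what one line of the .json file looks like
restored = json.loads(line)    # reading it back for a sanity check
```

If your logging pipeline did not record action probabilities, you cannot reconstruct `action_logp` after the fact; it has to be captured when the behavior policy generates the data.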