How to use learner_stats to tune the environment

Are there any resources that can help me tune hyperparameters for my custom environment? For example, how can I use the data reported in learner_stats to find better ranges for the hyperparameters? Or at least, are there descriptions of the reported statistics?

Hey @akhodakivskiy, difficult to say. From my own experience, it certainly helps to

  • check the entropy of your action distributions. If it stays very high, the policy is still acting close to randomly; if it drops very low early on, the policy has probably stopped exploring. Adjust via entropy_coeff in your config (if your algo supports this key).
  • check the explained variance of your value function output. Make sure it’s as close to 1.0 as possible. It may help to try a separate value function model (set config.model.vf_share_layers=False).
  • try different lr (learning rates) and train_batch_size!
  • try a more capable Ray Tune scheduler, like PBT (Population Based Training).
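
In case it helps to see what the first two checks measure, here is a minimal NumPy sketch of both statistics. The function names and the example numbers are my own for illustration, and this is not necessarily the exact formula RLlib uses internally (it may clip or aggregate differently), so treat it as a way to build intuition rather than a reference implementation.

```python
import numpy as np


def explained_variance(returns, value_preds):
    """1 - Var(returns - value_preds) / Var(returns).

    1.0 means the value function predicts the returns perfectly,
    0.0 means it does no better than predicting the mean return,
    and negative values mean it does worse than that.
    """
    returns = np.asarray(returns, dtype=np.float64)
    value_preds = np.asarray(value_preds, dtype=np.float64)
    var_returns = np.var(returns)
    if var_returns == 0.0:
        # Undefined when all returns are identical.
        return float("nan")
    return float(1.0 - np.var(returns - value_preds) / var_returns)


def categorical_entropy(probs):
    """Entropy (in nats) of discrete action distributions.

    `probs` has shape (..., num_actions); the entropy is taken over
    the last axis. A uniform policy gives log(num_actions), a
    deterministic policy gives 0.
    """
    probs = np.asarray(probs, dtype=np.float64)
    # Clip only inside the log so zero-probability actions contribute 0.
    safe = np.clip(probs, 1e-12, 1.0)
    return -np.sum(probs * np.log(safe), axis=-1)


if __name__ == "__main__":
    # Hypothetical numbers, only to show the calls.
    returns = [1.0, 2.0, 3.0, 4.0]
    value_preds = [1.1, 1.9, 3.2, 3.8]
    print("explained variance:", explained_variance(returns, value_preds))
    print("entropy (uniform, 4 actions):", categorical_entropy([0.25] * 4))
    print("entropy (deterministic):", categorical_entropy([1.0, 0.0, 0.0, 0.0]))
```

For a discrete action space, comparing the reported entropy against log(num_actions) gives a rough sense of how far the policy has moved away from uniform random behavior.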

To hijack this thread a bit: is there any documentation that gives a proper (in-depth) explanation of every statistic reported by Tune?