I have finally managed to run PPOTrainer.train() without major issues (thanks to @mannyv).
However, when I look at the rewards, my agent does not seem to be learning.
- Should I first use ray.tune for hyperparameter tuning and then use PPOTrainer.train()?
- How can I display the results from train() more clearly?
Thank you in advance!