Custom PyTorch model implementation for PPO training


Could someone provide an example of how to implement a custom CNN-LSTM model in RLlib for PPO training on an environment with a discrete action space? Or could someone share a link to tutorials on this?
I am mostly interested in how to produce the discrete output from the LSTM in the forward method and in value_function.
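
To make the question concrete, here is a minimal sketch in plain PyTorch of the pattern being asked about. It is not RLlib's API: the class name `CnnLstmActorCritic`, the layer sizes, and the `(batch, time, channels, height, width)` input layout are all assumptions chosen for illustration, and RLlib's own model base classes and method signatures are not reproduced here. The idea is that `forward` returns one logit per discrete action at every time step and caches the LSTM output, and `value_function` reads that cached output through a separate linear head.

```python
import torch
import torch.nn as nn


class CnnLstmActorCritic(nn.Module):
    """Hypothetical CNN -> LSTM -> (action logits, value) model, plain PyTorch."""

    def __init__(self, in_channels=3, num_actions=4, hidden_size=64):
        super().__init__()
        # Small CNN encoder applied to every frame independently.
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> (N, 32, 1, 1) for any image size
            nn.Flatten(),             # -> (N, 32)
        )
        self.lstm = nn.LSTM(32, hidden_size, batch_first=True)
        # Discrete action space: one unnormalized logit per action.
        self.logits_head = nn.Linear(hidden_size, num_actions)
        self.value_head = nn.Linear(hidden_size, 1)
        self._features = None  # LSTM output cached for value_function()

    def forward(self, obs, state=None):
        # obs: (batch, time, channels, height, width)
        b, t = obs.shape[:2]
        # Fold time into the batch dimension for the CNN, then unfold.
        feats = self.cnn(obs.reshape(b * t, *obs.shape[2:]))
        feats = feats.reshape(b, t, -1)
        # state is an (h, c) tuple, or None to start from zeros.
        self._features, state = self.lstm(feats, state)
        return self.logits_head(self._features), state

    def value_function(self):
        # Must be called after forward(); returns shape (batch, time).
        assert self._features is not None, "call forward() first"
        return self.value_head(self._features).squeeze(-1)


# Usage with dummy data: 2 sequences of 5 RGB frames, 32x32 pixels.
model = CnnLstmActorCritic()
obs = torch.zeros(2, 5, 3, 32, 32)
logits, state = model(obs)
values = model.value_function()
# The logits parameterize a categorical distribution over the actions.
actions = torch.distributions.Categorical(logits=logits).sample()
```

The model outputs raw logits rather than probabilities; the softmax is applied inside the categorical distribution, which is the usual arrangement for discrete-action policy gradients.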

Hi @overloader ,

Please check out the newest version of RLlib and have a look at the following file: