Custom PyTorch model implementation for PPO training


Could someone provide an example of how to implement a custom CNN-LSTM model in RLlib for PPO training on an environment with a discrete action space? Alternatively, a link to a tutorial on this would help.
I am mostly interested in how to produce the discrete-action output from the LSTM in the forward method, and how to write value_function.
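
To make the question concrete, here is a minimal sketch of the kind of model meant, written in plain PyTorch only. It does not use RLlib's base classes, so how it maps onto RLlib's model API is still the open part of the question. The class name `CnnLstmActorCritic`, the layer sizes, the `[B, T, C, H, W]` observation layout, and the 4-action space are all assumptions chosen for illustration. The idea is that `forward` returns one logit per discrete action (not a sampled action), and `value_function` reuses the LSTM features cached by the last `forward` call.

```python
import torch
import torch.nn as nn


class CnnLstmActorCritic(nn.Module):
    """Hypothetical CNN-LSTM with a discrete policy head and a value head."""

    def __init__(self, in_channels=3, num_actions=4, lstm_size=64):
        super().__init__()
        # Small CNN encoder; pooling makes it independent of image size.
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.lstm = nn.LSTM(32, lstm_size, batch_first=True)
        # Discrete action space: one logit per action.
        self.logits_head = nn.Linear(lstm_size, num_actions)
        self.value_head = nn.Linear(lstm_size, 1)
        self._features = None  # LSTM output cached for value_function()

    def forward(self, obs, state=None):
        # obs is assumed to have shape [B, T, C, H, W].
        b, t = obs.shape[:2]
        x = self.cnn(obs.flatten(0, 1))      # [B*T, 32]
        x = x.view(b, t, -1)                 # [B, T, 32]
        self._features, state = self.lstm(x, state)
        logits = self.logits_head(self._features)  # [B, T, num_actions]
        return logits, state

    def value_function(self):
        assert self._features is not None, "call forward() first"
        return self.value_head(self._features).squeeze(-1)  # [B, T]


model = CnnLstmActorCritic()
obs = torch.zeros(2, 5, 3, 32, 32)           # 2 sequences of 5 frames each
logits, state = model(obs)
values = model.value_function()
# Sampling from the logits gives integer actions in [0, num_actions).
actions = torch.distributions.Categorical(logits=logits).sample()
```

The returned `state` is the usual `(h, c)` tuple from `nn.LSTM` and can be passed back into the next `forward` call to carry memory across steps.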