How severely does this issue affect your experience of using Ray?
- None: Just asking a question out of curiosity
Hey guys,
Are there any limitations on activation functions in custom models when they are used with the PPO algorithm?
Can the activation be chosen freely for all layers inside the model, with only the last layer (the logits and the value head, respectively) restricted to a linear activation (“None”)?
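
To make the question concrete, here is a rough, framework-free sketch of the layout I have in mind. It is not an actual RLlib model and does not use any RLlib API; the names `TinyActorCritic`, `dense`, and `init_layer` are made up for illustration. The hidden layers use arbitrary nonlinearities, while the logits and value outputs are left linear.

```python
import math
import random


def dense(inputs, weights, biases, activation=None):
    """Fully connected layer; activation=None means a linear output."""
    outputs = []
    for row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + b
        outputs.append(activation(z) if activation else z)
    return outputs


def init_layer(n_in, n_out, rng):
    """Random weights in [-1, 1] and zero biases (illustrative only)."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def relu(z):
    return max(0.0, z)


class TinyActorCritic:
    """Hypothetical stand-in for a custom model with a policy and a value head."""

    def __init__(self, obs_dim, hidden_dim, num_actions, seed=0):
        rng = random.Random(seed)
        self.h1 = init_layer(obs_dim, hidden_dim, rng)
        self.h2 = init_layer(hidden_dim, hidden_dim, rng)
        self.logits_head = init_layer(hidden_dim, num_actions, rng)
        self.value_head = init_layer(hidden_dim, 1, rng)

    def forward(self, obs):
        # Hidden layers: any nonlinearity (tanh and ReLU mixed here on purpose).
        x = dense(obs, *self.h1, activation=math.tanh)
        x = dense(x, *self.h2, activation=relu)
        # Output heads: no activation, i.e. unbounded linear outputs.
        logits = dense(x, *self.logits_head)
        value = dense(x, *self.value_head)[0]
        return logits, value


model = TinyActorCritic(obs_dim=4, hidden_dim=8, num_actions=2)
logits, value = model.forward([0.1, -0.2, 0.3, 0.0])
```

So the question is whether a structure like this (free choice in the hidden layers, linear heads) is what PPO expects, or whether there are further restrictions.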