Scripted Agent Support

Scripted agents are essential baselines for many classes of environments. The only example I’ve seen in RLlib is the toy rock-paper-scissors model that takes the same action every time. I’m having difficulty figuring out how to implement more complex scripted models with full access to observations but without having to convert actions into logits. Is there any documentation or support for this?


Bump – simple version: How do I submit actions instead of logits?

For prototyping, I used a custom model with my policy hardcoded in it. From your question I assume a categorical action distribution. Without changing the distribution, the sampling, or other parts of your training, you could create a separate custom model that returns extreme logits, e.g. torch.tensor([[0.0, 1000.0]], requires_grad=True) to select action 1. The values need to be floats, because torch only allows requires_grad=True on floating-point tensors. For torch, requires_grad=True ensures that the optimizer thinks it has something to optimize. This sidesteps the question a bit, but I think it is simpler than changing the rest of the pipeline.
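To show why the large-logit trick works, here is a minimal sketch in plain Python with no RLlib or torch involved. The names scripted_policy, action_to_logits, and sample_categorical are made up for illustration, and the rule inside scripted_policy is only a placeholder. The idea is that a logit gap of 1000 pushes the softmax probability of the chosen action to 1.0 in floating point, so categorical sampling always returns the scripted action.

```python
import math
import random


def scripted_policy(observation):
    # Hypothetical hand-written rule: action 1 if the first feature is positive.
    return 1 if observation[0] > 0 else 0


def action_to_logits(action, num_actions, high=1000.0):
    # Put a very large logit on the chosen action and 0 everywhere else.
    logits = [0.0] * num_actions
    logits[action] = high
    return logits


def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_categorical(logits, rng):
    # Stand-in for what a categorical action distribution does with logits.
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


rng = random.Random(0)
logits = action_to_logits(scripted_policy([0.5, -1.0]), num_actions=2)
samples = {sample_categorical(logits, rng) for _ in range(100)}
```

In a custom model you would return the equivalent float tensor from the forward pass instead of a Python list, but the effect on sampling should be the same.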