I am new to RLlib and for my research project I want to create a simple PPO baseline for a discrete control problem.
I managed to get everything up and running, and now I would like to customise the training process so that the forward/backward pass is only executed when a certain characteristic of the observation is met. If it is not met, I would like the agent to take a predetermined do-nothing action.
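Roughly, this is the control flow I have in mind (a minimal sketch; should_act and NOOP_ACTION are placeholders for my actual gating condition and do-nothing action):

```python
NOOP_ACTION = 0  # placeholder: index of the do-nothing action

def should_act(obs) -> bool:
    # Placeholder gating condition: act only when the first feature is positive.
    return obs[0] > 0.0

def select_action(trainer, obs):
    if should_act(obs):
        return trainer.compute_single_action(obs)  # normal PPO forward pass
    return NOOP_ACTION  # fixed action, ideally without touching the network
```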
What is the best way to go about this in RLlib?
Thanks for the help!
Thanks for the reply.
That’s what I ended up doing and it works.
However, as far as I understand, this still requires a forward/backward pass, which causes overhead. I tried to solve the issue by customising compute_single_action in the PPO trainer (post), but that did not work.
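For reference, what I tried looked roughly like this (a sketch against the Ray 1.x Trainer API; the gating condition and no-op action are placeholders):

```python
from ray.rllib.agents.ppo import PPOTrainer  # Ray 1.x import path

NOOP_ACTION = 0  # placeholder do-nothing action

class GatedPPOTrainer(PPOTrainer):
    def compute_single_action(self, observation=None, *args, **kwargs):
        # Short-circuit with the fixed action when the gate is closed.
        if observation is not None and observation[0] <= 0.0:  # placeholder condition
            return NOOP_ACTION
        return super().compute_single_action(observation, *args, **kwargs)
```

As far as I can tell, this has no effect during training, because the rollout workers sample actions from the policy directly and never go through the trainer's compute_single_action.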
Yes, that’s also an option and can achieve the desired result.
Though I think it would be more elegant, and more amenable to future changes, to implement it on the policy side.
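A minimal sketch of what I mean, assuming the Ray 1.x Torch policy, a discrete action space, and a gating condition you can read off the raw observation (the condition and the no-op index are placeholders):

```python
import numpy as np
from ray.rllib.agents.ppo import PPOTorchPolicy  # Ray 1.x import path
from ray.rllib.policy.sample_batch import SampleBatch

NOOP_ACTION = 0  # placeholder: index of the do-nothing action

class GatedPPOTorchPolicy(PPOTorchPolicy):
    def compute_actions_from_input_dict(self, input_dict, **kwargs):
        # Read the raw observations before RLlib converts the dict to tensors.
        obs = np.asarray(input_dict[SampleBatch.OBS])
        # Normal PPO forward pass for the whole batch ...
        actions, state_out, extra = super().compute_actions_from_input_dict(
            input_dict, **kwargs)
        # ... then overwrite the sampled action wherever the gate is closed.
        gate_closed = obs[:, 0] <= 0.0  # placeholder gating condition
        return np.where(gate_closed, NOOP_ACTION, actions), state_out, extra
```

Note that this version still pays for the forward pass; to actually skip it you would run the model only on the gated-in observations and pad the extra fetches (action_logp, vf_preds, etc.) back to the full batch size, which is the fiddly part. Also keep in mind that overwriting actions after sampling makes the stored log-probs off-policy for those steps, so you may want to mask them out of the loss.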