I am new to RLlib, and for my research project I want to create a simple PPO baseline for a discrete control problem.
I managed to get everything up and running, and now I would like to customise the training process so that the forward/backward pass is only executed when the observation satisfies a certain condition. When the condition is not met, I would like the agent to take a predetermined do-nothing action instead of querying the policy.
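To make the intended behaviour concrete, here is a rough sketch in plain Python. It does not use any RLlib API; `NOOP_ACTION`, `needs_decision`, and `select_action` are hypothetical names, and the check on `obs[0]` is only a placeholder for my actual condition.

```python
from typing import Callable, Sequence, Tuple

NOOP_ACTION = 0  # hypothetical index of the do-nothing action


def needs_decision(obs: Sequence[float]) -> bool:
    """Placeholder condition: consult the policy only when obs[0] is positive."""
    return obs[0] > 0


def select_action(
    obs: Sequence[float],
    policy_fn: Callable[[Sequence[float]], int],
) -> Tuple[int, bool]:
    """Return (action, used_policy).

    used_policy is False when the fixed no-op was taken, so that step
    could be left out of the batch used for the PPO update.
    """
    if needs_decision(obs):
        return policy_fn(obs), True
    return NOOP_ACTION, False
```

With a dummy policy that always returns action 2, `select_action([1.0], policy)` would go through the policy, while `select_action([-1.0], policy)` would skip it and return the no-op.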
What is the best way to go about this in RLlib?
Thanks for the help!