Customise policy to only do forward/backward pass for certain observations

Hi all,

I am new to RLlib, and for my research project I want to create a simple PPO baseline for a discrete control problem.

I managed to get everything up and running, and now I would like to customise the training process so that the forward/backward pass is only executed when the observation meets a certain condition. If the condition is not met, I would like the agent to take a predetermined action (a do-nothing action).
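
To make this concrete, here is a rough sketch of the behaviour I am after, written as a plain Python environment wrapper rather than anything RLlib-specific. The names `SkipWrapper`, `needs_decision` and `noop_action` are made up for illustration, and I am assuming a Gymnasium-style `reset`/`step` interface that returns `(obs, reward, terminated, truncated, info)`. I do not know whether this is the recommended way to do it in RLlib.

```python
from typing import Any, Callable


class SkipWrapper:
    """Hypothetical wrapper: steps the wrapped env with a fixed no-op action
    until an observation arrives for which the policy should act.

    Assumes `env` exposes Gymnasium-style reset() and step() methods.
    """

    def __init__(self, env: Any, needs_decision: Callable[[Any], bool], noop_action: Any):
        self.env = env
        self.needs_decision = needs_decision  # True -> policy should choose the action
        self.noop_action = noop_action        # the predetermined "do nothing" action

    def _fast_forward(self, obs, info):
        """Apply the no-op action while the condition is not met."""
        skipped_reward = 0.0
        terminated = truncated = False
        while not (terminated or truncated) and not self.needs_decision(obs):
            obs, reward, terminated, truncated, info = self.env.step(self.noop_action)
            skipped_reward += reward
        return obs, skipped_reward, terminated, truncated, info

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        # Rewards collected before the first decision point are dropped here,
        # and an episode that ends during this skip is not handled specially.
        obs, _, _, _, info = self._fast_forward(obs, info)
        return obs, info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        if not (terminated or truncated):
            obs, skipped, terminated, truncated, info = self._fast_forward(obs, info)
            reward += skipped  # undiscounted sum over the skipped steps
        return obs, reward, terminated, truncated, info
```

With something like this, the policy would only ever see observations that satisfy the condition, so no forward or backward pass would happen for the skipped steps. The downside is that rewards from skipped steps are simply summed onto the preceding decision, which ignores discounting across those steps.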

What is the best way to go about this in RLlib?
Thanks for the help!