Customise policy to only do forward/backward pass for certain observations

I am new to RLlib, and for my research project I want to create a simple PPO baseline for a discrete control problem.

I managed to get everything up and running, and now I would like to customise the training process so that the forward/backward pass is only executed when the observation meets a certain condition. If the condition is not met, I would like the agent to take a predetermined action (a do-nothing action).

What is the best way to go about this in RLlib?

Hi! I guess you can use a custom_model and add a mask in the forward function based on the observation, similar to the action mask described here.
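
To make the masking idea concrete, here is a minimal sketch in plain Python rather than an actual RLlib custom model. The names `needs_policy`, `DO_NOTHING`, and `NEG_INF`, as well as the trigger condition itself, are hypothetical placeholders; in a real model the same arithmetic would be applied to the logits tensor inside the forward function.

```python
import math

DO_NOTHING = 0   # hypothetical index of the do-nothing action
NEG_INF = -1e9   # large finite penalty instead of -inf, to avoid NaNs


def needs_policy(obs):
    """Hypothetical trigger: the policy should act only if obs[0] > 0."""
    return obs[0] > 0


def mask_logits(obs, logits):
    """Suppress every action except DO_NOTHING when the trigger is not met."""
    if needs_policy(obs):
        return list(logits)
    return [l if a == DO_NOTHING else l + NEG_INF
            for a, l in enumerate(logits)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [0.2, 1.5, -0.3]
probs_free = softmax(mask_logits([1.0], logits))     # trigger met
probs_forced = softmax(mask_logits([-1.0], logits))  # trigger not met
```

With the trigger not met, all probability mass ends up on the do-nothing action. Note that the network still runs in this case; the mask only changes which action comes out.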

That’s what I ended up doing and it works.
However, as far as I understand, this still requires a forward/backward pass for every observation, which causes overhead. I tried to solve the issue by customising compute_single_action in the PPO trainer (post), but that did not work.


Couldn’t you do this in the environment in reset and step?

  1. RLlib calls reset or step
  2. In your environment, evaluate whether you need an action from the policy. If not, take the predetermined action; otherwise, return the transition info.
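
The two steps above could be sketched as a thin wrapper around the environment. This is only an illustration under some assumptions: the wrapped env uses the classic Gym interface (reset() returns obs, step() returns a 4-tuple), `SkipWrapper`, `needs_policy`, and `CounterEnv` are hypothetical names, and rewards collected while skipping are simply summed.

```python
class CounterEnv:
    """Toy stand-in env: the observation is a step counter, episode ends at 10."""

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 10, {}


class SkipWrapper:
    """Only hands control to the policy when needs_policy(obs) is True."""

    def __init__(self, env, needs_policy, do_nothing_action):
        self.env = env
        self.needs_policy = needs_policy
        self.do_nothing_action = do_nothing_action

    def _fast_forward(self, obs, reward, done, info):
        # Apply the do-nothing action until the policy is needed
        # or the episode ends, accumulating the skipped rewards.
        while not done and not self.needs_policy(obs):
            obs, r, done, info = self.env.step(self.do_nothing_action)
            reward += r
        return obs, reward, done, info

    def reset(self):
        obs = self.env.reset()
        # Simplification: rewards (and a possible episode end) during
        # the skipped steps after reset are ignored here.
        obs, _, _, _ = self._fast_forward(obs, 0.0, False, {})
        return obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return self._fast_forward(obs, reward, done, info)


# Hypothetical trigger: the policy is only needed every fourth step.
env = SkipWrapper(CounterEnv(), lambda obs: obs % 4 == 0, do_nothing_action=0)
first_obs = env.reset()
transition = env.step(1)
```

The policy then never sees the skipped observations, so no forward or backward pass is spent on them. The trade-off is that each transition the policy does see spans a variable number of underlying env steps.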
Yes, that’s also an option and can achieve the desired result.
Though I think it would be more elegant, and easier to change later, to implement this on the policy side.