I am new to RLlib and for my research project I want to create a simple PPO baseline for a discrete control problem.
I managed to get everything up and running, and now I would like to customise the training process so that the forward/backward pass is only executed when a certain characteristic of the observation is met. If it is not met, I would like the agent to take a predetermined do-nothing action.
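Roughly, this is the control flow I have in mind (a minimal sketch; should_act and NOOP_ACTION are placeholders for my actual gating condition and do-nothing action):

```python
NOOP_ACTION = 0  # placeholder: index of the do-nothing action

def should_act(obs) -> bool:
    # Placeholder gating condition: act only when the first feature is positive.
    return obs[0] > 0.0

def select_action(trainer, obs):
    if should_act(obs):
        return trainer.compute_single_action(obs)  # normal PPO forward pass
    return NOOP_ACTION  # fixed action, ideally without touching the network
```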
What is the best way to go about this in RLlib?
Thanks for the help!
Thanks for the reply.
That’s what I ended up doing and it works.
However, as far as I understand, this still requires a forward/backward pass, which causes overhead. I tried to solve the issue by customising compute_single_action in the PPO trainer (post), but that did not work.
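For reference, what I tried looked roughly like this (a sketch against the Ray 1.x Trainer API; the gating condition and no-op action are placeholders):

```python
from ray.rllib.agents.ppo import PPOTrainer  # Ray 1.x import path

NOOP_ACTION = 0  # placeholder do-nothing action

class GatedPPOTrainer(PPOTrainer):
    def compute_single_action(self, observation=None, *args, **kwargs):
        # Short-circuit with the fixed action when the gate is closed.
        if observation is not None and observation[0] <= 0.0:  # placeholder condition
            return NOOP_ACTION
        return super().compute_single_action(observation, *args, **kwargs)
```

As far as I can tell, this has no effect during training, because the rollout workers sample actions from the policy directly and never go through the trainer's compute_single_action.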
Yes, that’s also an option and can achieve the desired result.
Though I think it would be more elegant, and more amenable to future changes, to implement it on the policy side.
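A minimal sketch of what I mean, assuming the Ray 1.x Torch policy, a discrete action space, and a gating condition you can read off the raw observation (the condition and the no-op index are placeholders):

```python
import numpy as np
from ray.rllib.agents.ppo import PPOTorchPolicy  # Ray 1.x import path
from ray.rllib.policy.sample_batch import SampleBatch

NOOP_ACTION = 0  # placeholder: index of the do-nothing action

class GatedPPOTorchPolicy(PPOTorchPolicy):
    def compute_actions_from_input_dict(self, input_dict, **kwargs):
        # Read the raw observations before RLlib converts the dict to tensors.
        obs = np.asarray(input_dict[SampleBatch.OBS])
        # Normal PPO forward pass for the whole batch ...
        actions, state_out, extra = super().compute_actions_from_input_dict(
            input_dict, **kwargs)
        # ... then overwrite the sampled action wherever the gate is closed.
        gate_closed = obs[:, 0] <= 0.0  # placeholder gating condition
        return np.where(gate_closed, NOOP_ACTION, actions), state_out, extra
```

Note that this version still pays for the forward pass; to actually skip it you would run the model only on the gated-in observations and pad the extra fetches (action_logp, vf_preds, etc.) back to the full batch size, which is the fiddly part. Also keep in mind that overwriting actions after sampling makes the stored log-probs off-policy for those steps, so you may want to mask them out of the loss.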