Terminating Action in Offline RL

How severely does this issue affect your experience of using Ray?

  • Medium: It makes completing my task significantly harder, but I can work around it.

I am just starting with Ray RLlib, so forgive any novice mistakes. I am building a lot of custom methods for offline RL, and my difficulty is in how to implement certain actions. Consider a toy example with X states: based on the current observation, the agent can either continue to the next state or terminate. Think of medical testing, where it is important not to run tests that are unnecessary. How can this “terminate” action be included in the action space in an offline setting?

Can your agent choose to go to any of the states from any of the other states? If so, you could model this as an X+1 discrete action space problem, where the extra action is the terminal one.

Thank you. No, it is a sequential chain of states with two actions (move forward and terminate), such as state 1 → state 2 → state 3. State 2 cannot go back to state 1, and after state 3 the episode must terminate.

So a Discrete(2) action space? See “Basic Usage” in the Gymnasium documentation.
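For illustration, here is a minimal sketch of the chain with a two-action space, assuming three states and made-up rewards. The names `ChainEnv`, `rollout`, `FORWARD`, `TERMINATE` and the transition dict keys are hypothetical, not RLlib's dataset format. The class follows Gymnasium's `reset`/`step` return convention without importing the library, so it runs standalone; in a real environment the action space would be declared as `gymnasium.spaces.Discrete(2)`.

```python
FORWARD, TERMINATE = 0, 1  # the two actions of a Discrete(2) space
N_STATES = 3               # assumed chain length: state 0 -> 1 -> 2


class ChainEnv:
    """Toy chain: from each state the agent moves forward or terminates."""

    def reset(self):
        self.state = 0
        return self.state, {}

    def step(self, action):
        if action not in (FORWARD, TERMINATE):
            raise ValueError(f"invalid action: {action}")
        if action == TERMINATE or self.state == N_STATES - 1:
            # Episode ends by choice, or because the chain has run out.
            return self.state, 0.0, True, False, {}
        self.state += 1
        # Hypothetical cost of taking one more step (e.g. one more test).
        return self.state, -1.0, False, False, {}


def rollout(env, policy):
    """Log one episode as a list of transition dicts (illustrative format)."""
    obs, _ = env.reset()
    episode, done = [], False
    while not done:
        action = policy(obs)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        episode.append({
            "obs": obs,
            "action": action,
            "reward": reward,
            "next_obs": next_obs,
            "terminated": terminated,
        })
        obs = next_obs
        done = terminated or truncated
    return episode


# A behaviour policy that moves forward once and then terminates.
episode = rollout(ChainEnv(), lambda obs: TERMINATE if obs >= 1 else FORWARD)
```

In logged data built this way, “terminate” is just action 1 on a transition whose `terminated` flag is true, so an offline algorithm sees it like any other discrete action. The forced ending after the last state is handled by the environment rather than by the action space.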