Let the agent stop the episode

How severely does this issue affect your experience of using Ray?

  • Medium: It contributes significant difficulty to completing my task, but I can work around it.

I have a custom environment, model, and policy for DQN. Constraints determine whether an action is allowed. Calculating the constraints for every possible action takes a long time, so I only check the constraints for the best action; if that action is not allowed, I check the next-best action, and so on. The environment can end the episode, but near the end of an episode a situation can arise in which the constraints rule out every action, while the environment assumes the agent can still act. I would therefore like the agent to be able to tell the environment that the episode should be stopped. I have never heard of such a mechanism, but is there a way for the agent to signal to the environment that the episode should end?
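The lazy constraint check described above can be sketched as follows. This is a minimal illustration, not the poster's code: `select_allowed_action`, `q_values`, and `is_allowed` are hypothetical names, and `is_allowed` stands in for the expensive constraint calculation. It returns None exactly in the problematic case, when no action passes.

```python
from typing import Callable, Optional, Sequence


def select_allowed_action(
    q_values: Sequence[float],
    is_allowed: Callable[[int], bool],
) -> Optional[int]:
    """Return the highest-valued action that passes the constraint check.

    Constraints are evaluated lazily: an action is only checked if every
    better-ranked action was rejected. Returns None if no action is allowed.
    """
    # Rank action indices from highest to lowest Q-value.
    ranked = sorted(range(len(q_values)), key=lambda a: q_values[a], reverse=True)
    for action in ranked:
        if is_allowed(action):  # expensive check, run only as far as needed
            return action
    return None  # every action violates the constraints


q = [0.1, 0.9, 0.5]
print(select_allowed_action(q, lambda a: a != 1))  # 2 (best action 1 is rejected)
print(select_allowed_action(q, lambda a: False))   # None (no action possible)
```

In the best case only one constraint check runs per step; the None result is the situation the environment currently has no way to learn about.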

Two possible solutions:

  1. I could re-check every action the agent passes to the environment inside the environment itself, but that duplicates the expensive constraint checks.
  2. I could enlarge the action space and let the model choose a dedicated action (or action combination) when the episode needs to be stopped.

I would go with option 2, as follows:
Since your actions can be one-hot encoded, reserve one encoding to mark the worst case, i.e. that no action is possible.
In the environment, end the episode with a random observation and put a flag no_possible_action in the info dict that you return. A policy also has a postprocess_fn that you can use to postprocess the trajectory and filter out these bad experiences.
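A rough sketch of this suggestion, under several assumptions: the environment is a toy class following the older Gym-style `step` contract `(obs, reward, done, info)` rather than a real Gym or RLlib environment, `N_REAL_ACTIONS`, `OBS_DIM`, `STOP_ACTION`, `ConstrainedEnv`, and `filter_trajectory` are hypothetical names, and the trajectory is a plain list of dicts rather than RLlib's batch type. In RLlib, the filtering step would live in the policy's postprocess_fn.

```python
import random
from typing import Any, Dict, List, Tuple

N_REAL_ACTIONS = 4            # hypothetical size of the original action space
STOP_ACTION = N_REAL_ACTIONS  # extra action index meaning "no allowed action"
OBS_DIM = 3                   # hypothetical observation size


class ConstrainedEnv:
    """Toy Gym-style env with one extra 'stop' action (hypothetical)."""

    def __init__(self, max_steps: int = 10, seed: int = 0):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.t = 0

    def _random_obs(self) -> List[float]:
        return [self.rng.random() for _ in range(OBS_DIM)]

    def reset(self) -> List[float]:
        self.t = 0
        return self._random_obs()

    def step(self, action: int) -> Tuple[List[float], float, bool, Dict[str, Any]]:
        if action == STOP_ACTION:
            # The agent signals that no action satisfies the constraints:
            # end the episode with a random observation and flag it.
            return self._random_obs(), 0.0, True, {"no_possible_action": True}
        self.t += 1
        reward = 1.0  # placeholder reward
        done = self.t >= self.max_steps
        return self._random_obs(), reward, done, {"no_possible_action": False}


def filter_trajectory(batch: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop transitions flagged with no_possible_action before training."""
    return [tr for tr in batch if not tr["info"].get("no_possible_action", False)]


env = ConstrainedEnv(max_steps=5)
obs = env.reset()
trajectory = []
for action in [0, 1, STOP_ACTION]:
    next_obs, reward, done, info = env.step(action)
    trajectory.append({"obs": obs, "action": action, "reward": reward,
                       "next_obs": next_obs, "done": done, "info": info})
    obs = next_obs
    if done:
        break
print(len(trajectory), len(filter_trajectory(trajectory)))  # 3 2
```

The agent picks `STOP_ACTION` whenever its constraint search finds nothing, so the environment never has to repeat the expensive checks; the random final observation is harmless because the flagged transition is filtered out before learning.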

One more thing:
Depending on the nature of your environment, you could choose a random action when none is allowed and always cut the last observation from your trajectory, marking the second-to-last one as “done”. I’m still not sure exactly how your environment works, so this might not work; it’s just a thought.
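The trimming idea could look roughly like this. It is a sketch assuming each transition is a dict with at least a "done" key; `cut_last_step` is a hypothetical helper, not an RLlib function.

```python
from typing import Any, Dict, List


def cut_last_step(trajectory: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop the final transition and mark the new last one as terminal."""
    if len(trajectory) < 2:
        return []  # nothing useful remains once the last step is removed
    # Shallow-copy the kept transitions so the input list is not modified.
    trimmed = [dict(tr) for tr in trajectory[:-1]]
    trimmed[-1]["done"] = True
    return trimmed


traj = [{"action": 0, "done": False},
        {"action": 2, "done": False},
        {"action": 1, "done": True}]  # last step used a random, disallowed action
print(cut_last_step(traj))  # [{'action': 0, 'done': False}, {'action': 2, 'done': True}]
```

Marking the new last transition as done means a DQN target for it uses only the reward and does not bootstrap from the state where no action was possible.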
