Let the agent stop the episode

TedDeVriesLentsch · May 13, 2022, 3:28pm

How severely does this issue affect your experience of using Ray?

Medium: It contributes to significant difficulty in completing my task, but I can work around it.

I have a custom environment, model, and policy for DQN. Constraints determine whether an action is allowed. Evaluating the constraints for every possible action takes a long time, so I only check the best action; if that action is not allowed, I check the next-best one, and so on. The environment can stop the episode, but near the end of an episode the agent may be unable to take any action because of the constraints while the environment assumes it still can. I would therefore like the agent to be able to tell the environment that the episode should be stopped. I have never heard of this being done; is there a way for the agent to signal to the environment that the episode should end?
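The lazy constraint check described above could look roughly like the sketch below. It is plain Python, not RLlib code. The names `pick_first_allowed` and `is_allowed` are hypothetical stand-ins for the custom policy logic and the expensive constraint check. Returning `None` marks the case where no action passes, which is the situation the agent needs to report to the environment.

```python
from typing import Callable, Optional, Sequence


def pick_first_allowed(
    q_values: Sequence[float],
    is_allowed: Callable[[int], bool],
) -> Optional[int]:
    """Return the highest-valued allowed action, or None if none is allowed."""
    # Visit actions from best to worst Q-value.
    ranked = sorted(range(len(q_values)), key=lambda a: q_values[a], reverse=True)
    for action in ranked:
        # The expensive check runs only until the first allowed action is found.
        if is_allowed(action):
            return action
    return None  # no feasible action: the agent would like to stop


# Usage: action 2 has the best value but is forbidden, so action 0 is chosen.
checked = []


def is_allowed(action: int) -> bool:
    checked.append(action)  # record which actions were actually checked
    return action != 2


best = pick_first_allowed([0.5, 0.1, 0.9], is_allowed)
```

Only two of the three actions are checked here; the cost grows only when the top-ranked actions are infeasible.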

Two possible solutions:

1. I could re-check, inside the environment, every action the agent passes to it, but that duplicates work.
2. I could enlarge my action space and let the model choose a particular action (combination) when the episode needs to be stopped.
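A minimal sketch of the second option, assuming a discrete action space: one extra index is appended and reserved as a stop signal. `StoppableEnv` is a hypothetical toy class that only mimics the usual `reset`/`step` interface; the observations and rewards are placeholders, not taken from the real environment.

```python
from typing import Dict, List, Tuple


class StoppableEnv:
    """Hypothetical toy environment whose last action index means 'stop'."""

    def __init__(self, n_real_actions: int = 3, max_steps: int = 10):
        self.n_real_actions = n_real_actions
        self.stop_action = n_real_actions    # extra index appended to the action space
        self.n_actions = n_real_actions + 1  # size the model's output must cover
        self.max_steps = max_steps
        self.t = 0

    def reset(self) -> List[float]:
        self.t = 0
        return [0.0]

    def step(self, action: int) -> Tuple[List[float], float, bool, Dict]:
        if not 0 <= action < self.n_actions:
            raise ValueError(f"invalid action {action}")
        if action == self.stop_action:
            # The agent found no feasible action: end the episode immediately.
            return [float(self.t)], 0.0, True, {}
        self.t += 1
        done = self.t >= self.max_steps      # the environment's own stopping rule
        return [float(self.t)], 1.0, done, {}


env = StoppableEnv()
env.reset()
_, _, done_after_move, _ = env.step(0)
_, _, done_after_stop, _ = env.step(env.stop_action)
```

With this layout the environment never has to re-evaluate the constraints; it only has to recognise the reserved index.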

arturn · May 16, 2022, 11:19am

I would go with your second option, as follows:
Since your actions can be one-hot encoded, reserve one encoding to mark the worst case, in which no action is possible.
In the environment, end the episode with a random observation and put a flag no_possible_action in the info dict that you return. A policy also has a postprocess_fn that you can use to postprocess the trajectory and filter out bad experiences.
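As a rough illustration of the flag-and-filter idea, here is a library-free sketch. The trajectory is modelled as a list of plain dicts, which is an assumption made for readability and not RLlib's batch format. `terminal_step` and `drop_flagged` are hypothetical helpers; in RLlib the filtering would live in whatever postprocessing hook your policy uses.

```python
import random
from typing import Any, Dict, List


def terminal_step(obs_size: int) -> Dict[str, Any]:
    """Transition returned when the agent picked the reserved stop encoding."""
    return {
        "obs": [random.random() for _ in range(obs_size)],  # placeholder observation
        "reward": 0.0,
        "done": True,
        "info": {"no_possible_action": True},
    }


def drop_flagged(trajectory: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Remove transitions flagged as 'no possible action' before training."""
    return [
        step for step in trajectory
        if not step["info"].get("no_possible_action", False)
    ]


# Usage: two ordinary steps followed by the flagged terminal step.
trajectory = [
    {"obs": [0.1], "reward": 1.0, "done": False, "info": {}},
    {"obs": [0.2], "reward": 1.0, "done": False, "info": {}},
    terminal_step(obs_size=1),
]
cleaned = drop_flagged(trajectory)
```

Because the flagged transition is removed, the random observation it carries never reaches the replay buffer in this sketch.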

One more thing:
Depending on the nature of your environment, you could maybe choose a random action if none is allowed and simply always cut away the last observation of your trajectory, marking the second-to-last as “done”. I’m still not sure how exactly your environment works, so this might not work; it’s just a thought.
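The trimming idea could be sketched like this, again with a trajectory modelled as a list of plain dicts (an assumption for illustration, not RLlib's batch format). `trim_last_step` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def trim_last_step(trajectory: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop the final transition and mark the new last one as terminal."""
    if len(trajectory) <= 1:
        return []  # nothing is left once the last step is cut away
    # Copy the kept steps so the caller's trajectory is not modified.
    trimmed = [dict(step) for step in trajectory[:-1]]
    trimmed[-1]["done"] = True
    return trimmed


# Usage: the last step was produced by a random action and is discarded.
episode = [
    {"obs": [0.1], "action": 1, "reward": 1.0, "done": False},
    {"obs": [0.2], "action": 0, "reward": 1.0, "done": False},
    {"obs": [0.3], "action": 2, "reward": 0.0, "done": True},
]
trimmed = trim_last_step(episode)
```

This only makes sense if the final random action has no side effects that matter for learning, which depends on how the environment works.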
