How severely does this issue affect your experience of using Ray?
- Medium: It contributes to significant difficulty in completing my task, but I can work around it.
I have a custom environment, model, and policy for DQN. Constraints determine whether an action is allowed. Evaluating the constraints for every possible action takes a long time, so I only check the constraints for the best action; if that action is not allowed, I check the next-best action, and so on. The environment can end the episode on its own, but near the end of an episode the agent can be left with no allowed action while the environment still assumes one exists. I would therefore like the agent to be able to tell the environment that the episode should end. I have not seen this done before: is there a way for the agent to signal to the environment that the episode should be stopped?
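To make the checking strategy concrete, here is a minimal sketch of what I mean, not my actual policy code. The names `first_feasible_action`, `q_values`, and `is_allowed` are hypothetical; `is_allowed` stands in for the expensive constraint check. The `None` return value is exactly the situation I need to communicate to the environment.

```python
from typing import Callable, Optional, Sequence


def first_feasible_action(
    q_values: Sequence[float],
    is_allowed: Callable[[int], bool],
) -> Optional[int]:
    """Return the highest-valued action that passes the constraint check.

    Actions are tried in descending Q-value order, so the expensive
    `is_allowed` check stops as soon as one feasible action is found.
    Returns None when every action violates the constraints.
    """
    # Action indices sorted from highest to lowest Q-value.
    ranked = sorted(range(len(q_values)), key=lambda a: q_values[a], reverse=True)
    for action in ranked:
        if is_allowed(action):
            return action
    return None


# Hypothetical example: only action 1 satisfies the constraints.
q = [0.4, 0.1, 0.9]
print(first_feasible_action(q, lambda a: a == 1))  # 1
print(first_feasible_action(q, lambda a: False))   # None
```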
Two possible solutions:
- I could re-check every action the agent passes to the environment inside the environment itself, but that duplicates the constraint check.
- I could enlarge my action space and let the model choose a dedicated action (or action combination) when the episode needs to be stopped.
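The second option could look roughly like the sketch below. It is a toy environment written only for illustration: `StoppableEnv`, `N_REAL_ACTIONS`, `STOP_ACTION`, and the `stopped_by_agent` info key are all hypothetical names, the class does not subclass any Gym or RLlib base class, and it only mirrors the five-value `step` return used by Gymnasium (observation, reward, terminated, truncated, info).

```python
from typing import Any, Dict, Tuple

N_REAL_ACTIONS = 4            # hypothetical size of the original action space
STOP_ACTION = N_REAL_ACTIONS  # extra index appended to the action space


class StoppableEnv:
    """Toy environment whose last action index ends the episode."""

    def __init__(self, max_steps: int = 10) -> None:
        self.n_actions = N_REAL_ACTIONS + 1  # real actions plus STOP_ACTION
        self.max_steps = max_steps
        self.t = 0

    def reset(self) -> Tuple[int, Dict[str, Any]]:
        self.t = 0
        return self.t, {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        if action == STOP_ACTION:
            # The agent signals that no feasible action remains: end the episode.
            return self.t, 0.0, True, False, {"stopped_by_agent": True}
        # Placeholder dynamics: count steps and give a constant reward.
        self.t += 1
        truncated = self.t >= self.max_steps
        return self.t, 1.0, False, truncated, {"stopped_by_agent": False}


env = StoppableEnv()
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(0)
obs, reward, terminated, truncated, info = env.step(STOP_ACTION)
print(terminated, info)  # True {'stopped_by_agent': True}
```

With this layout the policy would send `STOP_ACTION` whenever its constraint check finds no allowed action, so the environment never has to repeat the check. The cost is that the model's output needs one extra dimension for the stop action.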