Is it possible to access Observations in get_exploration_action

How severe does this issue affect your experience of using Ray?

  • Medium: It contributes significant difficulty to completing my task, but I can work around it.
  • High: It blocks me from completing my task.

Hello, we have an exploration algorithm we’d like to port to RLlib that, with some (time-annealed) probability, consults an offline policy for advice — i.e. most of the time we’d sample an action from the usual ActionDistribution produced by the RL policy, but occasionally we would feed the observation to the advice policy and sample an action from there instead.

I’m relatively new to RLlib, and subclassing Exploration seems to be the way to do this, but it’s not clear to me whether I can actually access the observations inside Exploration.get_exploration_action, which I’d need in order to sample from my advice policy.

Any advice appreciated!
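For concreteness, here is a minimal, framework-agnostic sketch of the mixing scheme described above. The hyperparameter names (p_start, p_end, anneal_steps) and the two sampler callables are illustrative stand-ins, not RLlib API — the question is precisely where in RLlib this logic should live.

```python
import random


def advice_probability(timestep, p_start=1.0, p_end=0.05, anneal_steps=100_000):
    """Linearly anneal the probability of consulting the advice policy.

    p_start, p_end, and anneal_steps are illustrative hyperparameters.
    """
    frac = min(timestep / anneal_steps, 1.0)
    return p_start + frac * (p_end - p_start)


def select_action(timestep, obs, sample_rl_action, sample_advice_action, rng=random):
    """With annealed probability, sample from the advice policy instead of
    the RL policy; otherwise fall back to the usual RL action.

    sample_rl_action / sample_advice_action are stand-ins for sampling
    from the policy's ActionDistribution and from the offline advice
    policy, respectively.
    """
    if rng.random() < advice_probability(timestep):
        return sample_advice_action(obs)
    return sample_rl_action(obs)
```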

It looks like I could probably do this using ActionConnector – i.e. use the Algorithm’s standard exploration method to find an action, and then selectively override that action by sampling from my advice policy based upon the observation contained in the ActionConnectorDataType’s input_dict field.

Any reason that this would be a Bad Idea?
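To illustrate what I have in mind, here is a sketch of the connector approach. ActionConnector and ActionConnectorDataType are real RLlib names, but the sketch below uses plain stand-in classes rather than the actual RLlib base classes, and the "obs" key and advice_policy/advice_prob callables are my own assumptions:

```python
import random
from dataclasses import dataclass
from typing import Any, Dict


# Stand-in for RLlib's ActionConnectorDataType: input_dict carries the
# observation that produced the action; output carries the chosen action.
@dataclass
class ConnectorData:
    input_dict: Dict[str, Any]
    output: Any


class AdviceOverrideConnector:
    """Sketch of the ActionConnector idea: keep the policy's action most of
    the time, but with some (annealed) probability replace it with an action
    sampled from the advice policy, based on the observation."""

    def __init__(self, advice_policy, advice_prob, rng=None):
        self.advice_policy = advice_policy  # callable: obs -> action (stand-in)
        self.advice_prob = advice_prob      # callable: timestep -> probability
        self.rng = rng or random.Random()
        self.timestep = 0

    def __call__(self, ac_data: ConnectorData) -> ConnectorData:
        self.timestep += 1
        if self.rng.random() < self.advice_prob(self.timestep):
            # Selectively override the policy's action using the observation
            # found in the connector's input_dict.
            obs = ac_data.input_dict["obs"]
            ac_data.output = self.advice_policy(obs)
        return ac_data
```

The appeal is that the standard exploration pipeline runs untouched, and the override is a self-contained post-processing step with access to the observation.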