How severely does this issue affect your experience of using Ray?
- Medium: It causes significant difficulty in completing my task, but I can work around it.
- High: It blocks me from completing my task.
Hello, we have an exploration algorithm we'd like to port to RLlib that, with some time-annealed probability, consults an offline policy for advice. That is, most of the time we'd sample an action from the usual ActionDistribution produced by the RL policy, but occasionally we'd feed the observation to the advice policy and sample an action from it instead.
I'm relatively new to RLlib; subclassing Exploration seems to be the way to do this, but it's not clear to me whether, inside Exploration.get_exploration_action, I can actually access the observations I need in order to sample from the advice policy.
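For concreteness, here's a rough sketch of what I have in mind (assuming the torch/eager path where `timestep` is a plain int; `AdvicePolicy`, its `compute_single_action()` call, the annealing parameters, and the returned `logp=None` are placeholders for our own setup, and `_get_current_obs` marks exactly the part I don't know how to implement):

```python
import random

from ray.rllib.utils.exploration.exploration import Exploration
from ray.rllib.utils.schedules import PiecewiseSchedule


class AdviceExploration(Exploration):
    """With a time-annealed probability, act on an offline advice policy."""

    def __init__(self, action_space, *, framework, initial_p=0.5,
                 final_p=0.01, anneal_timesteps=100_000, **kwargs):
        super().__init__(action_space, framework=framework, **kwargs)
        # Linearly anneal the advice probability from initial_p to final_p.
        self.advice_prob = PiecewiseSchedule(
            endpoints=[(0, initial_p), (anneal_timesteps, final_p)],
            outside_value=final_p,
            framework=None,
        )
        self.advice_policy = ...  # load our offline advice policy here

    def get_exploration_action(self, *, action_distribution, timestep,
                               explore=True):
        if explore and random.random() < self.advice_prob.value(timestep):
            # Consult the advice policy instead of the RL policy.
            obs = self._get_current_obs()
            action = self.advice_policy.compute_single_action(obs)
            return action, None  # (action, logp)
        # Otherwise, sample from the policy's own action distribution.
        return (action_distribution.sample(),
                action_distribution.sampled_action_logp())

    def _get_current_obs(self):
        # TODO: This is the part I can't figure out -- is the observation
        # that produced `action_distribution` reachable from in here?
        raise NotImplementedError
```

which I'd then plug in via the usual exploration config, e.g.:

```python
config["exploration_config"] = {
    "type": AdviceExploration,
    "initial_p": 0.5,
}
```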
Any advice appreciated!