Hi! I’m working on a path planning task using RL. At each timestep, we would like to “freeze” the simulator, and do a simulation on the agent side to produce a sequence of waypoints.
The RL action output is the relative offset from the current position, i.e. (dx, dy). We would like to make a “circular decision”, so that we get [(dx1, dy1), (dx2, dy2), …] (at most 10 entries), where each one is the offset from the previous position.
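To make the offset convention concrete, here is a minimal sketch of how I picture turning the offsets into absolute waypoints. The function name `offsets_to_waypoints` and the constant `MAX_WAYPOINTS` are just made up for illustration and are not part of RLlib or gym.

```python
from itertools import accumulate

MAX_WAYPOINTS = 10  # cap on the sequence length, as described above


def offsets_to_waypoints(start, offsets, max_len=MAX_WAYPOINTS):
    """Turn relative (dx, dy) offsets into absolute (x, y) waypoints.

    The first offset is relative to `start`; every later offset is
    relative to the waypoint before it. Extra offsets beyond `max_len`
    are dropped.
    """
    clipped = list(offsets)[:max_len]
    # Running sum of the offsets, seeded with the current position.
    points = accumulate(
        clipped,
        lambda p, d: (p[0] + d[0], p[1] + d[1]),
        initial=start,
    )
    return list(points)[1:]  # drop the start position itself


waypoints = offsets_to_waypoints((0, 0), [(1, 0), (0, 2), (-1, 1)])
```

So the policy would only ever emit offsets, and the waypoints handed back to the simulator are the running sum of those offsets from the current position.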
I’m wondering if this is possible? What part of the RLlib workflow should I modify, or can I simply use gym.spaces.Sequence?