First of all, many thanks to this wonderful community, which has already helped me a lot!
I’m solving an optimisation problem with RLlib (so far, I’m very optimistic). The particular details are quite complex, but the problem can be broken down to ‘shooting’ targets in 2D space:
The observation space is an NxN grid with ‘targets’. When ‘shot’, a target disappears; this logic is implemented inside the custom environment.
After some success solving this with a discrete action space and PPO (using each grid square as a separate action, NxN in total), I’ve hit a wall: as the observation space grows, the action space grows with it.
The obvious idea here was to try a continuous action space; moreover, the problem looks continuous ‘by nature’. Therefore I implemented the continuous action space as
gym.spaces.Box(low=0, high=1, shape=(2,))
for the ‘x’ and ‘y’ dimensions of the grid.
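To make the setup concrete, here is a minimal sketch of how a continuous action could be turned back into a grid square inside the environment. The function name action_to_cell, the parameter n, and the choice of x as column and y as row are illustrative assumptions, not the actual environment code.

```python
def action_to_cell(action, n):
    """Map a continuous action (x, y) in [0, 1]^2 to a (row, col) cell of an n x n grid.

    Hypothetical helper: x is treated as the column axis and y as the row axis.
    """
    x, y = action
    # Clamp first so slightly out-of-range samples still land on the grid.
    x = min(max(float(x), 0.0), 1.0)
    y = min(max(float(y), 0.0), 1.0)
    # Scale to [0, n] and truncate; the upper edge 1.0 is folded into the last cell.
    col = min(int(x * n), n - 1)
    row = min(int(y * n), n - 1)
    return row, col
```

With this mapping the action space stays two-dimensional no matter how large the grid becomes, which is the reason for moving away from NxN discrete actions.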
For some reason, it looks like the RLlib documentation lacks examples of continuous action spaces used with custom models.
From my understanding of the theory, I need my model to output 2 means and 2 standard deviations for a two-dimensional Gaussian distribution. Can you please point me to some examples of how RLlib expects this output to be formatted? As I understand it, the action distribution is chosen based on the action space; however, I can’t find the module in the RLlib code that samples actions from my model’s outputs according to the action space of my environment.
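My current understanding can be sketched in plain Python as follows. The layout is an assumption on my side: one flat vector of 4 numbers per sample, with the two means first and the two log standard deviations after them (so a batch would have shape (batch_size, 4)). The names split_gaussian_params and sample_action are hypothetical and are not RLlib functions.

```python
import math
import random

ACTION_DIM = 2  # matches shape=(2,) of the Box action space


def split_gaussian_params(flat_outputs, action_dim=ACTION_DIM):
    """Split one flat model output into (means, stds).

    Assumed layout: [mean_x, mean_y, log_std_x, log_std_y].
    """
    if len(flat_outputs) != 2 * action_dim:
        raise ValueError("expected 2 * action_dim values per sample")
    means = list(flat_outputs[:action_dim])
    # The second half is read as log-stds, so exp() keeps every std positive.
    stds = [math.exp(v) for v in flat_outputs[action_dim:]]
    return means, stds


def sample_action(flat_outputs, rng=random, low=0.0, high=1.0):
    """Draw one action from the diagonal Gaussian and clip it to the Box bounds."""
    means, stds = split_gaussian_params(flat_outputs)
    raw = [rng.gauss(m, s) for m, s in zip(means, stds)]
    return [min(max(a, low), high) for a in raw]
```

For example, sample_action([0.5, 0.5, -2.0, -2.0]) draws a point near the centre of the grid with a standard deviation of about 0.14 in each direction.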
Basically, what I’m asking is: if my model’s forward pass looks something like
def forward(self, input_dict, state, seq_lens):
    ...
    return outputs, state
how do I decide on outputs.shape given the action space I have above?
Will be grateful for any hints!