How severely does this issue affect your experience of using Ray?
- Medium: It contributes to significant difficulty in completing my task, but I can work around it.
I have a custom network that’s outputting an N x M matrix of probabilities. These are essentially stacked Categorical distributions. So, if I have a 4 x 5 matrix, it’s 4 independent distributions over the same 5 options. The 5 options are something like up/down/left/right/noop.
I want to sample from each distribution a variable number of times per step. Let’s say my 4 rows are cars/trucks/buses/bikes. In my simulation I might have 2 cars, 1 truck, no buses, and 6 bikes. So my vehicle counts are an array like [2, 1, 0, 6], and I want to draw that many times from each correspondingly indexed distribution. So post-sample, I expect a matrix that might look like:
[[1, 0, 0, 1, 0], ← cars
[0, 0, 1, 0, 0],
[0, 0, 0, 0, 0],
[3, 0, 2, 1, 0]] ← bikes
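For concreteness, here’s a minimal sketch of the kind of sampling I mean (`sample_counts` is my own helper, not anything from RLlib; it leans on the fact that a Multinomial with total_count = n is n independent categorical draws tallied per option):

```python
import torch

def sample_counts(probs: torch.Tensor, counts: torch.Tensor) -> torch.Tensor:
    """Draw counts[i] samples from the categorical in row i of probs.

    probs:  (N, M) row-wise probability matrix.
    counts: (N,) integer draws per row, e.g. torch.tensor([2, 1, 0, 6]).
    Returns an (N, M) matrix tallying how often each option was drawn.
    """
    out = torch.zeros_like(probs, dtype=torch.long)
    for i, n in enumerate(counts.tolist()):
        if n > 0:
            # A Multinomial with total_count=n is n independent
            # categorical draws over probs[i], summed per option.
            out[i] = torch.distributions.Multinomial(
                total_count=n, probs=probs[i]
            ).sample().long()
    return out

probs = torch.full((4, 5), 0.2)  # 4 stacked uniform categoricals
print(sample_counts(probs, torch.tensor([2, 1, 0, 6])))
```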
I’m struggling to figure out how to implement this sampling appropriately through RLlib. In particular, I can’t figure out how to fit this logic into a custom action distribution.
I’ve got the torch logic to DO the sampling in hand (see the sketch above); I think my question is essentially:
- Is a custom action distribution the right place to do this sampling? (See the skeleton after this list for what I mean.)
- If so, how should I pass the list of available vehicles to the action_dist each step?
- Should I instead pass the network output, unsampled, to the env, and have this sampling occur in the env?
- Should I be trying to recreate this sampling approach with the built-in action spaces?
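For reference, here is the skeleton I’ve been staring at. This is a sketch only, assuming the ModelV2 / ModelCatalog API stack; `MultiVehicleDist`, the `N_ROWS`/`N_OPTIONS` constants, and the registered name are my placeholders, and the body of `sample()` is exactly the part I can’t work out, since the distribution only ever sees the model output and the model:

```python
import torch
from ray.rllib.models import ModelCatalog
from ray.rllib.models.torch.torch_action_dist import TorchDistributionWrapper

N_ROWS, N_OPTIONS = 4, 5  # placeholder sizes: vehicle types x movement options

class MultiVehicleDist(TorchDistributionWrapper):  # hypothetical name
    """N stacked categoricals over the same M options."""

    def __init__(self, inputs, model):
        super().__init__(inputs, model)
        # `inputs` is the flat model output; reshape to (batch, N, M).
        logits = inputs.reshape(-1, N_ROWS, N_OPTIONS)
        self.probs = torch.softmax(logits, dim=-1)

    def sample(self):
        # The sticking point: the per-step vehicle counts ([2, 1, 0, 6])
        # have no obvious way into this method.
        raise NotImplementedError("where do the per-step counts come from?")

    # logp / entropy / kl would also be needed for training; omitted here.

    @staticmethod
    def required_model_output_shape(action_space, model_config):
        # Flat size of the logits the model must emit.
        return N_ROWS * N_OPTIONS

ModelCatalog.register_custom_action_dist("multi_vehicle_dist", MultiVehicleDist)
# ...and then in the trainer config:
# config["model"]["custom_action_dist"] = "multi_vehicle_dist"
```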