[rllib] Customized action distribution of probability matrices

How severe does this issue affect your experience of using Ray?

  • Medium: It contributes to significant difficulty in completing my task, but I can work around it.

Hi everyone,

Is there a way to create a custom action distribution from which I can sample a probability matrix?
My current solution is to have my NN output the mean and log_std of a Gaussian distribution, and I pass the sampled values through a softmax.

Each row of my probability matrix is a distribution over N paths: entry i of a row is the probability of choosing path i.
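
To make the current approach concrete, here is a rough sketch in plain NumPy. It is not RLlib code; the function name, the matrix shape, and the seed are placeholders I made up for illustration. It samples Gaussian values from mean and log_std, then applies a softmax to each row:

```python
import numpy as np


def sample_probability_matrix(mean, log_std, rng):
    """Sample Gaussian values and turn each row into a probability vector.

    `mean` and `log_std` are arrays of shape (num_rows, num_paths),
    standing in for the NN outputs (hypothetical shapes).
    """
    std = np.exp(log_std)
    # Reparameterized Gaussian sample: mean + std * standard normal noise.
    raw = mean + std * rng.standard_normal(mean.shape)
    # Row-wise softmax; subtracting the row max keeps exp() numerically stable.
    shifted = raw - raw.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)       # fixed seed, for reproducibility only
num_rows, num_paths = 3, 4           # placeholder sizes
mean = np.zeros((num_rows, num_paths))
log_std = np.full((num_rows, num_paths), -1.0)

probs = sample_probability_matrix(mean, log_std, rng)
```

The softmax is applied per row, so every row of `probs` is non-negative and sums to 1.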

Thank you for the help!

Hi all,

Does anyone have an idea how to solve my issue? Is my solution good enough?