Hi, I’m relatively new to RLlib and just started exploring it. I was trying to implement a Dirichlet distribution for a resource allocation system. Since I didn’t find any real example on this topic, I want to ask whether this distribution is available in PPO. I tried it, but I got this error:
module 'gym.spaces' has no attribute 'Simplex'
That’s expected, since gym does not have a Simplex space. So how can I implement a Dirichlet distribution?
Q2. What do the dimensions of a Simplex mean, for example in Simplex(shape=(3, 4))?
Suppose I have 2 batteries responsible for charging 4 wheels, and they are all connected. Which Simplex dimensions should I use?
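A minimal NumPy sketch of how I understand the shape, assuming the convention that the last axis is the simplex dimension (if I read RLlib’s Simplex docstring correctly, Simplex(shape=(3, 4)) means 3 independent 4-dimensional Dirichlet distributions). For the battery example, the (2, 4) layout below, with one row per battery and one column per wheel, is a hypothetical modelling choice rather than anything RLlib prescribes; NUM_BATTERIES and NUM_WHEELS are illustrative names.

```python
import numpy as np

# Hypothetical layout for the battery example: one row per battery,
# one column per wheel. The last axis is the simplex dimension.
NUM_BATTERIES = 2
NUM_WHEELS = 4

rng = np.random.default_rng(seed=0)

# All-ones concentration = uniform sampling over each simplex.
concentration = np.ones(NUM_WHEELS)

# One independent Dirichlet sample per battery -> shape (2, 4).
# Row i holds the fraction of battery i's output sent to each wheel.
allocation = rng.dirichlet(concentration, size=NUM_BATTERIES)

print(allocation.shape)          # (2, 4)
print(allocation.sum(axis=-1))   # each battery's fractions sum to 1
```

Each row lies on its own simplex, so the two batteries are allocated independently under this assumption.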
This seems more like a general RL question than an RLlib-specific one.
I’d suggest asking in another RL forum.
Do you have any example of a working Simplex gym space?
As for the Dirichlet, you can always add more action distributions to RLlib by following this API:
You can find a Dirichlet distribution in the torch distributions here, at line 537, so RLlib supports the Dirichlet distribution in the torch framework (I use torch often). You can also find an implementation in this article.
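For reference, here is a rough sketch of what a Dirichlet action distribution has to do, written in NumPy instead of torch (torch.distributions.Dirichlet would play the same role) so it runs without a deep-learning framework. The policy’s raw outputs must be mapped to strictly positive concentrations; using softplus plus a small EPS is my assumption here (exp(logits) + EPS is another common choice), and the helper names are hypothetical, not RLlib’s. PPO also needs the log-density of the sampled action for its probability ratio.

```python
import math
import numpy as np

EPS = 1e-3  # assumed floor that keeps every concentration strictly positive


def logits_to_concentration(logits):
    """Map unconstrained network outputs to Dirichlet concentrations > 0."""
    logits = np.asarray(logits, dtype=np.float64)
    return np.logaddexp(0.0, logits) + EPS  # numerically stable softplus


def dirichlet_log_prob(x, alpha):
    """Log-density of Dirichlet(alpha) at a point x on the simplex."""
    x = np.asarray(x, dtype=np.float64)
    alpha = np.asarray(alpha, dtype=np.float64)
    log_norm = math.lgamma(alpha.sum()) - sum(math.lgamma(a) for a in alpha)
    return log_norm + float(np.sum((alpha - 1.0) * np.log(x)))


rng = np.random.default_rng(seed=0)
alpha = logits_to_concentration([0.5, -1.0, 2.0, 0.0])  # e.g. 4 wheels
action = rng.dirichlet(alpha)               # fractions that sum to 1
logp = dirichlet_log_prob(action, alpha)    # used in the PPO ratio
print(action, logp)
```

The EPS floor matters in practice: concentrations that drift toward 0 push samples onto the simplex boundary, where log(x) and therefore the log-probability blow up.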
My problem was in defining the action space. Since I use gym envs, I defined my action space like the RLlib example in the utils code here (I had imported that code, of course).
So I was confused about how to define my action space for the Dirichlet distribution (I got the above error for Simplex).
Now I’m considering discretizing the action space and then using a multi-distribution action mask for these problems.
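To get a feel for the discretization option, here is a small sketch (my own illustration; all names are hypothetical) that enumerates every way to split one battery’s charge into num_units equal quanta across the wheels, and builds a boolean mask for a made-up constraint (a full wheel may not receive charge). It also shows the main cost: the number of discrete actions grows combinatorially, 35 actions for 4 quanta over 4 wheels and 286 for 10 quanta. With 2 batteries, one such sub-action per battery could be expressed as a gym MultiDiscrete space.

```python
from itertools import combinations


def discrete_allocations(num_units, num_wheels):
    """All ways to split num_units equal quanta across num_wheels.

    Stars and bars: choose positions for (num_wheels - 1) dividers among
    (num_units + num_wheels - 1) slots. Returns fraction tuples summing to 1.
    """
    slots = num_units + num_wheels - 1
    allocations = []
    for dividers in combinations(range(slots), num_wheels - 1):
        counts = []
        prev = -1
        for d in dividers:
            counts.append(d - prev - 1)  # quanta between two dividers
            prev = d
        counts.append(slots - prev - 1)  # quanta after the last divider
        allocations.append(tuple(c / num_units for c in counts))
    return allocations


def allocation_mask(allocations, wheel_is_full):
    """True for allocations that send no charge to any full wheel."""
    return [
        all(frac == 0.0 for frac, full in zip(a, wheel_is_full) if full)
        for a in allocations
    ]


actions = discrete_allocations(num_units=4, num_wheels=4)
mask = allocation_mask(actions, wheel_is_full=[True, False, False, False])
print(len(actions), sum(mask))  # 35 15
```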
If you know a good forum or community, please recommend it. That would be much appreciated.
You are right. I thought I had written my code as you described, which is why I’m confused, but it’s possible I’m wrong (I was working on 4 envs simultaneously). I will test again and share the result if you are interested.
My initial thought is that it could yield a better result (especially if you don’t want to discretize). I’m going to compare:
multi-discrete with action masking vs. Dirichlet distribution
There are pros and cons, though.
I’ll try to do this within two weeks.
Thanks a lot!
Well, the model is implemented. There are also several points I want to mention:
First, I got negative entropy in the results and was confused, so I did some research, and it became clear that a continuous distribution can legitimately have negative (differential) entropy. The entropy pattern was exactly what I had expected.
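A quick way to see why negative entropy is fine: the uniform Dirichlet(1, 1, 1) has density 2 everywhere on the 2-simplex (the triangle has area 1/2), so its differential entropy is -log 2. The stdlib-only sketch below uses the standard closed form H = log B(α) + (α₀ − K)ψ(α₀) − Σⱼ (αⱼ − 1)ψ(αⱼ); the digamma helper is my own approximation (recurrence plus asymptotic series, accurate to roughly 1e-10 for x > 0) so that SciPy is not needed.

```python
import math


def digamma(x):
    """Approximate digamma psi(x) for x > 0."""
    result = 0.0
    # Shift x upward using psi(x) = psi(x + 1) - 1/x until the series is accurate.
    while x < 10.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # Asymptotic series: ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += (math.log(x) - 0.5 / x
               - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))
    return result


def dirichlet_entropy(alpha):
    """Differential entropy (in nats) of Dirichlet(alpha)."""
    k = len(alpha)
    a0 = sum(alpha)
    log_beta = sum(math.lgamma(a) for a in alpha) - math.lgamma(a0)
    return (log_beta + (a0 - k) * digamma(a0)
            - sum((a - 1.0) * digamma(a) for a in alpha))


print(dirichlet_entropy([1.0, 1.0, 1.0]))     # about -0.693 (= -log 2)
print(dirichlet_entropy([10.0, 10.0, 10.0]))  # more negative: more concentrated
```

Since the uniform Dirichlet is the maximum-entropy distribution on the simplex, every other concentration gives a lower value, so an entropy curve that falls below zero during training is consistent with the policy simply becoming more confident.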
Second, don’t use torch, as it is not implemented yet; use tf2 instead.
Third, using this distribution can be beneficial; for reference, look here. The Dirichlet distribution is a generalization of the beta distribution.
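To make the "generalized beta" point concrete: with two components, the Dirichlet(a, b) density at (p, 1 − p) equals the Beta(a, b) density at p. A small stdlib-only check (the helper names are mine):

```python
import math


def dirichlet_pdf(x, alpha):
    """Dirichlet(alpha) density at a point x on the simplex (all x_i > 0)."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    log_kernel = sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))
    return math.exp(log_norm + log_kernel)


def beta_pdf(p, a, b):
    """Beta(a, b) density at p in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(p)
                    + (b - 1.0) * math.log(1.0 - p))


# With K = 2, Dirichlet(a, b) at (p, 1 - p) matches Beta(a, b) at p.
for p in (0.1, 0.5, 0.9):
    print(p, dirichlet_pdf([p, 1.0 - p], [2.0, 5.0]), beta_pdf(p, 2.0, 5.0))
```

With more than two components the same formula simply keeps one (αⱼ − 1) log xⱼ term per component, which is what lets a single Dirichlet describe an allocation over all wheels at once.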