Custom action space

Hello, I need to use a Multinomial distribution as my action and observation space. It is not included in Gym’s spaces.

One option I have been working on is to create a custom Gym space and map it to a Multinomial distribution. This involved doing surgery on RLlib’s source code. It has been working so far, but it is very laborious and I have only implemented it for TF.
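To make the idea concrete, here is a minimal sketch of what a Multinomial space could look like. It is not the actual patch. The class name `MultinomialSpace` and its parameters are made up for illustration. It only mirrors the `sample()` / `contains()` interface of `gym.Space` and deliberately does not subclass it, so it runs with the standard library alone.

```python
import random
from typing import List, Optional, Sequence


class MultinomialSpace:
    """Hypothetical space of count vectors that sum to n_trials.

    Mirrors the sample()/contains() interface of gym.Space without
    subclassing it, so the sketch runs with the standard library only.
    """

    def __init__(self, n_trials: int, probs: Sequence[float],
                 seed: Optional[int] = None) -> None:
        if n_trials < 0:
            raise ValueError("n_trials must be non-negative")
        if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-8:
            raise ValueError("probs must be non-negative and sum to 1")
        self.n_trials = n_trials
        self.probs = list(probs)
        self._rng = random.Random(seed)

    def sample(self) -> List[int]:
        # Draw n_trials categorical outcomes, then count each category.
        counts = [0] * len(self.probs)
        draws = self._rng.choices(range(len(self.probs)),
                                  weights=self.probs, k=self.n_trials)
        for k in draws:
            counts[k] += 1
        return counts

    def contains(self, x) -> bool:
        # A valid element has one non-negative integer count per category,
        # and the counts add up to n_trials.
        if len(x) != len(self.probs):
            return False
        if not all(isinstance(c, int) and c >= 0 for c in x):
            return False
        return sum(x) == self.n_trials
```

A real version would presumably subclass `gym.Space` and return numpy arrays, but the two methods above are the core of it.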

I wonder if something similar (i.e. new action and observation space) can be achieved with this functionality called “custom action distributions”.

The reason this is not clear to me is that in the official example the model uses a Categorical distribution, which does not need a new gym space. In my case, I have a totally new gym space and a new distribution to sample from, as the Multinomial is currently in neither RLlib nor gym.
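Whatever mechanism ends up being used, a Multinomial action distribution would need at least a log-probability for a sampled action, since policy-gradient losses depend on it. Below is a rough, framework-free sketch of that one piece. The function name `multinomial_logp` is an assumption for illustration and not part of RLlib or gym.

```python
import math
from typing import Sequence


def multinomial_logp(counts: Sequence[int], probs: Sequence[float]) -> float:
    """Log-probability of a count vector under Multinomial(n, probs),
    where n is taken to be sum(counts)."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    n = sum(counts)
    # log n! - sum_i log x_i!   (math.lgamma(k + 1) equals log k!)
    log_coeff = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    log_terms = 0.0
    for c, p in zip(counts, probs):
        if c > 0:
            if p <= 0.0:
                # A category with zero probability was observed.
                return float("-inf")
            log_terms += c * math.log(p)
    return log_coeff + log_terms
```

In a TF or Torch version the same formula would be written with tensor ops so that gradients can flow through `probs`.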

I will be very grateful for any pointer.


Hello!

I’m just a fellow user like you, but I had a similar use case recently. I had to simulate whether an agent has detected one of many objects in its current observation set. We ended up using spaces.MultiBinary for the observation space, but implemented the detection using numpy and the action_mask (so essentially, space.sample(action_mask)).
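The exact masking logic isn't shown in this thread. One possible reading is that bits are only sampled where the mask allows them. A standard-library-only sketch of that reading (the helper `sample_masked_binary` is hypothetical, and this is not gym's own implementation):

```python
import random
from typing import List, Optional, Sequence


def sample_masked_binary(mask: Sequence[int],
                         rng: Optional[random.Random] = None) -> List[int]:
    """Sample a binary vector; positions where mask is 0 are forced to 0."""
    rng = rng or random.Random()
    # Only positions allowed by the mask get a random 0/1 draw.
    return [rng.randint(0, 1) if m else 0 for m in mask]
```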

I’d love to help you out if I can :slight_smile: Do you have an example in mind? Also, is this what you had in mind?

Hi, thank you very much for your answer!
My problem is that after Ray 2.0 everything changed, and now I don’t know how to pass a custom action space, a custom policy and a custom loss. The old methods seem to be deprecated and the documentation hasn’t caught up.

I have code running in PyTorch and I want to convert it to RLlib, as simple as that. I can share my GitHub repo if you would like; it has the running PyTorch code. It’s a private repo, so I’d need your GitHub username. We can do this over PM if you want.

Thanks a lot!

Hello, I’m a beginner too, but let’s do our best to solve this :slight_smile: I’ve been going heavily through the examples in their repository for my own code, and they seem like good places to start, for example:

  1. (Example) Deploying Autoregressive model + action dist
  2. (Docs) Deploy Custom Torch Policy + (Example) Custom Torch Policy
  3. (Example) Custom Loss Function

Do you have an example we can go through, or otherwise, PM me?

Thank you very much @PrasannaMaddila! I believe there are things that used to work in Ray 2.0 but no longer do. For example, everything that uses “PPOTrainer” has been retired, and everything that uses “with_updates” has either different imports or different logic from what the official documentation describes.

For example, everything that uses build_policy_class has a deprecation warning here.

I believe the “solution” now is just to subclass the TorchPolicy. I will PM you.

Thank you very much!