Custom action space

Username1 · July 20, 2023, 12:03pm

Hello, I need to use a Multinomial distribution as my action and observation space. It is not even included in Gym’s spaces.

One option I have been working on is to create a custom Gym space and map it to a Multinomial distribution. This involved doing surgery on RLlib’s source code. It has been working so far, but it is very laborious, and I have only implemented it for TF.
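For readers following along, here is a minimal sketch of what such a space could look like. It is not my actual implementation: `MultinomialSpace`, `n_trials` and `n_categories` are hypothetical names, and the class only mirrors the `sample()`/`contains()` interface of a Gym space without inheriting from it, so it runs with NumPy alone.

```python
import numpy as np


class MultinomialSpace:
    """Hypothetical space whose elements are count vectors summing to n_trials.

    Assumption: this only imitates the sample()/contains() interface of a
    Gym space; it does not subclass gym.Space and is not part of RLlib.
    """

    def __init__(self, n_trials, n_categories, seed=None):
        self.n_trials = n_trials
        self.n_categories = n_categories
        self.shape = (n_categories,)
        self.dtype = np.int64
        self._rng = np.random.default_rng(seed)

    def sample(self, probs=None):
        """Draw one count vector; uniform category probabilities by default."""
        if probs is None:
            probs = np.full(self.n_categories, 1.0 / self.n_categories)
        return self._rng.multinomial(self.n_trials, probs).astype(self.dtype)

    def contains(self, x):
        """Check shape, integer dtype, non-negativity and the sum constraint."""
        x = np.asarray(x)
        return bool(
            x.shape == self.shape
            and np.issubdtype(x.dtype, np.integer)
            and (x >= 0).all()
            and int(x.sum()) == self.n_trials
        )


space = MultinomialSpace(n_trials=10, n_categories=4, seed=0)
action = space.sample()  # one count per category, summing to 10
```

The sum constraint in `contains()` is the part that none of the built-in spaces express, which is why a plain Box or MultiDiscrete does not fit here.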

I wonder if something similar (i.e. a new action and observation space) can be achieved with the functionality called “custom action distributions”.

The reason this is not clear is that in the official example the model uses a Categorical distribution, which does not require a new Gym space. In my case, I have a totally new Gym space and a new distribution to sample from, as the Multinomial is currently in neither RLlib nor Gym.

I will be very grateful for any pointers.

Thanks!

PrasannaMaddila · July 27, 2023, 3:08pm

Hello!

I’m just a fellow user like you, but I had a similar use case recently. I had to simulate whether an agent has detected one of many objects in its current set of observations. We ended up using spaces.MultiBinary for the observation space, but implemented the detection using numpy and the action_mask (so essentially, space.sample(action_mask)).

I’d love to help you out if I can. Do you have an example in mind? Also, is this what you had in mind?
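Roughly, the masking idea can be sketched as follows. This is a simplified illustration and not our actual code: `sample_detections` is a hypothetical helper, it uses plain NumPy instead of calling the Gym space, and it assumes a mask entry of 0 means "this object cannot be detected".

```python
import numpy as np


def sample_detections(n_objects, action_mask, rng):
    """Sample a 0/1 detection vector, forcing masked-out entries to 0."""
    mask = np.asarray(action_mask, dtype=np.int8)
    # Independent coin flip per object (assumed uniform for illustration).
    draw = rng.integers(0, 2, size=n_objects, dtype=np.int8)
    # Multiplying by the mask zeroes every entry whose mask value is 0.
    return draw * mask


rng = np.random.default_rng(0)
mask = [1, 0, 1, 1, 0]  # objects 1 and 4 can never be detected
detections = sample_detections(5, mask, rng)
```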

Username1 · July 31, 2023, 10:01am

Hi, thank you very much for your answer!
My problem is that in Ray versions after 2.0 everything has changed, and now I don’t know how to pass a custom action space, a custom policy and a custom loss. The old methods seem to be deprecated and the documentation hasn’t caught up.

I have code running in PyTorch and I want to convert it to RLlib, as simple as that. I can share my GitHub repo if you would like; it has the running PyTorch code. It’s a private repo, so I’d need your GitHub username. We can do this over PM if you want.

Thanks a lot!

PrasannaMaddila · July 31, 2023, 1:44pm

Hello, I’m a beginner too, but let’s do our best to solve this. I’ve been going heavily through the examples in their repository for my own code, and they seem like good places to start, for example

Do you have an example we can go through? Otherwise, PM me.

Username1 · July 31, 2023, 2:13pm

Thank you very much @PrasannaMaddila! I believe there are things that used to work in Ray 2.0 but not afterwards. For example, everything that uses “PPOTrainer” has been retired, and everything that uses “with_updates” has either different imports or different logic from the official documentation.

For example, everything that uses build_policy_class has a deprecation warning here.

I believe the “solution” now is just to subclass the TorchPolicy. I will PM you.

Thank you very much!
