Multiple action spaces

Rohit_Modee · October 3, 2022, 10:56am

How severe does this issue affect your experience of using Ray?

High: It blocks me from completing my task.

Hi,

Goal: In forward(), I want ch_act to hold discrete values and ch_msg to hold continuous (float) values, but both arrive as floats. I don’t want the actions to be flattened. How do I achieve this?

My action space is as follows.

self.action_space = spaces.Dict({"ch_act": spaces.MultiDiscrete([6,6,6]), "ch_msg": spaces.Box(high=1, low=0, shape=(32,))})

My Policy is as follows.

class PolicyNetwork(TorchModelV2, nn.Module):
    """Example of a PyTorch custom model that just delegates to a fc-net."""

    def __init__(self, obs_space, action_space, num_outputs, model_config,
                 name):
        TorchModelV2.__init__(self, obs_space, action_space, num_outputs,
                              model_config, name)
        nn.Module.__init__(self)

        self.ff = some_mlp()
        self._last_value = None
        self.view_requirements["prev_actions"] = ViewRequirement(data_col="actions", shift=-1, space=self.action_space)

    def forward(self, input_dict, state, seq_lens):
        features = input_dict["obs"]
        ch_act = input_dict["actions"]["ch_act"]
        ch_msg = input_dict["actions"]["ch_msg"]
        ...

rusu24edward · October 3, 2022, 3:38pm

I believe the pre/post processors will flatten the space. I use Abmarl’s RavelDiscreteWrapper when I want to ensure that something is treated as discrete. This wrapper converts the space to a single Discrete space (in your case Discrete(216), since MultiDiscrete([6, 6, 6]) has 6 × 6 × 6 = 216 combinations), so that it is one-hot encoded in the flattening.
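
To illustrate the raveling idea only, here is a minimal NumPy sketch of how a MultiDiscrete([6, 6, 6]) action can be mapped to a single integer in Discrete(216) and back. It does not use Abmarl and is not a reproduction of RavelDiscreteWrapper; the helper names ravel_action and unravel_action are made up for this example.

```python
import numpy as np

# Sizes of the three sub-actions in MultiDiscrete([6, 6, 6]).
NVEC = (6, 6, 6)


def ravel_action(multi_action, nvec=NVEC):
    """Map a MultiDiscrete action [a, b, c] to one integer in [0, prod(nvec))."""
    # Hypothetical helper for illustration; row-major (C-order) indexing.
    return int(np.ravel_multi_index(tuple(multi_action), nvec))


def unravel_action(flat_action, nvec=NVEC):
    """Inverse mapping: recover the per-component actions from the integer."""
    return [int(i) for i in np.unravel_index(flat_action, nvec)]


n_combinations = int(np.prod(NVEC))  # size of the equivalent Discrete space
flat = ravel_action([5, 0, 3])       # 5 * 36 + 0 * 6 + 3
restored = unravel_action(flat)      # back to the three sub-actions
```

The environment would then expose the single integer as its action and unravel it before applying it, so the flattening sees one categorical choice rather than three numbers.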

Rohit_Modee · October 11, 2022, 5:03pm

Hi @sven1977,

Is there no way to use two or more action spaces? I am not able to get this working. Any help would be appreciated.

mannyv · October 11, 2022, 6:08pm

Hi @Rohit_Modee,

I think the issue you are running into is that the actions are not included in the input to the policy network. The policy network's job is to take the observations as input and produce the actions, so the actions should be the output of the forward method.

You could, for example, add a ViewRequirement to provide the actions from the previous time step as input.
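
If the previous actions arrive in forward as one flat 2-D float array per batch rather than as a dict, they can be split back into their components by column. The sketch below assumes a layout that is not confirmed anywhere in this thread: 3 columns for ch_act followed by 32 columns for ch_msg, 35 in total. Print the shape of the incoming tensor to check before relying on it. The helper name split_flat_actions is hypothetical, and NumPy is used only to keep the example self-contained; the same column slicing applies to a torch tensor.

```python
import numpy as np

# Assumed layout of one flattened action row (not confirmed by the thread):
# [ch_act_0, ch_act_1, ch_act_2, ch_msg_0, ..., ch_msg_31]
N_DISCRETE = 3   # MultiDiscrete([6, 6, 6]) -> three integer columns
N_MSG = 32       # Box(shape=(32,)) -> 32 float columns


def split_flat_actions(flat_actions):
    """Split a (batch, 35) array into integer ch_act and float ch_msg parts."""
    flat_actions = np.asarray(flat_actions)
    expected = N_DISCRETE + N_MSG
    if flat_actions.ndim != 2 or flat_actions.shape[1] != expected:
        raise ValueError(
            f"expected shape (batch, {expected}), got {flat_actions.shape}"
        )
    # Round before casting so values such as 4.9999 do not truncate to 4.
    ch_act = np.rint(flat_actions[:, :N_DISCRETE]).astype(np.int64)
    ch_msg = flat_actions[:, N_DISCRETE:].astype(np.float32)
    return ch_act, ch_msg


# Usage with a dummy batch of four flattened actions.
batch = np.zeros((4, N_DISCRETE + N_MSG), dtype=np.float32)
batch[:, :N_DISCRETE] = [2, 0, 5]
ch_act, ch_msg = split_flat_actions(batch)
```

Indexing such an array with a string key like "ch_act" cannot work, because the dict structure is gone once the components have been concatenated.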

Rohit_Modee · October 13, 2022, 5:41pm

Thanks for the reply, @mannyv. I added the ViewRequirement as suggested, but it did not work.
It throws the following error:

IndexError: too many indices for tensor of dimension 2

class PolicyNetwork(TorchModelV2, nn.Module):
    """Example of a PyTorch custom model that just delegates to a fc-net."""

    def __init__(self, obs_space, action_space, num_outputs, model_config,
                 name):
        TorchModelV2.__init__(self, obs_space, action_space, num_outputs,
                              model_config, name)
        nn.Module.__init__(self)

        self.ff = some_mlp()
        self._last_value = None
        self.view_requirements["prev_actions"] = ViewRequirement(data_col="actions", shift=-1, space=self.action_space)
        self.view_requirements["actions"] = ViewRequirement(data_col="actions", shift=0, space=self.action_space)

    def forward(self, input_dict, state, seq_lens):
        features = input_dict["obs"]
        ch_act = input_dict["actions"]["ch_act"]
        ch_msg = input_dict["actions"]["ch_msg"]
        ...

mannyv · October 14, 2022, 11:20am

Hi @Rohit_Modee,

Do you have a reproduction script you could share?
