Multiple action spaces

How severely does this issue affect your experience of using Ray?

  • High: It blocks me from completing my task.

Hi,

Aim: In the forward pass, I want ch_act to be discrete values and ch_msg to be continuous floats, but I am getting both as floats. I don't want the actions to be flattened. How do I achieve this?

My action space is as follows.

from gym import spaces  # gymnasium on newer Ray versions

self.action_space = spaces.Dict({
    "ch_act": spaces.MultiDiscrete([6, 6, 6]),
    "ch_msg": spaces.Box(low=0, high=1, shape=(32,)),
})
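
For reference, sampling directly from this space already gives the dtypes I want (a quick standalone check; the printed dtypes are gym's defaults):

from gym import spaces

action_space = spaces.Dict({
    "ch_act": spaces.MultiDiscrete([6, 6, 6]),
    "ch_msg": spaces.Box(low=0, high=1, shape=(32,)),
})

sample = action_space.sample()
print(sample["ch_act"].dtype)  # int64   -> discrete indices in [0, 6)
print(sample["ch_msg"].dtype)  # float32 -> continuous values in [0, 1]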

My Policy is as follows.

import torch.nn as nn

from ray.rllib.models.torch.torch_modelv2 import TorchModelV2
from ray.rllib.policy.view_requirement import ViewRequirement


class PolicyNetwork(TorchModelV2, nn.Module):
    """Example of a PyTorch custom model that just delegates to a fc-net."""

    def __init__(self, obs_space, action_space, num_outputs, model_config,
                 name):
        TorchModelV2.__init__(self, obs_space, action_space, num_outputs,
                              model_config, name)
        nn.Module.__init__(self)

        self.ff = some_mlp()  # placeholder for my feed-forward net
        self._last_value = None
        self.view_requirements["prev_actions"] = ViewRequirement(
            data_col="actions", shift=-1, space=self.action_space)

    def forward(self, input_dict, state, seq_lens):
        features = input_dict["obs"]
        ch_act = input_dict["actions"]["ch_act"]
        ch_msg = input_dict["actions"]["ch_msg"]
        ...

I believe the pre/post processors will flatten the space. I use Abmarl's RavelDiscreteWrapper when I want to ensure that something is treated as discrete. That wrapper converts the space to Discrete (in your case, Discrete(216), since there are 216 combinations in MultiDiscrete([6, 6, 6])), so it is treated as a one-hot encoding during the flattening.
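
The mapping the wrapper relies on can be sketched with plain NumPy (just an illustration of the ravel/unravel idea, not Abmarl's actual code):

import numpy as np

nvec = (6, 6, 6)  # the MultiDiscrete dimensions

# MultiDiscrete action -> single Discrete index in [0, 216)
flat = np.ravel_multi_index((3, 0, 5), nvec)   # 3*36 + 0*6 + 5 = 113

# ...and back when the policy emits a Discrete action
multi = np.unravel_index(flat, nvec)           # (3, 0, 5)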

Hi @sven1977,

Is there no way to use two or more action spaces? I am not able to get this working. Any help would be appreciated.

Hi @Rohit_Modee,

The issue I think you might be running into here is that the actions are not included as input to the policy network. The policy network's job is to take the observations as input and produce the actions; the actions should be the output of the forward method, not something you read from input_dict.

You could, for example, add a ViewRequirement to provide the actions from the previous time step as input.
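
Something along these lines (an untested sketch using RLlib's ViewRequirement and the SampleBatch constants):

from ray.rllib.policy.sample_batch import SampleBatch
from ray.rllib.policy.view_requirement import ViewRequirement

# In the model's __init__: feed the previous step's actions back in.
self.view_requirements[SampleBatch.PREV_ACTIONS] = ViewRequirement(
    data_col=SampleBatch.ACTIONS,  # source: the collected actions...
    shift=-1,                      # ...shifted back one timestep
    space=self.action_space,
)

# In forward(): read them under the key you registered.
prev_actions = input_dict[SampleBatch.PREV_ACTIONS]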

Thanks for the reply @mannyv. I tried adding the ViewRequirement as suggested, but it did not work. It throws the following error:

IndexError: too many indices for tensor of dimension 2

import torch.nn as nn

from ray.rllib.models.torch.torch_modelv2 import TorchModelV2
from ray.rllib.policy.view_requirement import ViewRequirement


class PolicyNetwork(TorchModelV2, nn.Module):
    """Example of a PyTorch custom model that just delegates to a fc-net."""

    def __init__(self, obs_space, action_space, num_outputs, model_config,
                 name):
        TorchModelV2.__init__(self, obs_space, action_space, num_outputs,
                              model_config, name)
        nn.Module.__init__(self)

        self.ff = some_mlp()  # placeholder for my feed-forward net
        self._last_value = None
        self.view_requirements["prev_actions"] = ViewRequirement(
            data_col="actions", shift=-1, space=self.action_space)
        self.view_requirements["actions"] = ViewRequirement(
            data_col="actions", shift=0, space=self.action_space)

    def forward(self, input_dict, state, seq_lens):
        features = input_dict["obs"]
        ch_act = input_dict["actions"]["ch_act"]
        ch_msg = input_dict["actions"]["ch_msg"]
        ...

Hi @Rohit_Modee,

Do you have a reproduction script you could share?