How severe does this issue affect your experience of using Ray?
- High: It blocks me from completing my task.
I am trying to implement communication in a MARL setting where the number of agents changes dynamically. The closest communication method I have come across for this setting is CommNet. The aim is to have a policy network that returns both an action and a message; the message will be used to communicate with the other agents.
@sven1977 is this doable using RLlib?
My policy network looks like this.
```python
class PolicyNetwork(TorchModelV2, nn.Module):
    """Example of a PyTorch custom model that just delegates to a fc-net."""

    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        TorchModelV2.__init__(self, obs_space, action_space, num_outputs,
                              model_config, name)
        nn.Module.__init__(self)
        self.mlp_f = nn.Sequential(...)            # some layers
        self.mlp_ae = nn.Sequential(...)           # some layers
        self.mlp_msg = nn.Sequential(...)          # some layers
        self.mlp_interaction = nn.Sequential(...)  # some layers
        self.values = nn.Sequential(...)           # some layers
        self._last_value = None

    def forward(self, input_dict, state, seq_lens):
        features = input_dict["obs"][:, :128 + 5 + 3]
        message = input_dict["obs"][:, 128 + 5 + 3:]
        f_feature = self.mlp_f(features[:, -8:])   # similar to CommNet h
        ae_feature = self.mlp_ae(features)         # similar to CommNet h
        communication = self.mlp_msg(message)      # similar to CommNet c
        next_h = ae_feature + f_feature + communication
        final_out = self.mlp_interaction(next_h)
        self._last_value = self.values(features)
        return final_out, [next_h]

    def value_function(self):
        return torch.squeeze(self._last_value, -1)
```
The above policy network throws an error.
I want the policy network to be similar to the CommNet architecture: at each step it should pass a message to the environment, where the messages are summed and passed back to the policy network at the next step t+1.
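For reference, here is a minimal sketch of the CommNet-style update I have in mind. This is my own toy code, not RLlib API; the mean-over-other-agents aggregation, `CommStep`, and the hidden size are assumptions on my part:

```python
import torch
import torch.nn as nn


class CommStep(nn.Module):
    """One CommNet-style communication step (sketch):
    c_i = mean of the *other* agents' hidden states,
    h_i' = tanh(W_h h_i + W_c c_i)."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.w_h = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.w_c = nn.Linear(hidden_dim, hidden_dim, bias=False)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: [num_agents, hidden_dim]; works for any dynamic num_agents >= 1
        n = h.shape[0]
        total = h.sum(dim=0, keepdim=True)   # [1, hidden_dim]
        c = (total - h) / max(n - 1, 1)      # mean over the other agents
        return torch.tanh(self.w_h(h) + self.w_c(c))


h = torch.randn(5, 16)           # 5 agents, hidden size 16
h_next = CommStep(16)(h)
print(h_next.shape)              # torch.Size([5, 16])
```

Because the aggregation is a sum/mean, the same module handles any number of agents, which is what I need for the dynamic setting.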
Below is the flow chart:
1. Reset env ==> get `obs_t = [state_t, msg_t]`, where time step `t = 0` and `msg_0 = [0...0]`
2. `obs_t` ==> policy_net ==> `[output/action, msg_t+1]` ==> env ==> new `obs_t+1`
3. `[state_t+1, msg_t+1]` ==> policy_net ==> `[output/action, msg_t+2]`
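The environment side of this loop could be sketched as below. This is a toy illustration, not RLlib API; `MsgPassingEnv`, `STATE_DIM`, and `MSG_DIM` are hypothetical names I made up:

```python
import numpy as np

STATE_DIM, MSG_DIM = 4, 3  # assumed sizes for illustration


class MsgPassingEnv:
    """Toy sketch of the loop above: the env stores the messages emitted
    at step t and concatenates their sum into every agent's observation
    at step t+1."""

    def __init__(self, num_agents: int):
        self.num_agents = num_agents
        self.msg_sum = np.zeros(MSG_DIM)

    def reset(self):
        self.msg_sum = np.zeros(MSG_DIM)       # msg_0 = [0...0]
        return self._obs()

    def step(self, actions, messages):
        # messages: [num_agents, MSG_DIM], emitted by the policy at step t
        self.msg_sum = np.asarray(messages).sum(axis=0)
        return self._obs()                     # obs_{t+1} = [state, msg]

    def _obs(self):
        state = np.random.randn(STATE_DIM)     # placeholder state
        return {i: np.concatenate([state, self.msg_sum])
                for i in range(self.num_agents)}
```

The policy then only ever sees a flat `[state, summed_msg]` vector, matching the slicing in my `forward()` above.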
@CodingBurmer Have you implemented something like this?