I have a custom multi-agent environment with multiple groups of agents, where I would like all agents in each group to use the same policy. I configured this by setting AlgorithmConfig.multi_agent(policies={...}, policy_mapping_fn=...).
Having the policy remember a history of past states and actions would be very useful, since the optimal action for a given state in my custom environment depends on prior states and actions. Naturally, this leads to using a recurrent layer (use_lstm in the model config dictionary).
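For reference, here is roughly what my setup looks like. The policy ids, agent-id prefixes, and cell size below are placeholders, not my exact values:

```python
# Sketch of the shared-policy + LSTM setup. Policy ids ("policy_a"/"policy_b")
# and agent-id prefixes ("group_a_"/"group_b_") are placeholder assumptions.

def policy_mapping_fn(agent_id, episode=None, worker=None, **kwargs):
    # Map every agent in a group to that group's single shared policy,
    # so all agents in the group train the same network parameters.
    return "policy_a" if agent_id.startswith("group_a_") else "policy_b"

model_config = {
    "use_lstm": True,              # append a recurrent layer to the model
    "lstm_cell_size": 256,         # size of the LSTM hidden/cell state (assumed)
    "lstm_use_prev_action": True,  # feed the previous action into the LSTM
}

# These plug into the AlgorithmConfig mentioned above, e.g.:
# config.multi_agent(policies={"policy_a", "policy_b"},
#                    policy_mapping_fn=policy_mapping_fn)
# config.training(model=model_config)
```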
However, each agent within a group observes a different environment state and takes a different action. Although I want parameter sharing across the agents in a group, I don't want them to share LSTM hidden/cell states, since each agent acts independently. How can I do this?
I have been looking through the RLlib source code, but it is very convoluted.
In short, my goal is for the actor/policy network and the critic/value network to maintain separate LSTM hidden/cell states per agent, while still sharing parameters during training.