Using an LSTM model for multi-agent policy gradient with separate recurrent hidden states per agent

Is there any update on this? I’m facing a similar issue too.