Understanding QMIX

hermmanhender · September 5, 2023, 6:08pm

Hello!

I’m trying to understand how QMIX updates the agents’ policies. As far as I understand, the algorithm uses centralized training (through the mixing network) with decentralized execution.
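QMIX's "centralized training, decentralized execution" depends on the mixer being monotonic in each agent's Q-value. The sketch below is a simplified, pure-Python illustration of that property. It assumes a one-layer mixer whose weights come from a linear hypernetwork of the global state (the original QMIX paper uses a two-layer mixer with neural hypernetworks). All names are hypothetical and are not RLlib's API. The sketch checks that the joint action maximizing Q_tot matches the tuple of actions each agent picks by greedily maximizing its own Q-value, which is why the mixer is only needed during training.

```python
import itertools
import random


def hypernet(state, weights, bias):
    """Tiny linear 'hypernetwork': maps the global state to a single scalar."""
    return sum(w * s for w, s in zip(weights, state)) + bias


def mix(agent_qs, state, hyper_w, hyper_b):
    """Simplified one-layer QMIX-style mixer (hypothetical; the paper uses two layers).

    agent_qs: Q-value of the action each agent chose.
    state:    global state vector, seen only by the mixer.
    hyper_w:  one (weights, bias) pair per agent, producing that agent's mixing weight.
    hyper_b:  (weights, bias) pair producing a state-dependent bias.
    """
    # abs() keeps every mixing weight non-negative, so Q_tot is monotonic in each Q_i.
    mix_w = [abs(hypernet(state, *hw)) for hw in hyper_w]
    bias = hypernet(state, *hyper_b)
    return sum(w * q for w, q in zip(mix_w, agent_qs)) + bias


rng = random.Random(0)
n_agents, n_actions, state_dim = 3, 4, 5


def rand_vec(n):
    return [rng.uniform(-1, 1) for _ in range(n)]


# Stand-ins for the agent networks' outputs: one Q-value per action, per agent.
q_table = [rand_vec(n_actions) for _ in range(n_agents)]
state = rand_vec(state_dim)
hyper_w = [(rand_vec(state_dim), rng.uniform(-1, 1)) for _ in range(n_agents)]
hyper_b = (rand_vec(state_dim), rng.uniform(-1, 1))

# Centralized view: brute-force the joint action that maximizes Q_tot.
best_joint = max(
    itertools.product(range(n_actions), repeat=n_agents),
    key=lambda joint: mix([q_table[i][a] for i, a in enumerate(joint)], state, hyper_w, hyper_b),
)
# Decentralized view: each agent independently takes the argmax of its own Q-values.
greedy = tuple(max(range(n_actions), key=q.__getitem__) for q in q_table)
print(best_joint == greedy)
```

Because `abs()` forces the weights to be non-negative, the two views agree whenever the mixing weights are strictly positive and each agent's best action is unique. If negative weights were allowed, an agent's greedy choice could lower Q_tot and the check could fail.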

However, during training only one policy (default_policy) is saved in the checkpoints, which I interpret as belonging to the mixing network. Is this correct? If so, how do I get each agent’s independent policy? I need these so that each agent can choose its own actions later, during evaluation.
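Suppose the agents are homogeneous and, as in the original QMIX paper, share one agent network that receives a one-hot agent ID alongside the observation. Then each agent's "independent policy" is that shared network evaluated on the agent's own input. The sketch below illustrates only this idea. `SharedAgentNet` and `policy_for` are hypothetical. A single linear layer stands in for the paper's recurrent agent network, and the sketch makes no claim about what RLlib actually stores under default_policy.

```python
import random


class SharedAgentNet:
    """Hypothetical stand-in for a QMIX agent network shared by homogeneous agents.

    One linear layer maps [own observation + one-hot agent ID] to a Q-value per action.
    """

    def __init__(self, obs_dim, n_agents, n_actions, seed=0):
        rng = random.Random(seed)
        self.n_agents = n_agents
        in_dim = obs_dim + n_agents
        self.weights = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(n_actions)]

    def q_values(self, obs, agent_id):
        one_hot = [1.0 if i == agent_id else 0.0 for i in range(self.n_agents)]
        x = list(obs) + one_hot
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]

    def act(self, obs, agent_id):
        """Greedy action from this agent's own observation and ID; no mixer, no global state."""
        q = self.q_values(obs, agent_id)
        return max(range(len(q)), key=q.__getitem__)


net = SharedAgentNet(obs_dim=3, n_agents=2, n_actions=4)


def policy_for(agent_id):
    """'Extract' one agent's policy by fixing its ID in the shared network (hypothetical helper)."""
    return lambda obs: net.act(obs, agent_id)


local_obs = {0: [0.2, -0.5, 1.0], 1: [0.9, 0.1, -0.3]}
actions = {i: policy_for(i)(obs) for i, obs in local_obs.items()}
print(actions)
```

In this view there is nothing agent-specific to split out of a checkpoint beyond the shared agent network itself. The mixer would only matter for computing training targets.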

I also understand that the observation for QMIX must be a tuple that provides the agent’s own observation on one hand and the complete state of the environment on the other. (In the example included in the library, the only difference between the two is that the agent’s observation includes the agent’s ID; that is, the environment is fully observable.) Given that, is it possible to query the policy with only the agent’s observation, or do I need to provide both the observation and the state to compute an action?
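The following sketch separates what an agent needs at execution time from what only the mixer needs. It assumes a hypothetical grouped observation laid out as a tuple with one dict per agent, with the agent's own observation under `"obs"` and the global state under `"state"`. These key names mirror RLlib's grouped QMIX example, but treat them as an assumption and check them against your RLlib version. The toy Q-function and observation values are made up for illustration.

```python
def split_grouped_obs(grouped_obs):
    """Split a grouped observation into per-agent local observations and the global state.

    Assumes a tuple with one dict per agent, each holding "obs" (local) and "state" (global).
    """
    local_obs = [agent["obs"] for agent in grouped_obs]
    # Every entry carries the same global state, so the first copy suffices.
    state = grouped_obs[0]["state"]
    return local_obs, state


def greedy_actions(q_fn, local_obs):
    """Execution: each agent picks from its own Q-values; the global state is never read."""
    actions = []
    for obs in local_obs:
        q = q_fn(obs)
        actions.append(max(range(len(q)), key=q.__getitem__))
    return actions


def toy_q(obs):
    """Hypothetical Q-function for illustration only."""
    return [obs[0], 2 * obs[-1], 0.5]


grouped = (
    {"obs": [1, 0, 0, 0], "state": [1, 0, 0]},
    {"obs": [1, 0, 0, 1], "state": [1, 0, 0]},
)
local_obs, state = split_grouped_obs(grouped)
print(greedy_actions(toy_q, local_obs))  # prints [0, 1]
```

Algorithmically, `state` would only be fed to the mixer when computing Q_tot during training. Whether a given RLlib version lets you pass just the `"obs"` part to its policy depends on the policy's declared observation space, so this sketch shows the dependency, not the library's API.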

I have other questions, but these are the main ones. I’m trying to train a model with multiple agents (all homogeneous, as the library requires) and then reuse the pre-trained individual agents in other environments. Is this possible?

Thank you very much,
Germán
