Centralized critic (2) training, decentralized actions in evaluation

Hi, I'm using ray/centralized_critic_2.py (ray-project/ray on GitHub, master branch) to build a centralized critic in Ray version 1.13.0, and during training everything works well.

After training, I loaded the trained policies (using the same observation space as in training, because the sizes must match to load the policies) and ran an experiment with the compute_single_action() command, passing only the agent's own obs rather than all the agents' obs and actions.

This is where the problem appeared, because during training (and when reloading for evaluation) I used the following obs_space for the policies:

Dict({'own_obs': []; 'agent_1_obs': []; 'agent_2_obs': []; 'agent_n_obs': []; ...; 'agent_1_act': []; 'agent_2_act': []; 'agent_n_act': []; ...})

But at evaluation time I only had the observation of a single agent to compute its action:

[] # own_obs list

This causes a conflict when computing actions, because the trained policies expect an obs_space of a different size. How can I get decentralized action selection after training with a centralized critic?

P.S.: Here is the error message:

ValueError: The two structures don't have the same nested structure.

First structure: type=list str=[1, 0, 0.0, 0.0, 29.833333333333336, 21.178019137753296, 0.95, 216.33333333333334, 54.622685376413514, 28.200000000000003, 29.15, 28.883333333333333]

.49656644, -0.683525  , -0.00454591,       -0.7535753 , -1.6400133 , -1.3673427 ,  0.06114807, -0.3570869 ],
Entire first structure:[., ., ., ., ., ., ., ., ., ., ., .]
Entire second structure:OrderedDict([('DualSetPoint_obs', .), ('NorthWindowBlind_action', .), ('NorthWindowBlind_obs', .), ('NorthWindow_action', .), ('NorthWindow_obs', .), ('SouthWindow_action', .), ('SouthWindow_obs', .)])

Could you give us more info about your environment? I would try passing in the obs as {'own_obs':[]}, but if that doesn't work, I suspect the other centralized critic example, ray/centralized_critic.py (ray-project/ray on GitHub, master branch), would work better for your use case, since the one you linked postprocesses the batches during training to add the other observations to the agent's obs.

I am having the same issue, using the Overcooked environment.

@Rohan138 {'own_obs':[]} didn’t work as the structure is different. I tried to maintain the structure by having the same dict, but giving None as values for other keys, it still didn’t work. Finally, I tried to give fake/repeated data (e.g., np.zeros(action_space) or same data as own_obs etc.) and then it worked.
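The padding workaround described above could look roughly like the sketch below. It only builds the dict; the key names, shapes, and the helper pad_observation are hypothetical and would have to match the Dict obs_space the policy was trained with.

```python
import numpy as np

def pad_observation(own_obs, other_shapes, own_key="own_obs"):
    """Build a full observation dict from a single agent's own observation.

    Every key other than `own_key` is filled with zeros of the given shape,
    so the dict has the same nested structure the policy was trained on.
    """
    obs = {own_key: np.asarray(own_obs, dtype=np.float32)}
    for key, shape in other_shapes.items():
        obs[key] = np.zeros(shape, dtype=np.float32)
    return obs

# Hypothetical layout: two other agents, 12-dim observations, 1-dim actions.
other_shapes = {
    "agent_1_obs": (12,),
    "agent_1_act": (1,),
    "agent_2_obs": (12,),
    "agent_2_act": (1,),
}
full_obs = pad_observation([0.0] * 12, other_shapes)
```

The resulting full_obs would then be passed to compute_single_action() in place of the bare own_obs list. If some entries of the trained space are not Box spaces (for example Discrete actions), the zero filler would need to be a valid sample of that space instead of a float array.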

Of course, the model then has to be coded so that only own_obs is used to calculate the actions.
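To illustrate that point, here is a toy NumPy sketch (not RLlib's model API; the sizes, weights, and the "others" key are made up) where the action head reads only own_obs and the value head reads everything. With that split, whatever filler is placed in the other keys cannot change the chosen action.

```python
import numpy as np

rng = np.random.default_rng(0)
OWN_DIM, OTHER_DIM, N_ACTIONS = 12, 26, 3  # hypothetical sizes

# Hypothetical weights: the actor sees only own_obs, the critic sees everything.
actor_w = rng.normal(size=(OWN_DIM, N_ACTIONS))
critic_w = rng.normal(size=(OWN_DIM + OTHER_DIM,))

def action_logits(obs):
    """Actor head: reads only `own_obs`, so the other keys are ignored."""
    return obs["own_obs"] @ actor_w

def central_value(obs):
    """Critic head: concatenates own_obs with all other entries (training only)."""
    others = np.concatenate([obs[k].ravel() for k in sorted(obs) if k != "own_obs"])
    return float(np.concatenate([obs["own_obs"], others]) @ critic_w)

own = rng.normal(size=OWN_DIM)
obs_zero = {"own_obs": own, "others": np.zeros(OTHER_DIM)}           # padded at evaluation
obs_real = {"own_obs": own, "others": rng.normal(size=OTHER_DIM)}    # full info in training

# The action logits are identical regardless of what fills the other keys.
same_action = np.allclose(action_logits(obs_zero), action_logits(obs_real))
```

In an actual RLlib custom model the same idea would presumably mean routing only the own_obs entry of the input dict into the policy network and the full dict into the value network.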

It would be nice to have a better solution!