Hi, I’m using ray/centralized_critic_2.py at master · ray-project/ray · GitHub to build a centralized critic in Ray version 1.13.0, and during training everything works well.
After training I loaded the trained policies (using the same observation space as in the training process, because an equal size is needed to load the policies) and ran an experiment using the compute_single_action() command, giving it only the agent's own obs and not all agents' obs and actions (a rough sketch of this call is below).
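This is roughly what my evaluation loop looks like (a simplified sketch, not my exact code: the config, checkpoint path, env, and policy id are hypothetical placeholders; the trainer is rebuilt with the same config, and therefore the same Dict obs_space, as during training):

```python
from ray.rllib.agents.ppo import PPOTrainer

def evaluate(config, checkpoint_path, env, policy_id="agent_1_policy"):
    # Rebuild the trainer with the same config/spaces used for training,
    # then restore the trained weights from a checkpoint.
    trainer = PPOTrainer(config=config)
    trainer.restore(checkpoint_path)

    obs = env.reset()
    done = False
    while not done:
        # Only this agent's own observation is passed here -- NOT the full
        # Dict of all agents' obs and actions the policy was trained with.
        action = trainer.compute_single_action(obs, policy_id=policy_id)
        obs, reward, done, info = env.step(action)
    return trainer
```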
So a problem arose here, because during training (and when reloading the policies for the evaluation process) I used an obs_space for the policies of:
Dict({'own_obs': [], 'agent_1_obs': [], 'agent_2_obs': [], ..., 'agent_n_obs': [], 'agent_1_act': [], 'agent_2_act': [], ..., 'agent_n_act': []})
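Concretely, the centralized obs_space is built like this (a sketch with gym.spaces; the number of agents, field names, and dimensions are hypothetical and just show the structure):

```python
import numpy as np
from gym.spaces import Box, Dict

# Hypothetical sizes: each agent observes a 12-dim vector and takes a 1-dim action.
OWN_OBS_DIM, OTHER_OBS_DIM, ACT_DIM = 12, 12, 1

centralized_obs_space = Dict({
    "own_obs": Box(-np.inf, np.inf, shape=(OWN_OBS_DIM,)),
    # Other agents' observations, seen only by the central critic.
    "agent_1_obs": Box(-np.inf, np.inf, shape=(OTHER_OBS_DIM,)),
    "agent_2_obs": Box(-np.inf, np.inf, shape=(OTHER_OBS_DIM,)),
    # Other agents' actions, also only for the central critic.
    "agent_1_act": Box(-1.0, 1.0, shape=(ACT_DIM,)),
    "agent_2_act": Box(-1.0, 1.0, shape=(ACT_DIM,)),
})
```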
But now I have only the observation of a single agent with which to compute a single action:
[] # own_obs list
This creates a conflict when computing actions with the trained policies, because the obs_space now has a different size. How can I get decentralized action selection after training with a centralized critic?
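For reference, matching the trained space would mean passing something like the padded observation below (a minimal sketch; field names and sizes are hypothetical, and whether zero-padding the other agents' fields is actually valid is part of what I'm unsure about, although as far as I understand the policy branch in centralized_critic_2.py only reads 'own_obs' and the other fields feed the central value function):

```python
import numpy as np
from collections import OrderedDict

# The agent's own (real) observation, e.g. the 12-dim vector from the error below.
own_obs = np.array([1, 0, 0.0, 0.0, 29.83, 21.18, 0.95, 216.33,
                    54.62, 28.20, 29.15, 28.88], dtype=np.float32)

# Pad the remaining fields with zeros so the Dict matches the training obs_space.
padded_obs = OrderedDict({
    "own_obs": own_obs,
    "agent_1_obs": np.zeros(12, dtype=np.float32),  # other agents' obs -> zeros
    "agent_2_obs": np.zeros(12, dtype=np.float32),
    "agent_1_act": np.zeros(1, dtype=np.float32),   # other agents' actions -> zeros
    "agent_2_act": np.zeros(1, dtype=np.float32),
})

# action = trainer.compute_single_action(padded_obs, policy_id="agent_1_policy")
```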
Thanks!
P.S.: The error message is here:
ValueError: The two structures don't have the same nested structure.
First structure: type=list str=[1, 0, 0.0, 0.0, 29.833333333333336, 21.178019137753296, 0.95, 216.33333333333334, 54.622685376413514, 28.200000000000003, 29.15, 28.883333333333333]
Second structure (output truncated): ...49656644, -0.683525 , -0.00454591, -0.7535753 , -1.6400133 , -1.3673427 , 0.06114807, -0.3570869 ], dtype=float32))])
Entire first structure:[., ., ., ., ., ., ., ., ., ., ., .]
Entire second structure:OrderedDict([('DualSetPoint_obs', .), ('NorthWindowBlind_action', .), ('NorthWindowBlind_obs', .), ('NorthWindow_action', .), ('NorthWindow_obs', .), ('SouthWindow_action', .), ('SouthWindow_obs', .)])