Hello Ray team,
I have implemented a centralized critic in my multi-agent PPO setup. I get to the point where policy.compute_central_vf is called for the first episode in the batch, but right after that I get this error:
ValueError: Cannot feed value of shape (34, 120) for Tensor 'arg_0:0', which has shape '(85, 120)'
This happens inside my centralized_critic_postprocessing:
sample_batch[SampleBatch.VF_PREDS] = policy.compute_central_vf(
    sample_batch["OBS1"], sample_batch["OBS2"], sample_batch["OBS3"],
    sample_batch["ACT2"].astype("float32"), sample_batch["ACT3"].astype("float32"))
I have 3 collaborative learning agents. During the first episode the shapes are: obs_agent1: (85, 120), obs_agent2: (85, 150), obs_agent3: (85, 150), act_agent2: (85, 4), act_agent3: (85, 4), and that call to compute_central_vf works without any problem. This is the definition of the central_vf model:
obs1 = tf.keras.layers.Input(shape=(120, ), name="obs_LM")
obs2 = tf.keras.layers.Input(shape=(150, ), name="obs_radar1")
obs3 = tf.keras.layers.Input(shape=(150, ), name="obs_radar2")
act1 = tf.keras.layers.Input(shape=(4, ), name="act_1")
act2 = tf.keras.layers.Input(shape=(4, ), name="act_2")
concat_obs = tf.keras.layers.Concatenate(axis=1)(
    [obs1, obs2, obs3, act1, act2])
central_vf_dense = tf.keras.layers.Dense(
    16, activation=tf.nn.tanh, name="c_vf_dense")(concat_obs)
central_vf_out = tf.keras.layers.Dense(
    1, activation=None, name="c_vf_out")(central_vf_dense)
self.central_vf = tf.keras.Model(
    inputs=[obs1, obs2, obs3, act1, act2], outputs=central_vf_out)
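For reference, the Keras model itself should be batch-size agnostic: Input(shape=(120, )) leaves the batch dimension as None, and only the feature axis is fixed (120 + 150 + 150 + 4 + 4 = 428 after the concat). A quick NumPy check of the shapes I am feeding, for both episode lengths:

```python
import numpy as np

# Shape check only: the concatenated feature axis is always 428,
# while the batch axis (85 in the first episode, 34 in the second) varies.
for batch in (85, 34):
    obs1 = np.zeros((batch, 120), dtype=np.float32)  # obs_LM
    obs2 = np.zeros((batch, 150), dtype=np.float32)  # obs_radar1
    obs3 = np.zeros((batch, 150), dtype=np.float32)  # obs_radar2
    act2 = np.zeros((batch, 4), dtype=np.float32)
    act3 = np.zeros((batch, 4), dtype=np.float32)
    concat = np.concatenate([obs1, obs2, obs3, act2, act3], axis=1)
    assert concat.shape == (batch, 428)
```

So the feature dimensions all line up; only the batch dimension changes between episodes.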
I get the mentioned error "ValueError: Cannot feed value of shape (34, 120) for Tensor 'arg_0:0', which has shape '(85, 120)'" at the next call, because the batch size is now 34 for every entry.
My guess is that there is some kind of graph/tracing issue: the callable seems to have been fixed to the first batch's size of 85, and now the batch has size 34, but I really have no idea how to solve this.
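To make my guess concrete, here is a toy, pure-Python sketch of the caching behavior I suspect: the callable gets built once with the first batch's static shape (85, ...), so the generated placeholder arg_0:0 then rejects a batch of size 34 unless the batch dimension is left open. (make_callable and dynamic_shape here are my own illustrative names, not any real TF or RLlib API.)

```python
import numpy as np

def make_callable(fn, dynamic_shape=False):
    """Toy stand-in for a traced callable: the first call freezes the
    placeholder shape; later calls must match it, unless dynamic_shape=True,
    in which case the batch dimension is left as None."""
    frozen = []
    def call(arr):
        if not frozen:
            frozen.append((None,) + arr.shape[1:] if dynamic_shape else arr.shape)
        expected = frozen[0]
        if expected[0] is not None and expected != arr.shape:
            raise ValueError(
                "Cannot feed value of shape %s for Tensor 'arg_0:0', "
                "which has shape '%s'" % (arr.shape, expected))
        return fn(arr)
    return call

vf = make_callable(lambda a: a.sum(axis=1))
vf(np.zeros((85, 120)))        # first episode: freezes shape (85, 120)
try:
    vf(np.zeros((34, 120)))    # second episode: same error pattern as mine
except ValueError as e:
    print(e)

vf_dyn = make_callable(lambda a: a.sum(axis=1), dynamic_shape=True)
vf_dyn(np.zeros((85, 120)))
vf_dyn(np.zeros((34, 120)))    # fine: batch dimension left as None
```

Is there a way to get RLlib / TF to build compute_central_vf with an open batch dimension like this?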
Thanks in advance,
Clement