PPO Centralized critic

Clement_Collgon · February 10, 2021, 5:05pm

Hello Ray team,

I have implemented a centralized critic for my multi-agent PPO setup. I can call policy.compute_central_vf for the first episode in the batch, but right after that I get this error:

ValueError: Cannot feed value of shape (34, 120) for Tensor 'arg_0:0', which has shape '(85, 120)'

This is the call inside my centralized_critic_postprocessing:

        sample_batch[SampleBatch.VF_PREDS] = policy.compute_central_vf(
            sample_batch["OBS1"], sample_batch["OBS2"],
            sample_batch["OBS3"], sample_batch["ACT2"].astype('float32'), sample_batch["ACT3"].astype('float32'))

I have 3 collaborative learning agents. During the first episode the input shapes are obs_agent1: (85, 120), obs_agent2: (85, 150), obs_agent3: (85, 150), act_agent2: (85, 4), act_agent3: (85, 4), and the call to compute_central_vf works without problems. This is the definition of the central_vf model:

    obs1 = tf.keras.layers.Input(shape=(120, ), name="obs_LM")
    obs2 = tf.keras.layers.Input(shape=(150, ), name="obs_radar1")
    obs3 = tf.keras.layers.Input(shape=(150, ), name="obs_radar2")

    act1 = tf.keras.layers.Input(shape=(4, ), name="act_1")
    act2 = tf.keras.layers.Input(shape=(4, ), name="act_2")

    concat_obs = tf.keras.layers.Concatenate(axis=1)(
        [obs1, obs2, obs3, act1, act2])

    central_vf_dense = tf.keras.layers.Dense(
        16, activation=tf.nn.tanh, name="c_vf_dense")(concat_obs)
    central_vf_out = tf.keras.layers.Dense(
        1, activation=None, name="c_vf_out")(central_vf_dense)
    self.central_vf = tf.keras.Model(
        inputs=[obs1, obs2, obs3, act1, act2], outputs=central_vf_out)
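
A minimal sketch, assuming TensorFlow 2.x and NumPy are installed, to check whether the Keras model itself pins the batch size. The helper name `build_central_vf` and the zero-filled dummy inputs are hypothetical and only for illustration; the layer definitions mirror the ones above. Since `Input(shape=(120, ))` leaves the batch dimension unspecified, the model on its own should accept both 85 and 34 rows.

```python
import numpy as np
import tensorflow as tf


def build_central_vf():
    """Rebuild the central value model from the post (hypothetical helper)."""
    obs1 = tf.keras.layers.Input(shape=(120,), name="obs_LM")
    obs2 = tf.keras.layers.Input(shape=(150,), name="obs_radar1")
    obs3 = tf.keras.layers.Input(shape=(150,), name="obs_radar2")
    act1 = tf.keras.layers.Input(shape=(4,), name="act_1")
    act2 = tf.keras.layers.Input(shape=(4,), name="act_2")

    concat = tf.keras.layers.Concatenate(axis=1)(
        [obs1, obs2, obs3, act1, act2])
    hidden = tf.keras.layers.Dense(
        16, activation="tanh", name="c_vf_dense")(concat)
    out = tf.keras.layers.Dense(1, activation=None, name="c_vf_out")(hidden)
    return tf.keras.Model(
        inputs=[obs1, obs2, obs3, act1, act2], outputs=out)


model = build_central_vf()

# The leading (batch) dimension of every symbolic input is left open.
input_batch_dims = [inp.shape[0] for inp in model.inputs]

# Feed dummy batches with the two sizes mentioned in the post.
output_shapes = []
for batch_size in (85, 34):
    dummy = [np.zeros((batch_size, dim), dtype=np.float32)
             for dim in (120, 150, 150, 4, 4)]
    output_shapes.append(tuple(model.predict(dummy, verbose=0).shape))

print(input_batch_dims, output_shapes)
```

If this runs cleanly for both sizes, the fixed shape in the error is more likely coming from whatever wraps the model call than from the model definition.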

I get the error mentioned above, "ValueError: Cannot feed value of shape (34, 120) for Tensor 'arg_0:0', which has shape '(85, 120)'", on the next call, because the batch size is now 34 for every input.

My guess is that this is some kind of optimization issue, because the batch size was 85 before and is now 34, but I really have no idea how to solve it.
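
The tensor name `arg_0:0` in the error suggests a placeholder created by a wrapper around the function, not one of the named Keras inputs. The following is a toy NumPy model of that failure mode, not RLlib code: `make_callable`, `central_vf` and `fake_batch` are hypothetical names, and the "critic" is just a row sum. It shows how a wrapper that freezes the argument shapes on the first call rejects a later batch of a different size, and how leaving the batch dimension open avoids that.

```python
import numpy as np


def make_callable(dynamic_shape=False):
    """Toy stand-in for a wrapper that builds placeholders on the first call."""
    def decorator(fn):
        placeholder_shapes = []  # filled in lazily, on the first call

        def call(*args):
            arrays = [np.asarray(a) for a in args]
            if not placeholder_shapes:
                for a in arrays:
                    shape = a.shape
                    if dynamic_shape:
                        # Leave the batch dimension unspecified.
                        shape = (None,) + shape[1:]
                    placeholder_shapes.append(shape)
            for i, (a, expected) in enumerate(zip(arrays, placeholder_shapes)):
                compatible = len(a.shape) == len(expected) and all(
                    e is None or e == s for s, e in zip(a.shape, expected))
                if not compatible:
                    raise ValueError(
                        f"Cannot feed value of shape {a.shape} for Tensor "
                        f"'arg_{i}:0', which has shape '{expected}'")
            return fn(*arrays)
        return call
    return decorator


def central_vf(obs1, obs2, obs3, act2, act3):
    # Placeholder critic: one scalar per row of the joint input.
    joint = np.concatenate([obs1, obs2, obs3, act2, act3], axis=1)
    return joint.sum(axis=1)


def fake_batch(n):
    # Zero-filled inputs with the feature sizes from the post.
    return [np.zeros((n, d), dtype=np.float32) for d in (120, 150, 150, 4, 4)]


# Shapes frozen on the first call: the second, smaller batch is rejected.
fixed = make_callable(dynamic_shape=False)(central_vf)
fixed(*fake_batch(85))
try:
    fixed(*fake_batch(34))
    fixed_error = None
except ValueError as err:
    fixed_error = str(err)

# Batch dimension left open: both batch sizes are accepted.
dynamic = make_callable(dynamic_shape=True)(central_vf)
dynamic_shapes = [dynamic(*fake_batch(n)).shape for n in (85, 34)]

print(fixed_error)
print(dynamic_shapes)
```

If `compute_central_vf` is created with RLlib's `make_tf_callable`, as in the centralized critic example (an assumption, since the post does not show that line), it may be worth checking whether the helper is given `dynamic_shape=True`. Please verify that flag against the installed Ray version before relying on it.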

Thanks in advance,

Clement
