KeyError: 'advantages' in PPO MARL

A KeyError: 'advantages' in Ray RLlib PPO with a custom RLModule usually means the 'advantages' field is missing from the training batch, and PPO's loss cannot be computed without it. This is almost always because the value function predictions (Columns.VF_PREDS) are not returned by your RLModule's forward methods, so RLlib cannot compute advantages during postprocessing. In your MaskedRLModule, _forward only returns Columns.ACTION_DIST_INPUTS, not Columns.VF_PREDS, which GAE (and therefore PPO) requires (see this discussion, Ray RLlib docs).
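To see why the value predictions matter, here is a minimal, simplified sketch of generalized advantage estimation (GAE), the step RLlib runs in postprocessing. This is not RLlib's actual code; gamma, lam, and the bootstrap value are illustrative defaults, and episode-termination handling is omitted. Without per-timestep value predictions there is simply no way to produce the 'advantages' column:

import numpy as np

def gae_advantages(rewards, vf_preds, gamma=0.99, lam=0.95, last_value=0.0):
    # Append a bootstrap value V(s_T) so every step has a "next value".
    values = np.append(vf_preds, last_value)
    advantages = np.zeros(len(rewards), dtype=np.float32)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of TD errors (the GAE recursion).
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages  # PPO also derives value targets as advantages + vf_preds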

To fix this, update your _forward (and _forward_train, if you override it) to also return Columns.VF_PREDS, e.g.:

# Add at the top of your module file, if not already imported:
from typing import Any, Dict
from ray.rllib.core.columns import Columns

def _forward(self, batch: Dict[str, Any], **kwargs) -> Dict[str, Any]:
    obs = batch["obs"]["obs"].float()
    # Policy head: logits that parameterize the action distribution.
    logits = self.policy_net(obs)
    # Value head: per-state value estimates, needed to compute GAE advantages.
    values = self.value_net(obs).squeeze(-1)
    if "action_mask" in batch["obs"]:
        # Suppress invalid actions by pushing their logits toward -inf.
        mask = batch["obs"]["action_mask"]
        logits = logits.masked_fill(mask == 0, -1e9)
    return {
        Columns.ACTION_DIST_INPUTS: logits,
        Columns.VF_PREDS: values,
    }

This will allow RLlib to compute advantages and avoid the KeyError (see example fix, Ray RLlib postprocessing).
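As a quick, illustrative sanity check (assuming module is an instance of your MaskedRLModule; the observation and action dimensions below are made up, so replace them with your environment's), you can confirm the forward output now carries both keys PPO's postprocessing needs:

import torch
from ray.rllib.core.columns import Columns

# Hypothetical shapes: obs_dim=8, num_actions=4.
fake_batch = {
    "obs": {
        "obs": torch.randn(2, 8),
        "action_mask": torch.ones(2, 4),
    }
}
out = module._forward(fake_batch)  # module: your MaskedRLModule instance
assert Columns.ACTION_DIST_INPUTS in out and Columns.VF_PREDS in out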

Would you like a step-by-step explanation of why this is required and how RLlib computes advantages?
