KeyError: 'advantages'

Hello,

I’m currently working on using PPO with my custom environment and custom policy. I’ve set everything up, but I get this error during training:

ppo_torch_learner.py", line 89, in compute_loss_for_module
    batch[Postprocessing.ADVANTAGES] * logp_ratio,
  File "/home/amir/.local/lib/python3.10/site-packages/ray/rllib/policy/sample_batch.py", line 973, in getitem
    value = dict.getitem(self, key)
KeyError: 'advantages'

I’ve already seen the community posts and the suggested solutions, such as specifying policies_to_train, but none of them worked for me.
Here is my config:

from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.core.rl_module.multi_rl_module import MultiRLModuleSpec
from ray.rllib.core.rl_module.rl_module import RLModuleSpec

# CustomCallback, CustomPolicy, and env_config are defined elsewhere in my code.
config = (
    PPOConfig()
    .environment("traffic_env", env_config=env_config)
    .callbacks(CustomCallback)  
    .env_runners(
        num_env_runners=4,
        num_envs_per_env_runner=1,
        sample_timeout_s=72000,  
        rollout_fragment_length=64,
    )
    .multi_agent(
        policies={"p1"},
        policy_mapping_fn=lambda agent_id, episode, **kw: "p1",
        policies_to_train={"p1"}
    )
    .training(
        train_batch_size=2048, # 4 trajectories
        minibatch_size=128, # 16 batches
        num_epochs=10,
        entropy_coeff=0.01,
        lr=3e-4,
        use_gae=True,
        gamma=0.99,
        lambda_=0.95,
    )
    .rl_module(
        rl_module_spec=MultiRLModuleSpec(
            rl_module_specs={
                "p1": RLModuleSpec(
                    module_class=CustomPolicy,
                    model_config={"embedding_dim": 34, "env_config": env_config},
                )
            }
        )
    )
    .resources(num_gpus=1)
    .framework("torch")
)

I would appreciate any advice.

I have the same error. Did you find a fix?

I am encountering the same problem with a similar configuration (although not multi-agent in my case). Any update on this issue or suggestions from anyone?

I have solved this issue by adding the following:

.training(
    use_gae=True,
    use_critic=True,
)

.env_runners(
    add_default_connectors_to_env_to_module_pipeline=True,
    add_default_connectors_to_module_to_env_pipeline=True,
)
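Merged into the config from the original question, that looks roughly like this (a sketch, assuming the same new API stack and the names from the original post):

from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("traffic_env", env_config=env_config)
    .env_runners(
        num_env_runners=4,
        rollout_fragment_length=64,
        # Keep the default connector pieces in both env-runner pipelines.
        add_default_connectors_to_env_to_module_pipeline=True,
        add_default_connectors_to_module_to_env_pipeline=True,
    )
    .training(
        train_batch_size=2048,
        use_gae=True,     # the PPO loss reads batch["advantages"]
        use_critic=True,  # the value-function baseline that GAE needs
    )
    # ... rest of the config (multi_agent, rl_module, etc.) unchanged.
)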

I cannot tell exactly why the error occurs, but here is some insight: if you want to use GAE, you need the advantages, and these are calculated by the GeneralAdvantageEstimation connector in the learner pipeline (a default connector).
It seems this piece was either not added or is not performing as expected. I assume this is due to some old API stack usage of Policies.
Does it work when you switch back to the old API stack?
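If you are on the new API stack, one thing to sanity-check is that the default learner connectors have not been disabled. A minimal sketch (the flag name is an assumption mirroring the env-runner flags from the reply above, so verify it against your Ray version):

# The "advantages" column is written by the GeneralAdvantageEstimation
# connector (ray.rllib.connectors.learner), which PPO adds to the learner
# connector pipeline on the new API stack.
# Assumption: the flag below mirrors the env-runner flags used in the
# reply above and keeps the default learner connectors enabled.
config = config.training(add_default_connectors_to_learner_pipeline=True)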