KeyError: 'advantages'

Hello,

I’m currently working on using PPO with my custom environment and custom policy. I’ve set everything up, but I get this error during training:

ppo_torch_learner.py", line 89, in compute_loss_for_module
    batch[Postprocessing.ADVANTAGES] * logp_ratio,
  File "/home/amir/.local/lib/python3.10/site-packages/ray/rllib/policy/sample_batch.py", line 973, in getitem
    value = dict.getitem(self, key)
KeyError: 'advantages'

I’ve already seen the community posts and the suggested solutions, such as specifying policies_to_train, but none of them worked for me.
Here is my config:

from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.core.rl_module.multi_rl_module import MultiRLModuleSpec
from ray.rllib.core.rl_module.rl_module import RLModuleSpec

# CustomCallback, CustomPolicy, and env_config are defined elsewhere in my code.
config = (
    PPOConfig()
    .environment("traffic_env", env_config=env_config)
    .callbacks(CustomCallback)  
    .env_runners(
        num_env_runners=4,
        num_envs_per_env_runner=1,
        sample_timeout_s=72000,  
        rollout_fragment_length=64,
    )
    .multi_agent(
        policies={"p1"},
        policy_mapping_fn=lambda agent_id, episode, **kw: "p1",
        policies_to_train={"p1"}
    )
    .training(
        train_batch_size=2048, # 4 trajectories
        minibatch_size=128, # 16 batches
        num_epochs=10,
        entropy_coeff=0.01,
        lr=3e-4,
        use_gae=True,
        gamma=0.99,
        lambda_=0.95,
    )
    .rl_module(
        rl_module_spec=MultiRLModuleSpec(
            rl_module_specs={
                "p1": RLModuleSpec(
                    module_class=CustomPolicy,
                    model_config={"embedding_dim": 34, "env_config": env_config},
                )
            }
        )
    )
    .resources(num_gpus=1)
    .framework("torch")
)

I would appreciate any advice.

I have the same error. Did you find a fix?

I am encountering the same problem with a similar configuration (although not multi-agent in my case). Any update on this issue or suggestions from anyone?

I have solved this issue by adding the following:

.training(
    use_gae=True,
    use_critic=True,
)

.env_runners(
    add_default_connectors_to_env_to_module_pipeline=True,
    add_default_connectors_to_module_to_env_pipeline=True,
)
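Merged into the config from the original question, that looks roughly like this (a sketch, assuming the same new API stack and the names from the original post):

from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("traffic_env", env_config=env_config)
    .env_runners(
        num_env_runners=4,
        rollout_fragment_length=64,
        # Keep the default connector pieces in both env-runner pipelines.
        add_default_connectors_to_env_to_module_pipeline=True,
        add_default_connectors_to_module_to_env_pipeline=True,
    )
    .training(
        train_batch_size=2048,
        use_gae=True,     # the PPO loss reads batch["advantages"]
        use_critic=True,  # the value-function baseline that GAE needs
    )
    # ... rest of the config (multi_agent, rl_module, etc.) unchanged.
)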

I cannot tell exactly why the error occurs, but here is some insight: if you want to use GAE, you need the advantages, and these are calculated by the GeneralAdvantageEstimation connector in the learner pipeline (a default connector).
It seems this piece was either not added or is not performing as expected. I assume this is due to some old API stack usage of Policies.
Does it work when you switch back to the old API stack?
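If you are on the new API stack, one thing to sanity-check is that the default learner connectors have not been disabled. A minimal sketch (the flag name is an assumption mirroring the env-runner flags from the reply above, so verify it against your Ray version):

# The "advantages" column is written by the GeneralAdvantageEstimation
# connector (ray.rllib.connectors.learner), which PPO adds to the learner
# connector pipeline on the new API stack.
# Assumption: the flag below mirrors the env-runner flags used in the
# reply above and keeps the default learner connectors enabled.
config = config.training(add_default_connectors_to_learner_pipeline=True)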