KeyError: 'advantages'

Hello,

I’m trying to train PPO with a custom environment and a custom policy. Everything is set up, but training fails with this error:

ppo_torch_learner.py", line 89, in compute_loss_for_module
    batch[Postprocessing.ADVANTAGES] * logp_ratio,
  File "/home/amir/.local/lib/python3.10/site-packages/ray/rllib/policy/sample_batch.py", line 973, in __getitem__
    value = dict.__getitem__(self, key)
KeyError: 'advantages'

I’ve already gone through the related community posts and tried the suggested solutions, such as specifying policies_to_train, but none of them worked for me.
Here is my config:

config = (
    PPOConfig()
    .environment("traffic_env", env_config=env_config)
    .callbacks(CustomCallback)  
    .env_runners(
        num_env_runners=4,
        num_envs_per_env_runner=1,
        sample_timeout_s=72000,
        rollout_fragment_length=64,
    )
    .multi_agent(
        policies={"p1"},
        policy_mapping_fn=lambda agent_id, episode, **kw: "p1",
        policies_to_train={"p1"}
    )
    .training(
        train_batch_size=2048, # 4 trajectories
        minibatch_size=128, # 16 batches
        num_epochs=10,
        entropy_coeff=0.01,
        lr=3e-4,
        use_gae=True,
        gamma=0.99,
        lambda_=0.95,
    )
    .rl_module(
        rl_module_spec=MultiRLModuleSpec(
            rl_module_specs={
                "p1": RLModuleSpec(
                    module_class=CustomPolicy,
                    model_config={"embedding_dim": 34, "env_config": env_config},
                )
            }
        )
    )
    .resources(num_gpus=1)
    .framework("torch")
)

I would appreciate any advice.
