Hello,
I’m currently using PPO with my custom environment and a custom policy. I’ve set everything up, but I get this error:
```
ppo_torch_learner.py", line 89, in compute_loss_for_module
    batch[Postprocessing.ADVANTAGES] * logp_ratio,
  File "/home/amir/.local/lib/python3.10/site-packages/ray/rllib/policy/sample_batch.py", line 973, in __getitem__
    value = dict.__getitem__(self, key)
KeyError: 'advantages'
```
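For context, that line is inside PPO's clipped surrogate loss, which multiplies the per-timestep GAE advantages by the log-probability ratio. A simplified sketch of what it computes (my own paraphrase, not the exact RLlib code; `clipped_surrogate` and the `clip_param` default are just illustrative):

```python
import torch

def clipped_surrogate(advantages, logp_ratio, clip_param=0.3):
    # PPO's surrogate objective needs an 'advantages' column in the train batch:
    return torch.min(
        advantages * logp_ratio,
        advantages * torch.clamp(logp_ratio, 1.0 - clip_param, 1.0 + clip_param),
    )
```

So as far as I can tell, the KeyError means the batch reaching the learner never received an `advantages` column.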
I’ve already gone through the community posts on this and tried the suggested solutions, such as explicitly specifying `policies_to_train`, but none of them worked for me.
Here is my config:
```python
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.core.rl_module.multi_rl_module import MultiRLModuleSpec
from ray.rllib.core.rl_module.rl_module import RLModuleSpec

# CustomPolicy (my RLModule) and CustomCallback are my own classes.
config = (
    PPOConfig()
    .environment("traffic_env", env_config=env_config)
    .callbacks(CustomCallback)
    .env_runners(
        num_env_runners=4,
        num_envs_per_env_runner=1,
        sample_timeout_s=72000,
        rollout_fragment_length=64,
    )
    .multi_agent(
        policies={"p1"},
        policy_mapping_fn=lambda agent_id, episode, **kw: "p1",
        policies_to_train={"p1"},
    )
    .training(
        train_batch_size=2048,  # 4 trajectories
        minibatch_size=128,     # 16 minibatches per epoch
        num_epochs=10,
        entropy_coeff=0.01,
        lr=3e-4,
        use_gae=True,
        gamma=0.99,
        lambda_=0.95,
    )
    .rl_module(
        rl_module_spec=MultiRLModuleSpec(
            rl_module_specs={
                "p1": RLModuleSpec(
                    module_class=CustomPolicy,
                    model_config={"embedding_dim": 34, "env_config": env_config},
                )
            }
        )
    )
    .resources(num_gpus=1)
    .framework("torch")
)
```
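And this is roughly how I build and run training (a minimal sketch; `TrafficEnv` is a stand-in for my actual env class):

```python
import ray
from ray.tune.registry import register_env

ray.init()
register_env("traffic_env", lambda cfg: TrafficEnv(cfg))  # my custom env

algo = config.build()
for _ in range(100):
    result = algo.train()  # the KeyError above is raised in here
```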
I would appreciate any advice.