Anomalous behaviour with some plateaus during training

Hi everyone,

I’m currently training my agent on a custom environment using a Policy Gradient (PG) algorithm. However, I’m encountering some unexpected behavior when analyzing the training rewards:

Plateaus in rewards: the mean reward per episode shows plateau-like behavior, where the reward stays exactly the same across many consecutive episodes (I verified this by checking the numerical values, not just the plot).
Missing initial episodes: the plotted rewards start at episode 750 instead of 0, which seems odd. (I read the reward curve roughly as sketched below.)
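
For context, this is roughly how I read the reward curve (a minimal sketch, assuming the progress.csv that Tune/RLlib writes per trial; the path below is just a placeholder for my actual results directory):

import os
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder path; in practice this points at the trial's results directory.
progress_path = os.path.expanduser("~/ray_results/PG_CustomEnv/progress.csv")
df = pd.read_csv(progress_path)

# "episode_reward_mean" is a sliding-window average over recently completed
# episodes, and "episodes_total" counts episodes across all workers/envs.
plt.plot(df["episodes_total"], df["episode_reward_mean"])
plt.xlabel("episodes")
plt.ylabel("mean episode reward")
plt.show()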

I suspect these issues might be related to the parallelization of environments across different workers, but I'm struggling to pinpoint the root cause or figure out how to address it effectively.

Here is my config script:

config = {
    "env": CustomEnv,                  # custom environment class, instantiated with env_config
    "env_config": env_config,          # constructor arguments for CustomEnv
    "exploration_config": {
        "type": "StochasticSampling"   # sample actions from the policy's output distribution
    },
    "model": {
        "custom_model": CustomNet,     # custom network used by the policy
    },
    "lr": 0.001,                       # learning rate
    "gamma": 0.99,                     # discount factor
    "num_workers": 12,                 # parallel rollout workers
    "num_envs_per_worker": 4,          # vectorized envs per worker (48 envs in total)
    "train_batch_size": 2048,          # timesteps collected per training batch
    "entropy_coeff": 0.1,              # entropy bonus coefficient
    "num_gpus": 1                      # GPUs used by the learner
}
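
For completeness, this is roughly how I build and run the trainer with that config (a minimal sketch, assuming the old-style PGTrainer API from ray.rllib.agents.pg, which may differ in newer Ray versions; CustomEnv, CustomNet, and env_config come from my own modules, and the iteration count here is arbitrary):

import ray
from ray.rllib.agents.pg import PGTrainer

ray.init()

# The config dict above is passed directly to the trainer.
trainer = PGTrainer(config=config)

for i in range(500):  # arbitrary number of training iterations
    result = trainer.train()
    print(i, result["episodes_total"], result["episode_reward_mean"])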

Has anyone faced similar issues, or does anyone have insights into how to resolve them?

Thanks in advance for your help!

Best regards,
L.E.O.