I have been running into a similar issue with off-policy algorithms such as DDPG and SAC when the replay buffer's storage unit is set to episodes. I made a post about it here: Replay buffer with episodes as storage unit not training