I have been experiencing a similar issue with off-policy algorithms like DDPG and SAC when using replay buffers with the storage unit set to episodes. I made a post about it here: Replay buffer with episodes as storage unit not training