Accessing the memory buffer (DQN)

Hi,
When using a DQN agent (or any other relevant algorithm, for that matter), is there a way I can manipulate the agent's memory buffer during training?

*By manipulation I mean adding transitions to the buffer or removing transitions from it.

Hi @Ofir_Abu ,

This manipulation is certainly not trivial. As you can see in the source code, the ReplayBuffer holds in its _storage a list of SampleBatches that contain the experiences from the environment. This means you need to either add SampleBatches (by using add()) or remove them directly from this buffer.
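
To make this concrete, here is a minimal sketch of what adding and removing could look like. It assumes the RLlib generation discussed in this thread (buffer classes under ray.rllib.execution.replay_buffer; newer releases moved them to ray.rllib.utils.replay_buffers and slightly changed the add() signature), and it uses a standalone buffer plus dummy transition values purely for illustration; in practice `buffer` would be the trainer's local replay buffer:

```python
import numpy as np
from ray.rllib.execution.replay_buffer import ReplayBuffer  # newer RLlib: ray.rllib.utils.replay_buffers
from ray.rllib.policy.sample_batch import SampleBatch

# Standalone buffer just so the example runs; in practice this would be the
# trainer's local replay buffer.
buffer = ReplayBuffer(capacity=1000)

# Adding: wrap a hand-crafted transition in a one-timestep SampleBatch and
# push it through add(). (Dummy values; depending on the version add() also
# takes a priority weight, which a plain ReplayBuffer ignores.)
fake_transition = SampleBatch({
    SampleBatch.OBS: np.array([[0.0, 0.0, 0.0, 0.0]]),
    SampleBatch.ACTIONS: np.array([0]),
    SampleBatch.REWARDS: np.array([1.0]),
    SampleBatch.NEXT_OBS: np.array([[0.1, 0.0, 0.0, 0.0]]),
    SampleBatch.DONES: np.array([False]),
})
buffer.add(fake_transition, weight=1.0)

# Removing: there is no public API for this, so it means editing the private
# _storage list directly, e.g. dropping all batches with non-positive reward.
buffer._storage = [
    b for b in buffer._storage if b[SampleBatch.REWARDS].sum() > 0
]
```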

Hi, I am interested in learning how to customize policies/models by reading DQN’s code (because the official RLlib documentation is really hard to follow). However, I feel pretty confused when reading it.

Do you have any suggestions on where I should start to read?
Should I have a strong TensorFlow or PyTorch background?

@Roller, could you start a new topic?

Yes. Sorry. I have started a new topic. Here is the link, if you are interested in it.

Hi, sorry for reopening the issue, I need some technical help.
Given a custom model inheriting from RecurrentTFModelV2 and a trainer obtained from trainer_template.build_trainer with A3CTFPolicy: how can I access the replay_buffer that the actual training is based on?

Thanks in advance!

@Ofir_Abu ,

You started with DQN, and DQN works with a replay buffer to estimate the Q* function. In DQN's execution_plan() you therefore find a local_replay_buffer that collects the data and replays it for network training. DQN is an off-policy algorithm.
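
To show where that buffer sits in the data flow, here is a heavily simplified sketch in the style of the execution-op API of that RLlib generation (the real DQN plan additionally updates priorities, syncs the target network, and reports metrics; check the function and argument names against your installed version):

```python
from ray.rllib.execution.rollout_ops import ParallelRollouts
from ray.rllib.execution.replay_ops import Replay, StoreToReplayBuffer
from ray.rllib.execution.train_ops import TrainOneStep
from ray.rllib.execution.concurrency_ops import Concurrently


def simplified_dqn_plan(workers, local_replay_buffer):
    # 1) Collect experiences from the rollout workers and store every batch
    #    in the local replay buffer.
    rollouts = ParallelRollouts(workers, mode="bulk_sync")
    store_op = rollouts.for_each(
        StoreToReplayBuffer(local_buffer=local_replay_buffer))

    # 2) Independently, sample batches back out of the buffer and run one
    #    training step on them.
    replay_op = Replay(local_buffer=local_replay_buffer) \
        .for_each(TrainOneStep(workers))

    # 3) Interleave storing and replaying; only the replay branch emits
    #    training results.
    return Concurrently([store_op, replay_op], mode="round_robin",
                        output_indexes=[1])
```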

In contrast, A3C is an on-policy algorithm that collects data in the environment and trains on it directly. It also estimates something different from DQN: instead of Q* it goes for Q^\pi. Mathematically, the difference lies in the Bellman equations used: DQN uses the Bellman optimality equation, whereas A3C uses the Bellman expectation equation. The latter needs the expectation evaluated with respect to the current policy; this in turn would bias the estimates if old samples (collected with old policies) were used. Therefore A3C does not use replay.
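
In symbols (standard RL notation, not taken from the RLlib code):

$$
Q^*(s,a) = \mathbb{E}\big[\, r + \gamma \max_{a'} Q^*(s', a') \,\big|\, s, a \big]
\qquad \text{vs.} \qquad
Q^\pi(s,a) = \mathbb{E}\big[\, r + \gamma\, \mathbb{E}_{a' \sim \pi(\cdot \mid s')} Q^\pi(s', a') \,\big|\, s, a \big]
$$

The inner expectation over a' ~ \pi is what ties the target to the current policy and rules out replaying stale samples.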

Hope this clarifies it a little.

As a heads-up: the local replay buffer will probably be named MultiAgentReplayBuffer in the future.

In reply to this: each MultiAgentReplayBuffer uses a list of PrioritizedReplayBuffers (one for each policy). I guess this means that if standard replay (no prioritized experience replay) should be used, we have to set the prioritized_replay_alpha attribute in the config to 0.0? The default is 0.6 in the PrioritizedReplayBuffer constructor.

Exactly, no prioritization is achieved with prioritized_replay_alpha=0.


I have to add:

All replay buffers appear to use the PrioritizedReplayBuffer under the hood, and it requires an alpha > 0, which makes non-prioritized replay impossible this way.
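
For completeness, a hedged sketch of where this knob sits in a DQN config of that RLlib generation (top-level prioritized_replay_alpha key, DQNTrainer under ray.rllib.agents.dqn; newer releases nest these settings under replay_buffer_config). As noted above, an exact 0.0 may be rejected by the buffer's alpha > 0 requirement, in which case a very small positive alpha gives practically uniform sampling:

```python
import ray
from ray.rllib.agents.dqn import DQNTrainer

config = {
    "env": "CartPole-v1",
    "framework": "torch",
    # alpha -> 0 flattens all priorities (p^alpha -> 1), i.e. uniform replay.
    # If the buffer's constructor rejects exactly 0.0, a tiny positive value
    # such as 1e-6 is the practical fallback.
    "prioritized_replay_alpha": 1e-6,
}

ray.init()
trainer = DQNTrainer(config=config)
print(trainer.train()["episode_reward_mean"])
```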