Can I check the replay buffer?

Hi, everybody

I want to check the replay buffer.
Can I print it?
I’m using the TD3 algorithm.

Hi @Xim_Lee,

If you just want to print it manually to examine it, your best bet is to run with num_workers=0 and start Ray with ray.init(local_mode=True). Then you can set a breakpoint in the ReplayBuffer class.
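A minimal sketch of that debugging setup, assuming a Ray 1.x install where TD3Trainer is importable from ray.rllib.agents.ddpg. The environment name and the framework setting are placeholders and are not taken from this thread:

```python
def build_debug_config():
    """Config overrides that keep all sampling in the driver process."""
    return {
        "framework": "torch",  # assumption: the torch exploration path is used
        "num_workers": 0,      # no remote rollout workers; sample locally
    }


def run_one_debug_iteration(env_name="Pendulum-v0"):
    """Run one training iteration in a single process so breakpoints are hit.

    env_name is a placeholder; substitute your own registered environment.
    """
    # Imported lazily so the config helper works without Ray installed.
    import ray
    from ray.rllib.agents.ddpg import TD3Trainer  # assumption: Ray 1.x layout

    ray.init(local_mode=True)  # execute everything serially in the driver
    try:
        trainer = TD3Trainer(env=env_name, config=build_debug_config())
        # Set a breakpoint inside the ReplayBuffer class before this call.
        return trainer.train()
    finally:
        ray.shutdown()
```

With everything in one process, a debugger such as pdb stops inside the buffer code the same way it would in an ordinary script.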

If you want to do something automated in the code, that will be more involved. This is where the replay buffer is created:

You are probably better off subclassing it with your own implementation that prints.
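To illustrate the subclassing idea, here is a rough sketch. SimpleReplayBuffer is a hypothetical stand-in written for this example, not RLlib's class; with RLlib you would subclass its actual buffer and override whichever add/sample methods your version exposes:

```python
import random


class SimpleReplayBuffer:
    """Hypothetical stand-in for a replay buffer; not RLlib's class."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._storage = []
        self._next_idx = 0

    def add(self, item):
        # Append until full, then overwrite the oldest entry.
        if len(self._storage) < self.capacity:
            self._storage.append(item)
        else:
            self._storage[self._next_idx] = item
        self._next_idx = (self._next_idx + 1) % self.capacity

    def sample(self, num_items):
        # Uniform sampling with replacement.
        return [random.choice(self._storage) for _ in range(num_items)]

    def __len__(self):
        return len(self._storage)


class PrintingReplayBuffer(SimpleReplayBuffer):
    """Subclass that prints every stored item and every sample call."""

    def add(self, item):
        print(f"[buffer] add (size before: {len(self)}): {item}")
        super().add(item)

    def sample(self, num_items):
        batch = super().sample(num_items)
        print(f"[buffer] sampled {len(batch)} items")
        return batch
```

The subclass only wraps the parent methods, so training behavior stays the same and the printing can be removed by swapping the class back.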

You will need to define a new execution plan and update the TD3 trainer. One option is to store the local_replay_buffer on the local worker and access it from the postprocess_trajectory callback.


Thank you @mannyv !
I was able to check the information on the replay buffer.

I also want to double-check the action using the Q-value.
That is, I want to verify that the action computed by the actor network is the argmax of the critic network.

How can I check this?
The code below is the code I changed.
How can I get the current network weights in the exploration part?

 def _get_torch_exploration_action(self, action_dist, explore, timestep):
        # Set last timestep or (if not given) increase by one.
        global action
        self.last_timestep = timestep if timestep is not None else \
            self.last_timestep + 1
        # Apply exploration.
        if explore:
            # Random exploration phase.
            if self.last_timestep <= self.random_timesteps:
                action, _ = \
                        action_dist, explore=True)
            # Take a Gaussian sample with our stddev (mean=0.0) and scale it.
            # BMIL edit
                det_actions = action_dist.deterministic_sample()

                # Using Hard Boundary
                elif len(det_actions[0]) == 2:
                    det_actions = action_dist.deterministic_sample()
                    acc_det_actions = det_actions[0][0]
                    LC_det_actions = det_actions[0][1]

                    # Exploit action select - Hard Boundary
                    if LC_det_actions > 0.9:
                        LC_det_actions = torch.Tensor([1])
                    elif LC_det_actions <= 0.9 and LC_det_actions >= -0.9:
                        LC_det_actions = torch.Tensor([0])
                    elif LC_det_actions < -0.9:
                        LC_det_actions = torch.Tensor([-1])


                    # Exploration: Epsilon Greedy for Lane Change action
                    epsilon = self.scale_schedule(self.last_timestep)
                    possible_actions = torch.Tensor([[-1], [0], [1]])
                    random_actions = random.choice(possible_actions)
                    prob = torch.empty(
                            torch.Tensor(1).size(), ).uniform_().to(self.device)
                    LC_det_actions = torch.where(
                        prob < epsilon,
                        random_actions, LC_det_actions)

                    # Exploration: Gaussian Noise for Acceleration action
                    scale = self.scale_schedule(self.last_timestep)
                    gaussian_sample = scale * torch.normal(
                        mean=torch.zeros(acc_det_actions.size()), std=self.stddev).to(
                    acc_det_actions = torch.Tensor([acc_det_actions + gaussian_sample])

                    # Concatenation A_acc + A_lc
                    det_actions =[acc_det_actions, LC_det_actions]).reshape((1,2))

                    # Action Bound - Action Space
                    action = torch.min(

            det_actions = action_dist.deterministic_sample()
            if len(det_actions[0]) == 2:
                acc_det_actions = det_actions[0][0]
                LC_det_actions = det_actions[0][1]

                # Exploit action select - Hard Boundary
                if LC_det_actions > 0.333:
                    LC_det_actions = torch.Tensor([1])
                elif LC_det_actions <= 0.333 and LC_det_actions >= -0.333:
                    LC_det_actions = torch.Tensor([0])
                elif LC_det_actions < -0.333:
                    LC_det_actions = torch.Tensor([-1])

                action =[acc_det_actions, LC_det_actions]).reshape((1, 2))

        logp = torch.zeros(
            (action.size()[0], ), dtype=torch.float32, device=self.device)

        return action, logp           
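The hard-boundary step in the code above could be factored into a small helper, which makes the thresholds easy to test in isolation. This is only a sketch: discretize_lane_change is a hypothetical name, and it works on plain floats rather than tensors. Note that the explore branch above uses 0.9 while the exploit branch uses 0.333.

```python
def discretize_lane_change(value, threshold=0.333):
    """Map a continuous output to -1, 0, or 1 using symmetric thresholds.

    Values in [-threshold, threshold] map to 0, matching the
    inclusive middle band in the code above.
    """
    if value > threshold:
        return 1
    if value < -threshold:
        return -1
    return 0
```

For a tensor element, pass `float(LC_det_actions)` and wrap the result back into a tensor where needed.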

The reason I implemented it in the exploration part is to get the modified action into the replay buffer.
If there are other ways to do this, I would like to know.

Hey @Xim_Lee ,
The exploration object that has the _get_torch_exploration_action should also have a self.policy property, which you can use to get to its model’s weights:

model = self.policy.model
weights = model.get_weights()
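Once the model is reachable, one way to approach the actor-versus-critic check asked about above is sketched here. TD3 actions are continuous, so there is no exact argmax to compare against; the sketch instead compares the critic's value for the actor's action with values for randomly sampled candidate actions. q_fn is a hypothetical stand-in for a forward pass through the critic, and toy_q exists only to make the example runnable:

```python
import random


def q_gap_against_candidates(q_fn, obs, actor_action, low, high,
                             num_candidates=1000, seed=0):
    """Return (gap, best_action) for one observation.

    gap is the best Q-value found minus Q(obs, actor_action). It is 0 when
    no sampled candidate beats the actor's action, and positive otherwise.
    """
    rng = random.Random(seed)
    actor_q = q_fn(obs, actor_action)
    best_q, best_action = actor_q, list(actor_action)
    for _ in range(num_candidates):
        # Sample uniformly inside the action bounds, one value per dimension.
        candidate = [rng.uniform(lo, hi) for lo, hi in zip(low, high)]
        q = q_fn(obs, candidate)
        if q > best_q:
            best_q, best_action = q, candidate
    return best_q - actor_q, best_action


def toy_q(obs, action):
    # Toy critic peaked at action == [0.5, -0.5]; the observation is ignored.
    return -((action[0] - 0.5) ** 2 + (action[1] + 0.5) ** 2)


gap, best = q_gap_against_candidates(
    toy_q, obs=None, actor_action=[-1.0, 1.0], low=[-1, -1], high=[1, 1])
print("gap:", gap, "best sampled action:", best)
```

A consistently large gap would suggest the actor lags behind the critic. Random sampling only gives a lower bound on the true maximum, so a gap of 0 is evidence, not proof, that the action is optimal under the critic.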

Hey @sven1977
Thank you for your reply.
But I got an error saying that the GaussianNoise exploration object has no self.policy attribute.

  File "/home/bmil/anaconda3/envs/flow/lib/python3.7/site-packages/ray/rllib/policy/", line 159, in compute_actions
    input_dict, state_batches, seq_lens, explore, timestep)
  File "/home/bmil/anaconda3/envs/flow/lib/python3.7/site-packages/ray/rllib/policy/", line 252, in _compute_action_helper
  File "/home/bmil/anaconda3/envs/flow/lib/python3.7/site-packages/ray/rllib/utils/exploration/", line 90, in get_exploration_action
    explore, timestep)
  File "/home/bmil/anaconda3/envs/flow/lib/python3.7/site-packages/ray/rllib/utils/exploration/", line 218, in _get_torch_exploration_action
    model = self.policy.model
AttributeError: 'GaussianNoise' object has no attribute 'policy'