Can I check the replay buffer?

Hi everybody,

I want to check the replay buffer.
Can I print it?
I’m using the TD3 algorithm.

Hi @Xim_Lee,

If you just want to print it manually to examine it, your best bet is to run with num_workers=0 and start Ray with ray.init(local_mode=True). Then you can set a breakpoint in the ReplayBuffer class.
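For example, a minimal sketch (the TD3Trainer import path is the one used by older ray.rllib.agents releases, and "Pendulum-v0" is just a placeholder environment):

import ray
from ray.rllib.agents.ddpg import TD3Trainer

# Single-process mode, so a debugger breakpoint inside RLlib code is actually hit.
ray.init(local_mode=True)

trainer = TD3Trainer(env="Pendulum-v0", config={"num_workers": 0})

# Set a breakpoint in ReplayBuffer.add() / sample() first, then step the trainer.
trainer.train()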

If you want to do something automated in the code, that will be more involved. This is where the replay buffer is created:

You are probably better off just subclassing it with your own implementation that prints.

You will need to define a new execution plan and update the TD3 trainer, perhaps putting the local_replay_buffer in the worker’s local worker and accessing it from the postprocess_trajectory callback.
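A rough sketch of the subclassing idea (the exact import path and add() signature differ between Ray versions, so treat this only as an outline):

from ray.rllib.execution.replay_buffer import ReplayBuffer

class PrintingReplayBuffer(ReplayBuffer):
    # Print whatever gets stored, then delegate to the original implementation.
    def add(self, *args, **kwargs):
        print("ReplayBuffer.add:", args, kwargs)
        return super().add(*args, **kwargs)

You would then have to plug this class into the trainer’s execution plan in place of the default buffer.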


Thank you @mannyv!
I was able to check the information in the replay buffer.

Now I want to double-check the action using the Q-value.
That is, I want to verify that the action computed by the actor network is the argmax of the critic network’s Q-values.

How can I check this?
The code below is what I changed.
How can I get the current network weights in the exploration part?

    def _get_torch_exploration_action(self, action_dist, explore, timestep):
        # Set last timestep or (if not given) increase by one.
        global action
        self.last_timestep = timestep if timestep is not None else \
            self.last_timestep + 1
        # Apply exploration.
        if explore:
            # Random exploration phase.
            if self.last_timestep <= self.random_timesteps:
                action, _ = \
                    self.random_exploration.get_torch_exploration_action(
                        action_dist, explore=True)
            # Take a Gaussian sample with our stddev (mean=0.0) and scale it.
            # BMIL edit
            else:
                det_actions = action_dist.deterministic_sample()

                # Using Hard Boundary (only the 2-dim [acc, lane-change] action is handled)
                if len(det_actions[0]) == 2:
                    acc_det_actions = det_actions[0][0]
                    LC_det_actions = det_actions[0][1]

                    # Exploit action select - Hard Boundary
                    if LC_det_actions > 0.9:
                        LC_det_actions = torch.Tensor([1])
                    elif LC_det_actions <= 0.9 and LC_det_actions >= -0.9:
                        LC_det_actions = torch.Tensor([0])
                    elif LC_det_actions < -0.9:
                        LC_det_actions = torch.Tensor([-1])

                    print(self.agents.get_weights())

                    # Exploration: Epsilon Greedy for Lane Change action
                    epsilon = self.scale_schedule(self.last_timestep)
                    possible_actions = torch.Tensor([[-1], [0], [1]])
                    random_actions = random.choice(possible_actions)
                    prob = torch.rand(1, device=self.device)
                    LC_det_actions = torch.where(
                        prob < epsilon,
                        random_actions, LC_det_actions)

                    # Exploration: Gaussian noise for the acceleration action
                    scale = self.scale_schedule(self.last_timestep)
                    gaussian_sample = scale * torch.normal(
                        mean=torch.zeros(acc_det_actions.size()), std=self.stddev).to(
                        self.device)
                    acc_det_actions = torch.Tensor([acc_det_actions + gaussian_sample])

                    # Concatenation A_acc + A_lc
                    det_actions = torch.cat([acc_det_actions, LC_det_actions]).reshape((1,2))

                    # Action Bound - Action Space
                    action = torch.min(
                        torch.max(
                            det_actions,
                            torch.tensor(
                                self.action_space.low,
                                dtype=torch.float32,
                                device=self.device)),
                        torch.tensor(
                            self.action_space.high,
                            dtype=torch.float32,
                            device=self.device))

        else:
            det_actions = action_dist.deterministic_sample()
            if len(det_actions[0]) == 2:
                acc_det_actions = det_actions[0][0]
                LC_det_actions = det_actions[0][1]

                # Exploit action select - Hard Boundary
                if LC_det_actions > 0.333:
                    LC_det_actions = torch.Tensor([1])
                elif LC_det_actions <= 0.333 and LC_det_actions >= -0.333:
                    LC_det_actions = torch.Tensor([0])
                elif LC_det_actions < -0.333:
                    LC_det_actions = torch.Tensor([-1])

                action = torch.cat(
                    [acc_det_actions.reshape(1), LC_det_actions]).reshape((1, 2))

        logp = torch.zeros(
            (action.size()[0], ), dtype=torch.float32, device=self.device)

        return action, logp           

The reason I implemented it in the exploration part is to put the action into the replay buffer.
If there are other ways to do this, I’d like to know about them.

Hey @Xim_Lee ,
The exploration object that has the _get_torch_exploration_action method should also have a self.policy property, which you can use to get to its model’s weights:

model = self.policy.model
weights = model.get_weights()
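Once you have the model, you could also do the Q-value comparison you described. A rough sketch, assuming the torch DDPG/TD3 model exposes get_policy_output() and get_q_values() as in older RLlib versions; obs and some_other_action are placeholders you would have to supply:

# Forward pass through the shared trunk, then query the actor and critic heads.
model_out, _ = model({"obs": obs}, [], None)

actor_action = model.get_policy_output(model_out)     # action from the actor head
q_actor = model.get_q_values(model_out, actor_action)  # Q(s, actor_action)

# Compare against any alternative action you want to test.
q_alt = model.get_q_values(model_out, some_other_action)
print(q_actor, q_alt)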

Hey @sven1977,
Thank you for your reply.
But I got an error: the GaussianNoise exploration object does not have a self.policy attribute.

  File "/home/bmil/anaconda3/envs/flow/lib/python3.7/site-packages/ray/rllib/policy/torch_policy.py", line 159, in compute_actions
    input_dict, state_batches, seq_lens, explore, timestep)
  File "/home/bmil/anaconda3/envs/flow/lib/python3.7/site-packages/ray/rllib/policy/torch_policy.py", line 252, in _compute_action_helper
    explore=explore)
  File "/home/bmil/anaconda3/envs/flow/lib/python3.7/site-packages/ray/rllib/utils/exploration/gaussian_noise.py", line 90, in get_exploration_action
    explore, timestep)
  File "/home/bmil/anaconda3/envs/flow/lib/python3.7/site-packages/ray/rllib/utils/exploration/gaussian_noise.py", line 218, in _get_torch_exploration_action
    model = self.policy.model
AttributeError: 'GaussianNoise' object has no attribute 'policy'