Hi, everybody
I want to check the replay buffer.
Can I print it?
I’m using the TD3 algorithm.
Hi @Xim_Lee,
If you just want to print it manually to examine it, your best bet is to run with num_workers=0 and start Ray with ray.init(local_mode=True). Then you can set a breakpoint in the ReplayBuffer class.
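For reference, a minimal sketch of that debugging setup (the env name "MyEnv-v0" is a placeholder; the TD3Trainer import matches RLlib 1.x, so adjust for your version):

import ray
from ray.rllib.agents.ddpg import TD3Trainer

ray.init(local_mode=True)  # run everything in a single process
trainer = TD3Trainer(env="MyEnv-v0", config={"num_workers": 0})
trainer.train()  # a breakpoint set inside the ReplayBuffer class now hits in-process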
If you want to do something automated in the code, that will be more involved. This is where the replay buffer is created:
You are probably better off subclassing it with your own implementation that prints.
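Something like this (a sketch only; the base class path and the add() signature vary across RLlib versions, so check ray/rllib/execution/replay_buffer.py in yours first):

from ray.rllib.execution.replay_buffer import ReplayBuffer

class PrintingReplayBuffer(ReplayBuffer):
    # Print each transition before storing it; *args/**kwargs keeps this
    # agnostic to the exact add() signature of your RLlib version.
    def add(self, *args, **kwargs):
        print("ReplayBuffer.add:", args, kwargs)
        super().add(*args, **kwargs)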
You will need to define a new execution plan and update the TD3 trainer, perhaps putting the local_replay_buffer in the worker's local worker and accessing it from the postprocess_trajectory callback.
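For the callback route, a hedged sketch (the hook signature matches RLlib 1.x's DefaultCallbacks; whether local_replay_buffer is reachable on the worker depends on how your execution plan attaches it):

from ray.rllib.agents.callbacks import DefaultCallbacks

class ReplayBufferInspector(DefaultCallbacks):
    def on_postprocess_trajectory(self, *, worker, episode, agent_id,
                                  policy_id, policies, postprocessed_batch,
                                  original_batches, **kwargs):
        # Only works if your execution plan stored the buffer on the worker.
        buf = getattr(worker, "local_replay_buffer", None)
        if buf is not None:
            print("replay buffer size:", len(buf))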
Thank you @mannyv !
I was able to check the information on the replay buffer.
Now I want to double-check the action using the Q-value.
That is, I want to verify that the action computed by the actor network is the argmax of the critic network.
How can I check this?
Below is the code I changed.
How can I get the current network weights in the exploration part?
import random
import torch

def _get_torch_exploration_action(self, action_dist, explore, timestep):
    # Set last timestep or (if not given) increase by one.
    self.last_timestep = timestep if timestep is not None else \
        self.last_timestep + 1
    # Apply exploration.
    if explore:
        # Random exploration phase.
        if self.last_timestep <= self.random_timesteps:
            action, _ = \
                self.random_exploration.get_torch_exploration_action(
                    action_dist, explore=True)
        # Take a Gaussian sample with our stddev (mean=0.0) and scale it.
        # BMIL edit
        else:
            det_actions = action_dist.deterministic_sample()
            if len(det_actions[0]) == 2:
                acc_det_actions = det_actions[0][0]
                LC_det_actions = det_actions[0][1]
                # Exploit action select - Hard Boundary
                if LC_det_actions > 0.9:
                    LC_det_actions = torch.Tensor([1])
                elif -0.9 <= LC_det_actions <= 0.9:
                    LC_det_actions = torch.Tensor([0])
                else:  # LC_det_actions < -0.9
                    LC_det_actions = torch.Tensor([-1])
                # <-- How can I get the current network weights here?
                print(self.agents.get_weights())
                # Exploration: epsilon-greedy for the lane-change action.
                epsilon = self.scale_schedule(self.last_timestep)
                possible_actions = torch.Tensor([[-1], [0], [1]])
                random_actions = random.choice(possible_actions)
                prob = torch.empty(1).uniform_().to(self.device)
                LC_det_actions = torch.where(
                    prob < epsilon, random_actions, LC_det_actions)
                # Exploration: Gaussian noise for the acceleration action.
                scale = self.scale_schedule(self.last_timestep)
                gaussian_sample = scale * torch.normal(
                    mean=torch.zeros(acc_det_actions.size()),
                    std=self.stddev).to(self.device)
                acc_det_actions = torch.Tensor(
                    [acc_det_actions + gaussian_sample])
                # Concatenate A_acc + A_lc.
                det_actions = torch.cat(
                    [acc_det_actions, LC_det_actions]).reshape((1, 2))
                # Clip to the action space bounds.
                action = torch.min(
                    torch.max(
                        det_actions,
                        torch.tensor(
                            self.action_space.low,
                            dtype=torch.float32,
                            device=self.device)),
                    torch.tensor(
                        self.action_space.high,
                        dtype=torch.float32,
                        device=self.device))
            else:
                # Fall back to the plain deterministic action.
                action = det_actions
    else:
        det_actions = action_dist.deterministic_sample()
        if len(det_actions[0]) == 2:
            acc_det_actions = det_actions[0][0]
            LC_det_actions = det_actions[0][1]
            # Exploit action select - Hard Boundary
            if LC_det_actions > 0.333:
                LC_det_actions = torch.Tensor([1])
            elif -0.333 <= LC_det_actions <= 0.333:
                LC_det_actions = torch.Tensor([0])
            else:  # LC_det_actions < -0.333
                LC_det_actions = torch.Tensor([-1])
            action = torch.cat(
                [acc_det_actions, LC_det_actions]).reshape((1, 2))
        else:
            # Fall back to the plain deterministic action.
            action = det_actions
    logp = torch.zeros(
        (action.size()[0], ), dtype=torch.float32, device=self.device)
    return action, logp
The reason I implemented it in the exploration part is to put the resulting action into the replay buffer.
If other methods are available, I would like to know about them.
Hey @Xim_Lee ,
The exploration object that has the _get_torch_exploration_action method should also have a self.policy property, which you can use to get to its model's weights:
model = self.policy.model
weights = model.get_weights()
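If that attribute is available in your version, the actor-vs-critic argmax check could then look roughly like this (a sketch assuming RLlib's DDPG/TD3 torch model helpers get_policy_output and get_q_values; obs_batch and candidate_actions are placeholders you would build yourself):

model = self.policy.model
model_out, _ = model({"obs": obs_batch}, [], None)
# Q-value of the action the actor actually proposes.
actor_action = model.get_policy_output(model_out)
q_actor = model.get_q_values(model_out, actor_action)
# Compare against the Q-values of the other lane-change candidates,
# e.g. same acceleration with LC in {-1, 0, 1}.
for a in candidate_actions:
    q_a = model.get_q_values(model_out, a)
    assert q_a <= q_actor, "actor action is not the critic's argmax"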
Hey @sven1977
Thank you for your reply.
But I got an error: the GaussianNoise exploration object does not have a self.policy attribute.
File "/home/bmil/anaconda3/envs/flow/lib/python3.7/site-packages/ray/rllib/policy/torch_policy.py", line 159, in compute_actions
input_dict, state_batches, seq_lens, explore, timestep)
File "/home/bmil/anaconda3/envs/flow/lib/python3.7/site-packages/ray/rllib/policy/torch_policy.py", line 252, in _compute_action_helper
explore=explore)
File "/home/bmil/anaconda3/envs/flow/lib/python3.7/site-packages/ray/rllib/utils/exploration/gaussian_noise.py", line 90, in get_exploration_action
explore, timestep)
File "/home/bmil/anaconda3/envs/flow/lib/python3.7/site-packages/ray/rllib/utils/exploration/gaussian_noise.py", line 218, in _get_torch_exploration_action
model = self.policy.model
AttributeError: 'GaussianNoise' object has no attribute 'policy'
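A possible workaround sketch, assuming the Exploration base class in this RLlib version stores the model passed to its constructor as self.model, and that the model is a torch nn.Module:

# Inside _get_torch_exploration_action: the exploration object holds the
# policy's model directly rather than the policy itself.
weights = self.model.state_dict()  # nn.Module weights of the policy network
print(weights)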