I can retrieve the weight-bias dictionary from the DQNTrainer using the following command:

weights = _service.model.get_policy().get_weights()
In this case, I trained a model using {"fcnet_hiddens": [100, 50], "fcnet_activation": "tanh"}. The rest of the model config is the default from dqn.DEFAULT_CONFIG. When I run the previous command, I get several tensors:

['default_policy/fc_1/kernel', 'default_policy/fc_1/bias', 'default_policy/fc_out/kernel', 'default_policy/fc_out/bias', 'default_policy/value_out/kernel', 'default_policy/value_out/bias', 'default_policy/hidden_0/kernel', 'default_policy/hidden_0/bias', 'default_policy/dense/kernel', 'default_policy/dense/bias', 'default_policy/dense_1/kernel', 'default_policy/dense_1/bias', 'default_policy/dense_2/kernel', 'default_policy/dense_2/bias']

These are the input dimensions of the tensors:
default_policy/fc_1/kernel: 51
default_policy/fc_1/bias: 100
default_policy/fc_out/kernel: 100
default_policy/fc_out/bias: 50
default_policy/value_out/kernel: 100
default_policy/value_out/bias: 1
default_policy/hidden_0/kernel: 50
default_policy/hidden_0/bias: 256
default_policy/dense/kernel: 256
default_policy/dense/bias: 3
default_policy/dense_1/kernel: 50
default_policy/dense_1/bias: 256
default_policy/dense_2/kernel: 256
default_policy/dense_2/bias: 1
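For completeness, the full shapes (not just the input dimension) can be read straight off the weights dict. A minimal sketch; the dict below uses dummy zero arrays standing in for the real values that get_weights() would return, with the shapes implied by the list above:

```python
import numpy as np

# Stand-in for _service.model.get_policy().get_weights();
# TF dense kernels are stored as (in_features, out_features).
weights = {
    "default_policy/fc_1/kernel": np.zeros((51, 100)),
    "default_policy/fc_1/bias": np.zeros((100,)),
    "default_policy/fc_out/kernel": np.zeros((100, 50)),
    "default_policy/fc_out/bias": np.zeros((50,)),
    "default_policy/value_out/kernel": np.zeros((100, 1)),
    "default_policy/value_out/bias": np.zeros((1,)),
}

for name, tensor in weights.items():
    print(f"{name}: {tensor.shape}")
```

Printing the full shapes this way makes it easier to see which tensor feeds which layer than listing only the first dimension.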
The input and output dimensions are 51 and 3, respectively; hence, I assume the layers of the policy network are fc_1 → fc_out → hidden_0 → dense. Renaming them fc1, fc2, fc3, and fc4, I have created this NN in PyTorch:
import torch

class MyModel(torch.nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.fc1 = torch.nn.Linear(51, 100)
        self.fc2 = torch.nn.Linear(100, 50)
        self.fc3 = torch.nn.Linear(50, 256)
        self.fc4 = torch.nn.Linear(256, 3)

    def forward(self, x):
        x = torch.tanh(self.fc1(x))
        x = torch.tanh(self.fc2(x))
        x = torch.tanh(self.fc3(x))
        x = torch.argmax(self.fc4(x))
        return x
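One detail worth checking when setting the weights: TensorFlow stores a dense kernel as (in_features, out_features), while torch.nn.Linear.weight has shape (out_features, in_features), so each kernel must be transposed before copying. A minimal sketch for one layer, using a dummy random array in place of the real dict entry:

```python
import numpy as np
import torch

# Dummy stand-ins for the real entries of the RLlib weights dict.
kernel = np.random.randn(51, 100).astype(np.float32)  # TF shape: (in, out)
bias = np.random.randn(100).astype(np.float32)

fc1 = torch.nn.Linear(51, 100)
with torch.no_grad():
    # Transpose the TF kernel: torch weight is (out, in).
    fc1.weight.copy_(torch.from_numpy(kernel.T))
    fc1.bias.copy_(torch.from_numpy(bias))

# Sanity check: the torch layer now computes x @ kernel + bias,
# which is exactly what the TF dense layer computes.
x = torch.randn(4, 51)
ref = x @ torch.from_numpy(kernel) + torch.from_numpy(bias)
assert torch.allclose(fc1(x), ref, atol=1e-5)
```

If the kernels were copied without the transpose, every layer's output would be wrong even when the layer ordering is correct.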
I have also set the weights based on the weight-bias dictionary. However, it is not returning the same outputs as the agent. How can I retrieve the neural network from the DQNTrainer agent?