I can retrieve the weight-bias dictionary from the DQNTrainer using the following command:

weights = _service.model.get_policy().get_weights()
In this case, I trained a model using {"fcnet_hiddens": [100, 50], "fcnet_activation": "tanh"}. The rest of the model config is the default from dqn.DEFAULT_CONFIG. When I run the previous command, I get several tensors:

['default_policy/fc_1/kernel', 'default_policy/fc_1/bias', 'default_policy/fc_out/kernel', 'default_policy/fc_out/bias', 'default_policy/value_out/kernel', 'default_policy/value_out/bias', 'default_policy/hidden_0/kernel', 'default_policy/hidden_0/bias', 'default_policy/dense/kernel', 'default_policy/dense/bias', 'default_policy/dense_1/kernel', 'default_policy/dense_1/bias', 'default_policy/dense_2/kernel', 'default_policy/dense_2/bias']

These are the input dimensions of the tensors:
default_policy/fc_1/kernel: 51
default_policy/fc_1/bias: 100
default_policy/fc_out/kernel: 100
default_policy/fc_out/bias: 50
default_policy/value_out/kernel: 100
default_policy/value_out/bias: 1
default_policy/hidden_0/kernel: 50
default_policy/hidden_0/bias: 256
default_policy/dense/kernel: 256
default_policy/dense/bias: 3
default_policy/dense_1/kernel: 50
default_policy/dense_1/bias: 256
default_policy/dense_2/kernel: 256
default_policy/dense_2/bias: 1
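For completeness, the full shapes (not just the input dimension) can be read straight off the weights dict. A minimal sketch; the dict below uses dummy zero arrays standing in for the real values that get_weights() would return, with the shapes implied by the list above:

```python
import numpy as np

# Stand-in for _service.model.get_policy().get_weights();
# TF dense kernels are stored as (in_features, out_features).
weights = {
    "default_policy/fc_1/kernel": np.zeros((51, 100)),
    "default_policy/fc_1/bias": np.zeros((100,)),
    "default_policy/fc_out/kernel": np.zeros((100, 50)),
    "default_policy/fc_out/bias": np.zeros((50,)),
    "default_policy/value_out/kernel": np.zeros((100, 1)),
    "default_policy/value_out/bias": np.zeros((1,)),
}

for name, tensor in weights.items():
    print(f"{name}: {tensor.shape}")
```

Printing the full shapes this way makes it easier to see which tensor feeds which layer than listing only the first dimension.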
The input and output dimensions are 51 and 3, respectively; hence, I assume the layers of the policy network are fc_1 → fc_out → hidden_0 → dense. Renaming them fc1, fc2, fc3, and fc4, I have created this NN in PyTorch:
import torch

class MyModel(torch.nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.fc1 = torch.nn.Linear(51, 100)
        self.fc2 = torch.nn.Linear(100, 50)
        self.fc3 = torch.nn.Linear(50, 256)
        self.fc4 = torch.nn.Linear(256, 3)

    def forward(self, x):
        x = torch.tanh(self.fc1(x))
        x = torch.tanh(self.fc2(x))
        x = torch.tanh(self.fc3(x))
        x = torch.argmax(self.fc4(x))
        return x
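One detail worth checking when setting the weights: TensorFlow stores a dense kernel as (in_features, out_features), while torch.nn.Linear.weight has shape (out_features, in_features), so each kernel must be transposed before copying. A minimal sketch for one layer, using a dummy random array in place of the real dict entry:

```python
import numpy as np
import torch

# Dummy stand-ins for the real entries of the RLlib weights dict.
kernel = np.random.randn(51, 100).astype(np.float32)  # TF shape: (in, out)
bias = np.random.randn(100).astype(np.float32)

fc1 = torch.nn.Linear(51, 100)
with torch.no_grad():
    # Transpose the TF kernel: torch weight is (out, in).
    fc1.weight.copy_(torch.from_numpy(kernel.T))
    fc1.bias.copy_(torch.from_numpy(bias))

# Sanity check: the torch layer now computes x @ kernel + bias,
# which is exactly what the TF dense layer computes.
x = torch.randn(4, 51)
ref = x @ torch.from_numpy(kernel) + torch.from_numpy(bias)
assert torch.allclose(fc1(x), ref, atol=1e-5)
```

If the kernels were copied without the transpose, every layer's output would be wrong even when the layer ordering is correct.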
I have also set the weights based on the weight-bias dictionary. However, it is not returning the same outputs as the agent. How can I retrieve the neural network from the DQNTrainer agent?