How does RLlib calculate the number of parameters (weights and biases)?

Hi, :slight_smile:
I have a question: how does RLlib calculate the number of parameters?
I am using the PPO algorithm with input_dim = 16, hiddens = [256, 256], and a 4-dimensional action space.
I have attached a picture below, but the result looks strange.

[attached image: model summary]

Where does the number '8' come from?
If I use the compute_action function, I get a 4-dimensional action.
Could anyone explain this to me?
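As a sanity check on the counts, a fully connected layer with `n_in` inputs and `n_out` outputs has `n_in * n_out` weights plus `n_out` biases. A minimal sketch, assuming the network is 16 → 256 → 256 → 8 (the 8 is an assumption: for a continuous 4-dim action space, a diagonal Gaussian action distribution needs 2 * 4 outputs, one mean and one log-std per action dimension):

```python
def fc_params(n_in, n_out):
    # weights (n_in * n_out) plus one bias per output unit
    return n_in * n_out + n_out

# Assumed layer sizes: input 16, hiddens [256, 256],
# output 2 * 4 = 8 (mean + log std for a 4-dim continuous action).
layers = [(16, 256), (256, 256), (256, 8)]
total = sum(fc_params(n_in, n_out) for n_in, n_out in layers)
print(total)  # 4352 + 65792 + 2056 = 72200
```

If the printed total matches the summary in the screenshot, the extra outputs are just the distribution parameters, not extra actions.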

import torch
import torch.nn.functional as F

state = env.reset()
action = agent.compute_action(state)

# Manually forward the same state through the extracted policy weights.
x = torch.from_numpy(state).float()
out = F.relu(F.linear(x, torch.from_numpy(policy_wei[1][1]),
                      torch.from_numpy(policy_bias[1][1])))
out = F.relu(F.linear(out, torch.from_numpy(policy_wei[2][1]),
                      torch.from_numpy(policy_bias[2][1])))
out = torch.tanh(F.linear(out, torch.from_numpy(policy_wei[0][1]),
                          torch.from_numpy(policy_bias[0][1])))

policy out: tensor([ 0.3966,  0.6118,  0.5565,  0.0270, -0.9395,  0.8203, -0.0793, -0.3315])
state: [0.09984301 0.09992453 0.09992143 0.09991941 0.29046342 0.87227278
 0.98566566 0.10100834 0.21494237 0.         0.5        0.
 0.5        0.         1.         0.25      ] 
action: [-0.85631555  1.          0.8432337  -0.7363308 ]

Why do I get different results from compute_action and the policy network for the same state?
How does RLlib's compute_action work?
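One likely explanation (an assumption, not verified against the RLlib source here): the 8 policy outputs are interpreted as the mean and log-std of a diagonal Gaussian action distribution, and compute_action samples from that distribution during exploration, so it will not match the raw network output. A minimal sketch of that interpretation:

```python
import torch

# The 8-dim policy output from the post, split into an assumed
# (mean, log_std) pair for a diagonal Gaussian over 4 action dims.
policy_out = torch.tensor([0.3966, 0.6118, 0.5565, 0.0270,
                           -0.9395, 0.8203, -0.0793, -0.3315])
mean, log_std = torch.chunk(policy_out, 2)

# compute_action (with exploration on) would sample, roughly like:
action = torch.normal(mean, log_std.exp())
print(mean.numel(), action.numel())  # both 4-dimensional
```

Under this assumption, passing explore=False to compute_action should return the deterministic mean instead of a sample.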

Have a look here for an overview: RLlib Models, Preprocessors, and Action Distributions β€” Ray v2.0.0.dev0