Fcnet hidden parameter

Hi, I have a question. Recently, I have been training models with DDPG and TD3. What effect does the fcnet (fully connected network) parameter have?
I understand that these algorithms consist of an actor and a critic network. Can someone explain this to me? Thanks.

(figure 1)

Yeah, DDPG, TD3, as well as SAC are Q-network-based actor-critic algos, so they contain both a Q-network (predicting action values) and a policy network (outputting actions to take). Both networks are updated simultaneously by two different optimizers, and their losses are somewhat dependent on each other.
Bottom line: Yes, you will have to specify both networks’ (actor and critic) architectures and learning rates (lr) in the Trainer’s config.
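For example, something along these lines (a minimal sketch; the key names match the DDPG/TD3 defaults of older RLlib releases, e.g. ray 1.x, and the env name is just a placeholder, so check your installed version’s defaults):

```python
# Minimal sketch of where the actor/critic settings live in the Trainer config.
config = {
    "env": "Pendulum-v1",          # placeholder environment
    "framework": "torch",

    # Policy (actor) network: hidden layer sizes and its own learning rate.
    "actor_hiddens": [400, 300],
    "actor_lr": 1e-3,

    # Q (critic) network: hidden layer sizes and its own learning rate.
    "critic_hiddens": [400, 300],
    "critic_lr": 1e-3,
}

# With the (older) Trainer API, roughly:
# from ray.rllib.agents.ddpg import TD3Trainer
# trainer = TD3Trainer(config=config)
# trainer.train()
```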

Thank you, Sven!
Yeah, I know that TD3 is an actor-critic algorithm.
What I wanted to ask is why there are three networks, e.g. actor, critic, and fcnet hiddens.
What are the roles of each, especially the role of fcnet hiddens?
Is it an observation network?

Or is it actor/critic hidden → fcnet hidden, like in figure 2?


(figure 2)

Ah, yeah, I understand your question now.
Figure 1 looks like it describes the situation best.
Yes, fcnet_hiddens would be the “Observation Network” in your figure. However, in your case (you probably specify a “custom_model” to build your Conv1D/Embedding network), “fcnet_hiddens” will be ignored (it’s only used for default(!) models).
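As a rough sketch of that distinction (hypothetical names only: “my_obs_net” / MyObsNet stand in for whatever Conv1D/Embedding model class you register yourself):

```python
from ray.rllib.models import ModelCatalog

# MyObsNet = your own TorchModelV2 subclass containing the observation net.
ModelCatalog.register_custom_model("my_obs_net", MyObsNet)

config = {
    "model": {
        "custom_model": "my_obs_net",   # -> this is what actually gets built ...
        "fcnet_hiddens": [256, 256],    # ... and this is ignored (default models only)
    },
}
```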

Here is the thing, though. In your setup, how do you train the Observation Network? With which optimizer and which loss? If both optimizers (actor and critic) alternatingly update the Observation Network, are you not worried about instabilities?

Thanks to you, my curiosity is satisfied. Thank you very much, Sven!

Until now, I have been configuring it according to the manual below:
https://docs.ray.io/en/latest/rllib-algorithms.html#deep-deterministic-policy-gradients-ddpg-td3

And I have applied the default Adam optimizer and the default loss function to each network (actor and critic). Additionally, I’ve used the PyTorch framework.

Is there a problem?

So your Observation Network exists twice? Once in the Q-net and once in the policy network?
If yes, then it should be fine.
If no (and it’s shared), you may have to be careful when updating the Observation Network’s weights by both critic- and actor-optimizers alternatingly.
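To make the shared-vs-separate distinction concrete, here is a small, hypothetical PyTorch sketch (not RLlib internals) of the two setups:

```python
import torch
import torch.nn as nn

obs_dim, act_dim, hidden = 8, 2, 64

# Heads used in both setups.
actor_head = nn.Linear(hidden, act_dim)
critic_head = nn.Linear(hidden + act_dim, 1)

# Shared case: ONE observation encoder whose parameters appear in BOTH
# optimizers, so actor and critic losses alternately pull on the same weights.
shared_encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
actor_opt = torch.optim.Adam(
    list(shared_encoder.parameters()) + list(actor_head.parameters()), lr=1e-3)
critic_opt = torch.optim.Adam(
    list(shared_encoder.parameters()) + list(critic_head.parameters()), lr=1e-3)

# Separate case: each network carries its OWN copy of the encoder, so each
# optimizer only ever updates its own copy.
actor_encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
critic_encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
actor_opt = torch.optim.Adam(
    list(actor_encoder.parameters()) + list(actor_head.parameters()), lr=1e-3)
critic_opt = torch.optim.Adam(
    list(critic_encoder.parameters()) + list(critic_head.parameters()), lr=1e-3)
```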

I’m really grateful for your help, and thank you for your concern.
Fortunately, I’m using them separately: one in the Q-network and one in the policy network.

Have a nice day!