Fcnet hidden parameter

Hi, I have a question. Recently, I have been training models with DDPG and TD3. What effect does the fcnet (fully connected network) parameter have?
I understand that these algorithms consist of an actor and a critic network. Can someone explain this to me? Thanks.

(figure 1)

Yeah, DDPG, TD3, as well as SAC are Q-network-based actor-critic algos, so they contain both a Q-network (predicting action values) and a policy network (outputting actions to take). Both networks are updated simultaneously by two different optimizers, and their losses are somewhat dependent on each other.
Bottom line: Yes, you will have to specify both networks’ (actor and critic) architectures and learning rates (lr) in the Trainer’s config.
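For example, something along these lines (a minimal sketch; the key names match the DDPG/TD3 defaults of older RLlib releases, e.g. ray 1.x, and the env name is just a placeholder, so check your installed version’s defaults):

```python
# Minimal sketch of where the actor/critic settings live in the Trainer config.
config = {
    "env": "Pendulum-v1",          # placeholder environment
    "framework": "torch",

    # Policy (actor) network: hidden layer sizes and its own learning rate.
    "actor_hiddens": [400, 300],
    "actor_lr": 1e-3,

    # Q (critic) network: hidden layer sizes and its own learning rate.
    "critic_hiddens": [400, 300],
    "critic_lr": 1e-3,
}

# With the (older) Trainer API, roughly:
# from ray.rllib.agents.ddpg import TD3Trainer
# trainer = TD3Trainer(config=config)
# trainer.train()
```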

Thank you, Sven!
Yeah, I know that TD3 is an actor-critic algorithm.
What I wanted to ask is why there are three networks, e.g. actor, critic, and fcnet hiddens.
What are the roles of each, especially the role of fcnet hiddens?
Is it an observation network?

Or is it actor/critic hidden → fcnet hidden, like in figure 2?


(figure 2)

Ah, yeah, I understand your question now.
Figure 1 looks like it describes the situation best.
Yes, fcnet_hiddens would be the “Observation Network” in your figure. However, in your case (you probably specify a “custom_model” to build your Conv1D/Embedding network), “fcnet_hiddens” will be ignored (it’s only used for default(!) models).
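As a rough sketch of that distinction (hypothetical names only: “my_obs_net” / MyObsNet stand in for whatever Conv1D/Embedding model class you register yourself):

```python
from ray.rllib.models import ModelCatalog

# MyObsNet = your own TorchModelV2 subclass containing the observation net.
ModelCatalog.register_custom_model("my_obs_net", MyObsNet)

config = {
    "model": {
        "custom_model": "my_obs_net",   # -> this is what actually gets built ...
        "fcnet_hiddens": [256, 256],    # ... and this is ignored (default models only)
    },
}
```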

Here is the thing, though. In your setup, how do you train the Observation Network? With which optimizer and which loss? If both optimizers (actor and critic) alternatingly update the Observation Network, are you not worried about instabilities?

Thanks to you, my curiosity is satisfied. Thank you very much, Sven!

Until now, I have been configuring it according to the manual below:
https://docs.ray.io/en/latest/rllib-algorithms.html#deep-deterministic-policy-gradients-ddpg-td3

And I have applied the default Adam optimizer and the default loss function to each network (actor and critic). Additionally, I’ve used the PyTorch framework.

Is there a problem?

So your Observation Network exists twice? Once in the Q-net and once in the policy network?
If yes, then it should be fine.
If no (and it’s shared), you may have to be careful when updating the Observation Network’s weights by both critic- and actor-optimizers alternatingly.
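To make the shared-vs-separate distinction concrete, here is a small, hypothetical PyTorch sketch (not RLlib internals) of the two setups:

```python
import torch
import torch.nn as nn

obs_dim, act_dim, hidden = 8, 2, 64

# Heads used in both setups.
actor_head = nn.Linear(hidden, act_dim)
critic_head = nn.Linear(hidden + act_dim, 1)

# Shared case: ONE observation encoder whose parameters appear in BOTH
# optimizers, so actor and critic losses alternately pull on the same weights.
shared_encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
actor_opt = torch.optim.Adam(
    list(shared_encoder.parameters()) + list(actor_head.parameters()), lr=1e-3)
critic_opt = torch.optim.Adam(
    list(shared_encoder.parameters()) + list(critic_head.parameters()), lr=1e-3)

# Separate case: each network carries its OWN copy of the encoder, so each
# optimizer only ever updates its own copy.
actor_encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
critic_encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
actor_opt = torch.optim.Adam(
    list(actor_encoder.parameters()) + list(actor_head.parameters()), lr=1e-3)
critic_opt = torch.optim.Adam(
    list(critic_encoder.parameters()) + list(critic_head.parameters()), lr=1e-3)
```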

I’m really grateful for your help, and thank you for your concern.
Fortunately, I’m using them separately: one in the Q-network and one in the policy network.

Have a nice day!