Hey folks,
Am I right in assuming that if I want to implement a custom actor and critic network for the PPO algorithm (like Actor-Critic Type #2 in the figure), I just have to implement the actor network in the forward() method and the critic network in the value_function() method of my custom model?
Also, could this architecture cause problems in a MARL scenario?
Thanks for any kind of help!

My current sketch (with simple MLPs standing in for the actual actor and critic networks):
import numpy as np
import ray
from ray.rllib.agents import ppo
from ray.rllib.models import ModelCatalog
from ray.rllib.models.torch.torch_model_v2 import TorchModelV2
from torch import nn


class CustomActorCritic(TorchModelV2, nn.Module):
    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        TorchModelV2.__init__(self, obs_space, action_space, num_outputs,
                              model_config, name)
        nn.Module.__init__(self)
        obs_dim = int(np.prod(obs_space.shape))
        # Fully separate actor and critic (Type #2); simple 64-unit MLPs
        # stand in for the placeholder networks x / y.
        self.policy_network = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, num_outputs))
        self.value_network = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, 1))
        self._last_obs = None

    def forward(self, input_dict, state, seq_lens):
        # Cache the flattened obs so value_function() can reuse it.
        self._last_obs = input_dict["obs_flat"]
        return self.policy_network(self._last_obs), state

    def value_function(self):
        # State-value estimate (not a Q-value), shape [batch].
        return self.value_network(self._last_obs).squeeze(-1)
ModelCatalog.register_custom_model("my_torch_model", CustomActorCritic)

ray.init()
trainer = ppo.PPOTrainer(env="CartPole-v0", config={
    "framework": "torch",
    "model": {
        "custom_model": "my_torch_model",
    },
})
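
If that assumption holds, I'd expect a plain training loop to work as a smoke test (a minimal sketch; the iteration count is just for illustration):

# Run a few training iterations and watch the mean episode reward.
for i in range(3):
    result = trainer.train()
    print(i, result["episode_reward_mean"])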