Custom Critic (Value_function) in PPO

Hey folks,

Am I right in assuming that if I want to implement a custom actor and critic network for the PPO algorithm (like Actor-Critic Type #2 in the figure), I just have to implement the actor network in the forward() method and the critic network in the value_function() method of my custom model?

Also, could this architecture cause problems in a MARL scenario?

Thanks for any kind of help :blush:

import ray
import torch.nn as nn
from ray.rllib.agents import ppo
from ray.rllib.models import ModelCatalog
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2

class CustomActorCritic(TorchModelV2, nn.Module):
    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        TorchModelV2.__init__(self, obs_space, action_space, num_outputs,
                              model_config, name)
        nn.Module.__init__(self)
        self.policy_network = x  # actor: obs -> action logits (num_outputs)
        self.value_network = y   # critic: obs -> one value per obs

    def forward(self, input_dict, state, seq_lens):
        # Cache the obs batch so value_function() can reuse it.
        self._last_obs = input_dict["obs"]
        return self.policy_network(self._last_obs), state

    def value_function(self):
        # Value estimates for the last forward() batch, flattened to shape [BATCH].
        return self.value_network(self._last_obs).reshape(-1)

ModelCatalog.register_custom_model("my_torch_model", CustomActorCritic)

ray.init()
trainer = ppo.PPOTrainer(env="CartPole-v0", config={
    "framework": "torch",
    "model": {
        "custom_model": "my_torch_model",
    },
})

Thanks @CodingBurmer! We appreciate your intention to contribute.
CC @sven1977
If you don’t get a response for a while, please tag me and I’ll try to help.


Hey @CodingBurmer , that all looks correct. In the MARL case, you would have to decide whether you want a so-called “centralized critic”, which takes in the observations of all (or at least some) agents instead of just the agent’s own. You can check this example script, where we override the postprocess_trajectory function to manipulate the train_batch and add the other agents’ observations (a rough sketch of the idea follows the link below).

ray/rllib/examples/centralized_critic.py
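
For reference, here is a rough, simplified sketch of that postprocessing idea, based on the older Policy API used in this thread. The batch key "opponent_obs" and the policy.compute_central_vf() helper are assumptions for illustration; see the linked example script for the actual implementation, which also handles bootstrapping the last value properly.

from ray.rllib.evaluation.postprocessing import compute_advantages

OPPONENT_OBS = "opponent_obs"

def centralized_critic_postprocessing(policy, sample_batch,
                                      other_agent_batches=None, episode=None):
    if other_agent_batches:
        # Take the other agent's batch; values are (policy, batch) tuples.
        _, opponent_batch = list(other_agent_batches.values())[0]
        # Attach the opponent's observations to this agent's batch ...
        sample_batch[OPPONENT_OBS] = opponent_batch["obs"]
        # ... and let the centralized critic produce the value predictions.
        sample_batch["vf_preds"] = policy.compute_central_vf(
            sample_batch["obs"], sample_batch[OPPONENT_OBS])
    # Recompute advantages from the centralized value estimates
    # (last_r=0.0 assumes the sampled episode is complete).
    return compute_advantages(
        sample_batch, 0.0, policy.config["gamma"], policy.config["lambda"],
        use_gae=policy.config["use_gae"])

The sketch only covers the trajectory postprocessing; the linked script additionally defines the central value network and the loss that trains it.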


Hey @sven1977 , thanks for your answer :slight_smile: I already have a working PPO with a “centralized critic”, but thanks for the suggestion. Right now I’m trying to add a bidirectional RNN communication layer to the PPO model.
