Custom Critic (Value_function) in PPO

Hey folks,

Am I right in assuming that if I want to implement a custom actor and critic network for the PPO algorithm (like Actor-Critic Type #2 in the figure), I just have to implement the actor network in the forward() method and the critic network in the value_function() method of my custom model?

Also, could this architecture cause problems in a MARL scenario?

Thanks for any kind of help :blush:

import ray
import torch.nn as nn
from ray.rllib.agents import ppo
from ray.rllib.models import ModelCatalog
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2

class CustomActorCritic(TorchModelV2, nn.Module):
    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        TorchModelV2.__init__(self, obs_space, action_space, num_outputs,
                              model_config, name)
        nn.Module.__init__(self)
        self.policy_network = x  # actor: obs -> action logits (num_outputs)
        self.value_network = y   # critic: obs -> one value per obs

    def forward(self, input_dict, state, seq_lens):
        # Cache the obs batch so value_function() can reuse it.
        self._last_obs = input_dict["obs"]
        return self.policy_network(self._last_obs), state

    def value_function(self):
        # Value estimates for the last forward() batch, flattened to shape [BATCH].
        return self.value_network(self._last_obs).reshape(-1)

ModelCatalog.register_custom_model("my_torch_model", CustomActorCritic)

ray.init()
trainer = ppo.PPOTrainer(env="CartPole-v0", config={
    "framework": "torch",
    "model": {
        "custom_model": "my_torch_model",
    },
})

Thanks @CodingBurmer! We appreciate your intention to contribute.
CC @sven1977
If you don’t get a response for a while, please tag me and I’ll try to help.


Hey @CodingBurmer , that all looks correct. In the MARL case, you would have to decide whether you want a so-called “centralized critic”, which takes in the observations of all (or at least some) agents instead of just the agent’s own. You can check this example script, where we override the postprocess_trajectory function to manipulate the train_batch and add the other agents’ observations (a rough sketch of the idea follows the link below).

ray/rllib/examples/centralized_critic.py
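
For reference, here is a rough, simplified sketch of that postprocessing idea, based on the older Policy API used in this thread. The batch key "opponent_obs" and the policy.compute_central_vf() helper are assumptions for illustration; see the linked example script for the actual implementation, which also handles bootstrapping the last value properly.

from ray.rllib.evaluation.postprocessing import compute_advantages

OPPONENT_OBS = "opponent_obs"

def centralized_critic_postprocessing(policy, sample_batch,
                                      other_agent_batches=None, episode=None):
    if other_agent_batches:
        # Take the other agent's batch; values are (policy, batch) tuples.
        _, opponent_batch = list(other_agent_batches.values())[0]
        # Attach the opponent's observations to this agent's batch ...
        sample_batch[OPPONENT_OBS] = opponent_batch["obs"]
        # ... and let the centralized critic produce the value predictions.
        sample_batch["vf_preds"] = policy.compute_central_vf(
            sample_batch["obs"], sample_batch[OPPONENT_OBS])
    # Recompute advantages from the centralized value estimates
    # (last_r=0.0 assumes the sampled episode is complete).
    return compute_advantages(
        sample_batch, 0.0, policy.config["gamma"], policy.config["lambda"],
        use_gae=policy.config["use_gae"])

The sketch only covers the trajectory postprocessing; the linked script additionally defines the central value network and the loss that trains it.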


Hey @sven1977 , thanks for your answer :slight_smile: I already have a working PPO with a “centralized critic”, but thanks for the suggestion. Right now I’m trying to add a bidirectional RNN communication layer to the PPO model.
