Can't understand training config

I am trying to use Ray RLlib to experiment with a two-agent system; I defined an environment inheriting from MultiAgentEnv following this example, since my agents have different observation and action spaces. The two agents receive the same reward to prompt them to cooperate (together they generate locomotion in a bipedal model). I read all the examples but I can't figure out how to configure the training. I want to train each agent (identified in the env by a unique name) with its own PPO policy, each defined by an MLP whose input and output layer sizes are inferred from the environment (if that's not possible I could define them explicitly); the hidden layer size will be the same for both. I can't work out all the steps needed to define the two policies, assign them to the agents, and then start the training; the documentation and examples are a bit confusing and not clear to me.
Thanks in advance!

Hi @Donte,

I think you can try following these steps:

  1. Define the Environment: Ensure your MultiAgentEnv returns observations and rewards for each agent separately (keyed by agent id), and give both agents the same shared reward so that they learn to cooperate.
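
  For example, a minimal sketch (the agent names, space sizes, and dummy dynamics are placeholders for your bipedal model; it uses the Gym-style reset/step API that the rest of this answer assumes):

    import numpy as np
    from gym.spaces import Box
    from ray.rllib.env.multi_agent_env import MultiAgentEnv

    class YourMultiAgentEnv(MultiAgentEnv):
        """Two cooperating agents with different spaces and a shared reward."""

        def __init__(self, env_config=None):
            super().__init__()
            # Per-agent spaces; the sizes here are placeholders.
            self.obs_space_1 = Box(-np.inf, np.inf, shape=(12,), dtype=np.float32)
            self.act_space_1 = Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
            self.obs_space_2 = Box(-np.inf, np.inf, shape=(8,), dtype=np.float32)
            self.act_space_2 = Box(-1.0, 1.0, shape=(2,), dtype=np.float32)
            self.t = 0

        def reset(self):
            self.t = 0
            # One observation per agent, keyed by agent id.
            return {"agent_1": self.obs_space_1.sample(),
                    "agent_2": self.obs_space_2.sample()}

        def step(self, action_dict):
            self.t += 1
            # Dummy transition; replace with your simulation step.
            obs = {"agent_1": self.obs_space_1.sample(),
                   "agent_2": self.obs_space_2.sample()}
            # Both agents get the SAME scalar reward to push them to cooperate.
            shared_reward = 0.0  # replace with your locomotion reward
            rewards = {"agent_1": shared_reward, "agent_2": shared_reward}
            dones = {"__all__": self.t >= 200}
            return obs, rewards, dones, {}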

  2. Configure Policies: In the RLlib configuration, specify one policy per agent. Use the "policies" dictionary under the "multiagent" key to define each agent's policy with its corresponding observation and action spaces. Example:

    config["multiagent"] = {
        "policies": {
            "agent_1": (PPO, obs_space_1, act_space_1, {"hidden_layer_size": 64}),
            "agent_2": (PPO, obs_space_2, act_space_2, {"hidden_layer_size": 64}),
        },
        "policy_mapping_fn": lambda agent_id: agent_id
    }
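
  With the default fully connected model, the input and output layer sizes are inferred automatically from each policy's observation and action spaces, so the hidden layers (fcnet_hiddens) are the only sizes you need to set.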
    
  3. Training: Use tune.run() to start training, passing the trainer class and your configuration (including the environment). Example:

    from ray import tune
    from ray.rllib.agents import ppo

    # obs_space_1, act_space_1, etc. are the per-agent spaces from your env.
    tune.run(
        ppo.PPOTrainer,
        stop={"training_iteration": 100},  # or any other stopping criterion
        config={
            "env": YourMultiAgentEnv,  # your MultiAgentEnv subclass
            "multiagent": {
                "policies": {
                    "agent_1": (None, obs_space_1, act_space_1,
                                {"model": {"fcnet_hiddens": [64, 64]}}),
                    "agent_2": (None, obs_space_2, act_space_2,
                                {"model": {"fcnet_hiddens": [64, 64]}}),
                },
                "policy_mapping_fn": lambda agent_id, *args, **kwargs: agent_id,
            },
            # Other PPO configuration options go here.
        },
    )
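
During training, Tune reports the overall episode reward; the per-policy averages should show up under policy_reward_mean in the training results, which is useful for checking that both agents are learning.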

This setup gives each agent its own policy, and both policies are trained concurrently within a single PPO trainer. Adjust the configuration as needed for your environment and training requirements.

Thanks

This looks like an outdated answer: the rllib library doesn't have an agents module anymore, and I tried algorithms.ppo but it doesn't have a PPOTrainer class either.
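
From the current docs it looks like the equivalent setup is now built with PPOConfig; something like this, though I haven't verified it yet (I'm assuming the agent ids in the env match the policy ids):

    from ray import tune
    from ray.rllib.algorithms.ppo import PPOConfig
    from ray.rllib.policy.policy import PolicySpec

    # obs_space_1, act_space_1, etc. are the per-agent spaces from the env.
    config = (
        PPOConfig()
        .environment(YourMultiAgentEnv)
        .multi_agent(
            policies={
                # No policy class given => PPO's default policy is used.
                "agent_1": PolicySpec(observation_space=obs_space_1,
                                      action_space=act_space_1),
                "agent_2": PolicySpec(observation_space=obs_space_2,
                                      action_space=act_space_2),
            },
            # Agent ids match policy ids, so the mapping is the identity.
            policy_mapping_fn=lambda agent_id, *args, **kwargs: agent_id,
        )
        .training(model={"fcnet_hiddens": [64, 64]})
    )

    tune.Tuner("PPO", param_space=config.to_dict()).fit()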