Can't understand training config

I am trying to use Ray RLlib to experiment with a two-agent system; I defined an environment inheriting from MultiAgentEnv following this example, since my agents have different observation and action spaces. The two agents receive the same reward to prompt them to cooperate (together they generate locomotion in a bipedal model). I read all the examples but I can't figure out how to configure the training. I want to train each agent (identified in the env by a unique name) with its own PPO policy, each defined by an MLP whose input and output layer sizes are inferred from the environment (if that's not possible I could define them explicitly); the hidden layer size will be the same for both. I can't work out all the steps needed to define the two policies, assign them to the agents, and then start the training; the documentation and examples are a bit confusing and not clear to me.
Thanks in advance!

Hi @Donte,

I think you can try following these steps:

  1. Define the Environment: Ensure your MultiAgentEnv returns observations and rewards for each agent separately (keyed by agent id), and give both agents the same shared reward so that they learn to cooperate.
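
  For example, a minimal sketch (the agent names, space sizes, and dummy dynamics are placeholders for your bipedal model; it uses the Gym-style reset/step API that the rest of this answer assumes):

    import numpy as np
    from gym.spaces import Box
    from ray.rllib.env.multi_agent_env import MultiAgentEnv

    class YourMultiAgentEnv(MultiAgentEnv):
        """Two cooperating agents with different spaces and a shared reward."""

        def __init__(self, env_config=None):
            super().__init__()
            # Per-agent spaces; the sizes here are placeholders.
            self.obs_space_1 = Box(-np.inf, np.inf, shape=(12,), dtype=np.float32)
            self.act_space_1 = Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
            self.obs_space_2 = Box(-np.inf, np.inf, shape=(8,), dtype=np.float32)
            self.act_space_2 = Box(-1.0, 1.0, shape=(2,), dtype=np.float32)
            self.t = 0

        def reset(self):
            self.t = 0
            # One observation per agent, keyed by agent id.
            return {"agent_1": self.obs_space_1.sample(),
                    "agent_2": self.obs_space_2.sample()}

        def step(self, action_dict):
            self.t += 1
            # Dummy transition; replace with your simulation step.
            obs = {"agent_1": self.obs_space_1.sample(),
                   "agent_2": self.obs_space_2.sample()}
            # Both agents get the SAME scalar reward to push them to cooperate.
            shared_reward = 0.0  # replace with your locomotion reward
            rewards = {"agent_1": shared_reward, "agent_2": shared_reward}
            dones = {"__all__": self.t >= 200}
            return obs, rewards, dones, {}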

  2. Configure Policies: In the RLlib configuration, specify one policy per agent. Use the "policies" dictionary under the "multiagent" key to define each agent's policy with its corresponding observation and action spaces. Example:

    config["multiagent"] = {
        "policies": {
            "agent_1": (PPO, obs_space_1, act_space_1, {"hidden_layer_size": 64}),
            "agent_2": (PPO, obs_space_2, act_space_2, {"hidden_layer_size": 64}),
        },
        "policy_mapping_fn": lambda agent_id: agent_id
    }
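
  With the default fully connected model, the input and output layer sizes are inferred automatically from each policy's observation and action spaces, so the hidden layers (fcnet_hiddens) are the only sizes you need to set.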
    
  3. Training: Use tune.run() to start training, passing the trainer class and your configuration (including the environment). Example:

    from ray import tune
    from ray.rllib.agents import ppo

    # obs_space_1, act_space_1, etc. are the per-agent spaces from your env.
    tune.run(
        ppo.PPOTrainer,
        stop={"training_iteration": 100},  # or any other stopping criterion
        config={
            "env": YourMultiAgentEnv,  # your MultiAgentEnv subclass
            "multiagent": {
                "policies": {
                    "agent_1": (None, obs_space_1, act_space_1,
                                {"model": {"fcnet_hiddens": [64, 64]}}),
                    "agent_2": (None, obs_space_2, act_space_2,
                                {"model": {"fcnet_hiddens": [64, 64]}}),
                },
                "policy_mapping_fn": lambda agent_id, *args, **kwargs: agent_id,
            },
            # Other PPO configuration options go here.
        },
    )
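
During training, Tune reports the overall episode reward; the per-policy averages should show up under policy_reward_mean in the training results, which is useful for checking that both agents are learning.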

This setup gives each agent its own policy, and both policies are trained concurrently within a single PPO trainer. Adjust the configuration as needed for your environment and training requirements.

Thanks

This looks like an outdated answer: the rllib library doesn't have an agents module anymore, and I tried algorithms.ppo but it doesn't have a PPOTrainer class either.
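
From the current docs it looks like the equivalent setup is now built with PPOConfig; something like this, though I haven't verified it yet (I'm assuming the agent ids in the env match the policy ids):

    from ray import tune
    from ray.rllib.algorithms.ppo import PPOConfig
    from ray.rllib.policy.policy import PolicySpec

    # obs_space_1, act_space_1, etc. are the per-agent spaces from the env.
    config = (
        PPOConfig()
        .environment(YourMultiAgentEnv)
        .multi_agent(
            policies={
                # No policy class given => PPO's default policy is used.
                "agent_1": PolicySpec(observation_space=obs_space_1,
                                      action_space=act_space_1),
                "agent_2": PolicySpec(observation_space=obs_space_2,
                                      action_space=act_space_2),
            },
            # Agent ids match policy ids, so the mapping is the identity.
            policy_mapping_fn=lambda agent_id, *args, **kwargs: agent_id,
        )
        .training(model={"fcnet_hiddens": [64, 64]})
    )

    tune.Tuner("PPO", param_space=config.to_dict()).fit()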