Multi-Agent Policy Switching

Brendan_A · November 30, 2023, 7:33pm

How severely does this issue affect your experience of using Ray?

None: Just asking a question out of curiosity

I’m curious… I’m currently diagramming/brainstorming how I would set up a multi-agent training process in which agents are directed to switch policies mid-episode. I’m wondering how feasible this is and how people would recommend going about it. Here are some more details:

This will be “hierarchical” in nature: a “Boss” has 4 different objects to direct, and each object can be assigned to perform 1 of 3 different tasks at any given time.
I imagine there would be a unique policy for each of the 3 tasks. Object 1 might start out on “Task X” and then be directed to switch to “Task Y” mid-episode, which means I’d need to switch that object’s training from Policy X to Policy Y.
The objects don’t all have to be working on the same task at the same time, but they can be.
I’m not sure whether it’s relevant, but I’ve successfully used PPO to train each task individually. The challenge now is simultaneously training a leader that decides resource allocation and the individual policies that the objects act with.
I have considered a single policy with different reward functions depending on the assigned task, but that feels messy at best.
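
One way this setup is sometimes handled, sketched here under assumptions: as far as I understand, RLlib resolves the agent-to-policy mapping per agent ID rather than per timestep, so a common workaround is to give an object a new agent ID each time the Boss reassigns it and let the mapping function read the task out of that ID. The sketch below is plain Python with no Ray import. The names `TASKS`, `make_agent_id`, `AssignmentTracker`, and the `obj{i}__{task}__seg{n}` ID format are all hypothetical, and the flexible `*args, **kwargs` signature on `policy_mapping_fn` is there only because the exact arguments RLlib passes differ between versions.

```python
from typing import List, Optional, Tuple

# Hypothetical task names and IDs, chosen only for illustration.
TASKS = ("task_x", "task_y", "task_z")
BOSS_ID = "boss"


def make_agent_id(obj_index: int, task: str, segment: int) -> str:
    """Encode object, current task, and a reassignment counter in the ID."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task}")
    return f"obj{obj_index}__{task}__seg{segment}"


def policy_mapping_fn(agent_id, *args, **kwargs) -> str:
    """Map an agent ID to a policy ID: one Boss policy, one policy per task."""
    if agent_id == BOSS_ID:
        return "boss_policy"
    _, task, _ = agent_id.split("__")
    return f"{task}_policy"


class AssignmentTracker:
    """Tracks which task each object is on and which agent ID it uses."""

    def __init__(self, num_objects: int = 4) -> None:
        self._task: List[Optional[str]] = [None] * num_objects
        self._segment: List[int] = [0] * num_objects

    def current_id(self, obj_index: int) -> Optional[str]:
        task = self._task[obj_index]
        if task is None:
            return None
        return make_agent_id(obj_index, task, self._segment[obj_index])

    def assign(self, obj_index: int, task: str) -> Tuple[Optional[str], str]:
        """Apply a Boss decision and return (old_id, new_id).

        If the task is unchanged, both IDs are the same. Otherwise the
        environment would mark old_id as finished and start emitting
        observations and rewards under new_id.
        """
        if task not in TASKS:
            raise ValueError(f"unknown task: {task}")
        old_id = self.current_id(obj_index)
        if task == self._task[obj_index]:
            return old_id, old_id
        if old_id is not None:
            self._segment[obj_index] += 1
        self._task[obj_index] = task
        return old_id, self.current_id(obj_index)
```

With this scheme, Object 1 moving from Task X to Task Y shows up to the trainer as one agent ending and another starting, so the samples collected before the switch go to the Task X policy and those after it go to the Task Y policy. Whether ending the old ID mid-episode interacts well with value bootstrapping in PPO is something to check rather than assume.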

I have seen a few examples of setting up this sort of hierarchical learning, but I haven’t seen anyone implement policy switching mid-episode. Here’s a link to one such example: https://github.com/DeUmbraTX/practical_rllib_tutorial/blob/main/your_rllib_environment.py Note that in that example the high-level decision is made only once, at the start of the episode, rather than continuously throughout the episode.
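
For contrast with a once-per-episode decision, here is a minimal sketch of a schedule in which the Boss re-decides at a fixed interval. It relies on my understanding that in an RLlib multi-agent environment only the agents whose IDs appear in the returned observation dict are asked for an action on the next step. The function name `acting_agents`, the `boss_interval` parameter, and the `"boss"` ID are hypothetical, and a fixed interval is only one possible trigger (an event-based trigger would work the same way).

```python
from typing import List


def acting_agents(step: int, boss_interval: int, worker_ids: List[str]) -> List[str]:
    """Return the agent IDs that should receive an observation at this step.

    The workers act every step; the Boss is included only on steps that are
    a multiple of boss_interval, so it can reassign tasks mid-episode.
    """
    if boss_interval <= 0:
        raise ValueError("boss_interval must be positive")
    ids = list(worker_ids)
    if step % boss_interval == 0:
        ids.insert(0, "boss")
    return ids
```

An environment's `step` method could use a helper like this to decide which keys to put in its observation dict, applying any new Boss assignment before computing the workers' observations.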

Can anyone share their thoughts on this?

Thanks!
