Multi-agent policy with a trainable selector agent

Hello! First time posting here. I am loving the Ray ecosystem and am trying to figure out a fairly niche problem.

High level problem description:
I am training a multi-agent lunar lander system with 4 trainable agents: three sub-agents and one selector.
Agent 1: Stabilize
Agent 2: Move to center
Agent 3: Land
Agent 4: Select which agent (1-3) to query an action from.
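
To make the structure concrete, here is a rough sketch of the control flow in plain Python, with no RLlib calls. Everything in it is a hypothetical placeholder: the function names, the constant actions returned by the sub-agents, the thresholds, and the hand-written rule inside selector(), which is exactly the part I would like to be learned instead. It also assumes the usual Gymnasium LunarLander observation layout, with x at index 0 and the angle at index 4.

```python
from typing import Callable, Dict, List, Tuple

Observation = List[float]
Policy = Callable[[Observation], int]


# Hypothetical placeholders for Agents 1-3. In the real system each of these
# would be a trained policy; here they just return a fixed discrete action.
def stabilize(obs: Observation) -> int:
    return 0


def move_to_center(obs: Observation) -> int:
    return 1


def land(obs: Observation) -> int:
    return 2


SUB_AGENTS: Dict[int, Policy] = {1: stabilize, 2: move_to_center, 3: land}


def selector(obs: Observation) -> int:
    """Agent 4: return the id (1-3) of the sub-agent to query.

    This hand-written rule is only a stand-in for the policy I want to train.
    Assumed layout: obs[0] is the x position, obs[4] is the lander angle.
    The thresholds are arbitrary illustration values.
    """
    x, angle = obs[0], obs[4]
    if abs(angle) > 0.2:
        return 1  # tilted: stabilize first
    if abs(x) > 0.1:
        return 2  # off-center: move to center
    return 3      # upright and centered: land


def step_action(obs: Observation) -> Tuple[int, int]:
    """Ask the selector which sub-agent to use, then query that sub-agent."""
    chosen = selector(obs)
    return chosen, SUB_AGENTS[chosen](obs)


if __name__ == "__main__":
    # A tilted lander: the selector should pick the stabilize agent.
    print(step_action([0.0, 1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0]))
```

The key point is that the selector's output decides, per step, which sub-agent produces the environment action, so the choice depends on the observation rather than on a fixed agent id.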

I understand that the policy mapping function (policy_mapping_fn) exists, but I am unsure how to make that mapping trainable.

Any thoughts?