Hello! First time posting here. I am loving the Ray ecosystem, and am trying to figure out a kind of niche problem.
High-level problem description:
I am training a multi-agent lunar lander system with 4 trainable sub-agents.
Agent 1: Stabilize
Agent 2: Move to center
Agent 3: Land
Agent 4: Select which agent (1-3) to query an action from.
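To make the structure concrete, here is a minimal sketch of the control flow I have in mind. It is plain Python with no Ray calls. `make_stub_policy` and `hierarchical_step` are made-up names for illustration, the random stubs only stand in for the trained policies, and I am assuming the discrete LunarLander setup (8-dimensional observation, 4 actions).

```python
import random
from typing import Callable, List, Sequence, Tuple

Observation = Sequence[float]
Policy = Callable[[Observation], int]

NUM_ACTIONS = 4  # assumption: discrete LunarLander (noop, left, main, right)


def make_stub_policy(seed: int, n_outputs: int) -> Policy:
    """Hypothetical placeholder for a trainable policy.

    Ignores the observation and returns a random integer in [0, n_outputs).
    """
    rng = random.Random(seed)

    def policy(obs: Observation) -> int:
        return rng.randrange(n_outputs)

    return policy


# Agents 1-3: each maps an observation to an environment action.
sub_policies: List[Policy] = [
    make_stub_policy(1, NUM_ACTIONS),  # Agent 1: stabilize
    make_stub_policy(2, NUM_ACTIONS),  # Agent 2: move to center
    make_stub_policy(3, NUM_ACTIONS),  # Agent 3: land
]

# Agent 4: maps an observation to an index into sub_policies.
selector: Policy = make_stub_policy(0, len(sub_policies))


def hierarchical_step(obs: Observation) -> Tuple[int, int]:
    """Ask Agent 4 which sub-agent to use, then query that sub-agent."""
    choice = selector(obs)
    action = sub_policies[choice](obs)
    return choice, action


if __name__ == "__main__":
    dummy_obs = [0.0] * 8  # assumption: 8-dimensional observation
    for _ in range(3):
        print(hierarchical_step(dummy_obs))
```

The point is that Agent 4's output is itself an action (an index from 0 to 2) chosen from the observation, so I would like it to be learned like any other policy instead of being a fixed rule.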
I understand that the policy mapping function (`policy_mapping_fn`) exists; however, I am unsure how to make the mapping itself trainable.
Any thoughts?