RLlib ConnectorV2 for multi-agent (league self-play)

1. Severity of the issue: (select one)
High: Completely blocks me.

2. Environment:

  • Ray version: 2.44.1
  • Python version: 3.11
  • OS: Ubuntu 22.04
  • Cloud/Infrastructure:
  • Other libs/tools (if relevant):

Hello.

I’m currently using RLlib for a multi-agent environment. In the current setup, we have a trainable agent (a neural network) competing against a programmed agent (similar to a behavior tree), so two RLModules are initialized. The project uses ConnectorV2 to process data to and from the environment via the module_to_env and env_to_module connector pipelines. We have many custom connectors that do pre/post-processing depending on whether the policy is the trainable one or the programmed agent. These connectors also change the observation_space and action_space through ConnectorV2’s recompute_output_observation_space() method, since, for example, the programmed agent doesn’t need the (originally dict) observation flattened. This is a little bit like this example in the available default connectors.
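To make the branching concrete, here is a minimal sketch of the kind of per-module preprocessing described above, with no RLlib imports. The module ids and the `TRAINABLE_MODULES` set are illustrative assumptions, not names from the actual project:

```python
import numpy as np

# Assumed set of module ids backed by a neural network (illustrative).
TRAINABLE_MODULES = {"main_policy"}

def preprocess_obs(module_id, obs):
    """Flatten a dict observation only for trainable (NN) modules."""
    if module_id in TRAINABLE_MODULES:
        # NN module: flatten the dict obs into a single vector,
        # with keys sorted for a deterministic layout.
        return np.concatenate([np.ravel(v) for _, v in sorted(obs.items())])
    # Programmed (behavior-tree-like) module: keep the dict as-is.
    return obs
```

In a real ConnectorV2 this branch would live inside `__call__`, with the matching space change done in `recompute_output_observation_space()`.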

I’m trying to extend this with league self-play, following the example from RLlib, and I am currently stuck on how to modify ConnectorV2. Basically, we want to have multiple programmed agents and multiple trainable agents, in a similar way to the given league example, utilizing the callbacks.

However, the pre/post-processing done with ConnectorV2 right now depends on which policy (RLModule) it is working on, not on which agent. When it was 1 vs 1 (trainable agent vs programmed agent), this was handled simply by setting agent_id == policy_id. But now, at any given time, we might run trainable vs trainable (self-play; both sides require flattening the space and, for example, NumpyToTensor) or trainable vs programmed. So I need access to the policy_mapping_fn inside ConnectorV2’s __call__(…), and maybe even inside recompute_output_observation_space(), to stay consistent with how we’ve used ConnectorV2 for policy-specific pre/post-processing.

I was wondering if we’ve simply misused ConnectorV2 and should refactor every module-specific pre/post-processing step out of the connector pipeline (a pretty tedious task), or whether there are ways to update the connectors, or to give the initialized connectors the required context (which policies are active at the moment, and mapped to which agent_ids) so they can adapt to the different league matchups.

Thank you for reading, if something was unclear I’ll try my best to clarify!

It seems likely that ConnectorV2 is designed to operate at the agent (not policy) level, and by default, does not have direct access to the policy_mapping_fn or policy_id within its __call__ or recompute_output_observation_space() methods. In multi-agent league setups, where agent_id and policy_id may not be 1:1, this makes policy-specific preprocessing in connectors challenging. The default RLlib pipeline uses the AgentToModuleMapping connector to map agent data to module (policy) data, but this mapping is not directly exposed to custom connectors in a way that would let you branch logic by policy_id inside ConnectorV2.

My understanding is that, as of Ray 2.44.1, there is no built-in mechanism to pass the policy_mapping_fn or resolved policy_id into ConnectorV2 methods. You may need to refactor policy-specific logic out of the connector pipeline, or, as a workaround, use the shared_data argument in ConnectorV2 to pass mapping information (e.g., agent_id → policy_id) from the environment or callback into the connector, so that your connector can branch on policy_id. This approach is not explicitly documented as best practice, but is suggested in AgentToModuleMapping and related discussions. There is no official example of policy-specific connector logic in the league self-play examples, which typically keep connectors agent-centric and handle policy logic elsewhere.
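A minimal sketch of that shared_data workaround, assuming (this is a pattern suggestion, not an official API) that a callback resolves the mapping once per episode and downstream connectors read it back. The function names are invented for illustration:

```python
def write_agent_to_module_mapping(shared_data, agent_ids, policy_mapping_fn):
    """Callback side: resolve agent_id -> module_id once and cache it
    in shared_data so every connector in the pipeline can see it."""
    shared_data["agent_to_module"] = {
        aid: policy_mapping_fn(aid) for aid in agent_ids
    }

def resolve_module_id(shared_data, agent_id):
    """Connector side (inside a custom ConnectorV2.__call__): look up
    the module for this agent instead of assuming agent_id == module_id.
    Falls back to the old 1:1 convention if no mapping was written."""
    return shared_data.get("agent_to_module", {}).get(agent_id, agent_id)
```

The connector can then branch its flattening/NumpyToTensor logic on the resolved module id rather than on the raw agent id.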

Would you like more detail on how to implement the shared_data workaround or on how to refactor your pipeline for league self-play?
