1. Severity of the issue: (select one)
High: Completely blocks me.
2. Environment:
- Ray version: 2.44.1
- Python version: 3.11
- OS: Ubuntu 22.04
- Cloud/Infrastructure:
- Other libs/tools (if relevant):
Hello.
I’m currently using RLlib for a multi-agent environment. In the current setup, we have a trainable agent (a neural network) competing against a programmed agent (similar to a behavior tree), so two RLModules are initialized. The project uses ConnectorV2 for processing data to and from the environment via the module_to_env and env_to_module connector pipelines. We have many custom connectors that do pre/post-processing depending on whether the policy is the trainable one or the programmed agent. These connectors also change the observation_space and action_space through ConnectorV2’s recompute_output_observation_space() method, since the programmed agent, for example, doesn’t need its (originally dict) observation flattened. This is a bit like this example among the available default connectors.
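To make the setup concrete, here is a minimal stand-alone sketch of the branching our connectors do. This is plain Python, not the actual ConnectorV2 base class; the module IDs and the flattening logic are assumptions for illustration only:

```python
import numpy as np

# Hypothetical module IDs for the two RLModules in our setup.
TRAINABLE_MODULE_ID = "trainable_policy"
PROGRAMMED_MODULE_ID = "programmed_policy"


def flatten_dict_obs(obs: dict) -> np.ndarray:
    """Flatten a dict observation into a single 1-D float32 array."""
    return np.concatenate(
        [np.asarray(v, dtype=np.float32).ravel() for _, v in sorted(obs.items())]
    )


class ModuleSpecificFlatten:
    """Stand-in for an env_to_module connector piece: only the trainable
    module's observations get flattened; the programmed agent keeps the dict."""

    def __call__(self, module_id: str, obs: dict):
        if module_id == TRAINABLE_MODULE_ID:
            return flatten_dict_obs(obs)
        return obs  # the programmed agent consumes the raw dict


connector = ModuleSpecificFlatten()
obs = {"pos": [1.0, 2.0], "health": 0.5}
print(connector(TRAINABLE_MODULE_ID, obs))   # flattened array
print(connector(PROGRAMMED_MODULE_ID, obs))  # unchanged dict
```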
I’m trying to extend this with league-based self-play, per the example from RLlib, and I’m currently stuck on how to modify ConnectorV2. Basically, we want to have multiple programmed agents and multiple trainable agents, in a similar way to the given league example, utilizing the callbacks.
However, the pre/post-processing done with ConnectorV2 right now depends on which policy (RLModule) it is working on, not on the agent. When it was 1 vs. 1 (trainable agent vs. programmed agent), this was simply handled by setting agent_id == policy_id. But now, at any given time, we might run trainable vs. trainable (self-play; both require flattening the space and, for example, NumpyToTensor) or trainable vs. programmed. So I need access to the policy_mapping_fn result inside ConnectorV2’s __call__(…), and maybe even in recompute_output_observation_space(), to stay consistent with how we’ve used ConnectorV2 for policy-specific pre/post-processing.
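One direction I’ve considered: resolve the agent-to-module mapping from the episode inside the connector, instead of assuming agent_id == module_id. I’m assuming here that the new API stack’s MultiAgentEpisode exposes something like module_for(agent_id) reflecting the mapping the policy_mapping_fn produced for that episode; the toy illustration below uses a stub episode so it stands alone:

```python
# Toy illustration: decide per *agent* (via its mapped module) whether to
# flatten, instead of keying on agent_id directly. EpisodeStub mimics an
# assumed MultiAgentEpisode.module_for() from RLlib's new API stack; the
# module IDs below are hypothetical.

NEEDS_FLATTEN = {"main_v0", "main_v1"}  # hypothetical trainable module IDs


class EpisodeStub:
    def __init__(self, agent_to_module: dict):
        # The agent->module mapping is fixed per episode once the
        # policy_mapping_fn has been applied for each agent.
        self._agent_to_module = agent_to_module

    def module_for(self, agent_id: str) -> str:
        return self._agent_to_module[agent_id]


def should_flatten(episode, agent_id: str) -> bool:
    """True if the module this agent maps to needs flattened observations."""
    return episode.module_for(agent_id) in NEEDS_FLATTEN


# Self-play episode: both agents map to trainable modules.
ep = EpisodeStub({"agent_0": "main_v0", "agent_1": "main_v1"})
print(should_flatten(ep, "agent_0"), should_flatten(ep, "agent_1"))

# Mixed episode: agent_1 is mapped to a programmed (heuristic) module.
ep2 = EpisodeStub({"agent_0": "main_v0", "agent_1": "heuristic"})
print(should_flatten(ep2, "agent_1"))
```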
I was wondering whether we’ve simply misused ConnectorV2 and should refactor every module-specific pre/post-processing step out of the connector pipeline (a pretty tedious task), or whether there are ways to update the connectors, or to give the initialized connectors the required context (which policies are active at the moment, and mapped to which agent_id), so they can adapt to different league matchups.
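The "give the connectors the required context" option could look roughly like the sketch below: a small mutable registry that a league callback (e.g., when matchups change) writes to, and that connector pieces read from. Every name here is hypothetical; this is just the shape of the idea, not an RLlib API:

```python
# Sketch: shared state between a league callback and connector pieces.
# The callback records which module each agent is currently mapped to;
# connectors consult the registry instead of hard-coding agent_id == module_id.
# All class and module names are made up for illustration.


class LeagueContext:
    """Current agent -> module mapping plus which modules are trainable."""

    def __init__(self):
        self.agent_to_module = {}
        self.trainable_modules = set()

    def is_trainable(self, agent_id: str) -> bool:
        return self.agent_to_module.get(agent_id) in self.trainable_modules


class ContextAwareConnector:
    """Connector piece that branches on the shared league context."""

    def __init__(self, context: LeagueContext):
        self.context = context

    def __call__(self, agent_id: str, obs):
        if self.context.is_trainable(agent_id):
            return {"flattened": True, "obs": obs}  # placeholder transform
        return obs  # pass-through for programmed agents


ctx = LeagueContext()
conn = ContextAwareConnector(ctx)

# A league callback would update the context when a new matchup is drawn:
ctx.trainable_modules = {"main", "league_exploiter_0"}
ctx.agent_to_module = {"agent_0": "main", "agent_1": "scripted_bot"}

print(conn("agent_0", [1, 2]))  # transformed for the trainable module
print(conn("agent_1", [1, 2]))  # passed through for the programmed agent
```

The appeal is that the connectors themselves stay stateless with respect to the league; only the shared context changes between matchups. The open question is whether the same trick can work for recompute_output_observation_space(), which runs at pipeline build time rather than per step.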
Thank you for reading; if anything was unclear, I’ll do my best to clarify!