Discrepancy in policy_mapping_fn Signature in AlgorithmConfig Documentation (New API Stack)

Hello Ray/RLlib community,

I’ve noticed a potential discrepancy in the RLlib documentation, specifically concerning the required signature for the policy mapping function (policy_mapping_fn) within AlgorithmConfig.multi_agent.

:round_pushpin: Documentation URL:

:pushpin: The Issue:

The current documentation states the required signature for policy_mapping_fn as:

policy_mapping_fn: (agent_id, episode, worker, **kwargs) -> PolicyID

However, when configuring multi-agent environments on the New API Stack (built around RLModule, Learner, and EnvRunner), declaring worker as a parameter of the function raises a TypeError (got an unexpected keyword argument 'worker').

:white_check_mark: Working Signature (New API Stack):

The function signature that successfully executes in the current New API Stack is:

def policy_map_fn(agent_id, episode, **kwargs):
    # Mapping logic
    return "my_policy_id"
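To make the working signature concrete, here is a minimal sketch with hypothetical agent IDs (agent_0, agent_1, ...) and hypothetical policy IDs (policy_even, policy_odd). It assumes, as described above, that the New API Stack calls the function with only (agent_id, episode), and it keeps **kwargs so any extra keyword arguments are absorbed rather than rejected.

```python
# Hypothetical mapping: agents named "agent_0", "agent_1", ... are split
# between two hypothetical policy IDs by the parity of their numeric suffix.
def policy_map_fn(agent_id, episode, **kwargs):
    # `episode` is unused here; `**kwargs` absorbs any extra keyword
    # arguments a caller might pass (for example `worker` on the old stack).
    index = int(str(agent_id).rsplit("_", 1)[-1])
    return "policy_even" if index % 2 == 0 else "policy_odd"


if __name__ == "__main__":
    # Assumed New API Stack call shape: fn(agent_id, episode).
    for agent in ["agent_0", "agent_1", "agent_2"]:
        print(agent, "->", policy_map_fn(agent, None))
```

In an RLlib config, a function like this would be passed via config.multi_agent(policies={"policy_even", "policy_odd"}, policy_mapping_fn=policy_map_fn).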

:cross_mark: Documented/Old API Signature:

The signature including worker seems to be required only for the Old API Stack or specific low-level use cases:

def policy_map_fn(agent_id, episode, worker, **kwargs):
    # Mapping logic
    return "my_policy_id"
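The difference between the two signatures can be reproduced without Ray. This sketch uses two hypothetical stand-in callers, which are not RLlib code: call_like_new_stack assumes the New API Stack invokes the function as fn(agent_id, episode), and call_like_old_stack assumes the old stack additionally passes worker=... as a keyword argument.

```python
def new_stack_fn(agent_id, episode, **kwargs):
    return "my_policy_id"


def old_stack_fn(agent_id, episode, worker, **kwargs):
    return "my_policy_id"


# Hypothetical stand-ins for how each stack is assumed to call the function.
def call_like_new_stack(fn, agent_id, episode=None):
    return fn(agent_id, episode)


def call_like_old_stack(fn, agent_id, episode=None, worker=None):
    return fn(agent_id, episode, worker=worker)


def try_call(caller, fn):
    # Return the mapped policy ID, or the TypeError text if the call fails.
    try:
        return caller(fn, "agent_0")
    except TypeError as err:
        return f"TypeError: {err}"


if __name__ == "__main__":
    for caller in (call_like_new_stack, call_like_old_stack):
        for fn in (new_stack_fn, old_stack_fn):
            print(caller.__name__, fn.__name__, "->", try_call(caller, fn))
```

Under these assumptions only one combination fails: the worker-requiring signature called without worker. The (agent_id, episode, **kwargs) signature works with both callers because **kwargs absorbs worker. The exact TypeError text depends on how the real caller invokes the function.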

:light_bulb: Request/Suggestion:

To prevent confusion and runtime errors for users adopting the New API Stack, I suggest updating the AlgorithmConfig.multi_agent documentation to clearly differentiate between the required signatures:

  1. Indicate the New API Stack signature (omitting worker) as the primary or default usage.

  2. Explicitly mention that the signature including worker is associated with the Old API Stack.

Thank you for your consideration in clarifying this detail for the community!

You are correct: on the New API Stack, policy_mapping_fn should use the signature (agent_id, episode, **kwargs) and should not declare a worker argument. Declaring worker as a required parameter leads to a TypeError, as you observed, because the new stack does not pass a worker when it calls the function. The old stack, which is built around RolloutWorker, does pass worker, which is why the older documented signature includes it; a function written as (agent_id, episode, **kwargs) still works there because **kwargs absorbs the extra argument. The documentation does not currently make this distinction clear, which can confuse users migrating to or starting with the New API Stack. This matches user reports, code examples in the Ray community, and comments in the RLlib source. See the related discussions in Ray GitHub issues and on Ray Discourse.
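To catch this before training starts, a hypothetical pre-flight check (not part of RLlib) can inspect the mapping function with Python's standard inspect module and flag a worker parameter that has no default. It assumes, as above, that the New API Stack calls the function as fn(agent_id, episode).

```python
import inspect


def requires_worker_arg(fn) -> bool:
    """Return True if `fn` declares a `worker` parameter that has no default.

    Such a function cannot be called as fn(agent_id, episode), the call
    shape assumed here for the New API Stack.
    """
    param = inspect.signature(fn).parameters.get("worker")
    if param is None:
        return False
    # A variadic parameter named `worker` (*worker / **worker) is never required.
    if param.kind in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD):
        return False
    return param.default is inspect.Parameter.empty


if __name__ == "__main__":
    def new_style(agent_id, episode, **kwargs):
        return "my_policy_id"

    def old_style(agent_id, episode, worker, **kwargs):
        return "my_policy_id"

    print(requires_worker_arg(new_style))  # False
    print(requires_worker_arg(old_style))  # True
```

Running a check like this on the mapping function before passing it to config.multi_agent(...) would surface the mismatch at configuration time instead of when the function is first called.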

Would you like more detail on how to structure your config for multi-agent environments in the New API Stack, or on how to avoid compatibility issues?

Sources:
