AssertionError: Discrete(33) | MASAC with continuous and discrete agents

1. Severity of the issue: (select one)
High: Completely blocks me.

2. Environment:

  • Ray version: 2.48.0
  • Python version:
  • OS:
  • Cloud/Infrastructure:
  • Other libs/tools (if relevant):

3. What happened vs. what you expected:

  • Expected:
  • Actual:

I am working on a problem with 4 agents and want to use MASAC for this purpose. Agent1 has a discrete action space; Agent2 to Agent4 have continuous action spaces. I am facing the “AssertionError: Discrete(33)” error.

I am wondering if I can have agents with separate discrete and continuous action spaces under MASAC? If not, what would be the ideal workaround?

As of recent updates, RLlib’s MASAC (multi-agent SAC) supports both discrete and continuous action spaces, including in mixed multi-agent environments. You can assign a different action space to each agent by specifying it in the policies dict of the multi-agent config, and RLlib will infer and handle the spaces accordingly. Example scripts such as multi_agent_pendulum_sac.py and different_spaces_for_agents.py demonstrate this setup, where agents can have different action and observation spaces, including a mix of discrete and continuous actions.
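
A minimal config sketch along those lines might look as follows. The environment name and all spaces here are hypothetical placeholders for your own setup (register your env first); the per-agent policies dict is what lets each agent carry its own action space:

```python
import gymnasium as gym
from ray.rllib.algorithms.sac import SACConfig
from ray.rllib.policy.policy import PolicySpec

# Hypothetical per-agent spaces: Agent1 is discrete, the rest continuous.
spaces = {
    "agent_1": (gym.spaces.Box(-1.0, 1.0, (8,)), gym.spaces.Discrete(33)),
    "agent_2": (gym.spaces.Box(-1.0, 1.0, (8,)), gym.spaces.Box(-1.0, 1.0, (2,))),
    "agent_3": (gym.spaces.Box(-1.0, 1.0, (8,)), gym.spaces.Box(-1.0, 1.0, (2,))),
    "agent_4": (gym.spaces.Box(-1.0, 1.0, (8,)), gym.spaces.Box(-1.0, 1.0, (2,))),
}

config = (
    SACConfig()
    .environment("my_multi_agent_env")  # placeholder: your registered env
    .multi_agent(
        # One policy per agent, so each policy gets its own action space.
        policies={
            aid: PolicySpec(observation_space=obs, action_space=act)
            for aid, (obs, act) in spaces.items()
        },
        policy_mapping_fn=lambda agent_id, *args, **kwargs: agent_id,
    )
)
```

This is a sketch of the configuration pattern, not a verified end-to-end script; whether MASAC trains cleanly with it still depends on your RLlib version supporting discrete SAC.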

If you encounter an AssertionError: Discrete(33), ensure your environment and policy mapping are correctly set up, and that each policy’s action space matches the agent’s. RLlib should handle this automatically if you provide the correct spaces per agent in the config. If issues persist, double-check that your Ray version is up-to-date and that you are using the new API stack, as older versions or misconfigurations may not support mixed action spaces in MASAC properly. Would you like a step-by-step example configuration?

Sources:

Hint: Mention @RunLLM in the post for followups.

@RunLLM I can’t find any example where some agents have discrete while others have continuous action space. Please share the relevant example if any.

You can find an official RLlib example of a multi-agent environment in which agents have different action spaces in different_spaces_for_agents.py. In this script, agent0 uses a discrete action space while agent1 uses a different space (which can be continuous), demonstrating how to configure and run such mixed-space setups.
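
If you would rather build your own environment than adapt that example, a minimal sketch of a mixed-space MultiAgentEnv could look like the following. The env logic is a do-nothing placeholder, and the attribute names (observation_spaces / action_spaces dicts, agents / possible_agents) assume a recent RLlib version; older releases expect the spaces to be passed via the policies config instead:

```python
import gymnasium as gym
from ray.rllib.env.multi_agent_env import MultiAgentEnv

class MixedSpacesEnv(MultiAgentEnv):
    """Toy sketch: one agent acts discretely, the other continuously."""

    def __init__(self, config=None):
        super().__init__()
        self.agents = self.possible_agents = ["agent_1", "agent_2"]
        # Per-agent spaces keyed by agent ID.
        self.observation_spaces = {
            aid: gym.spaces.Box(-1.0, 1.0, (4,)) for aid in self.agents
        }
        self.action_spaces = {
            "agent_1": gym.spaces.Discrete(33),          # discrete agent
            "agent_2": gym.spaces.Box(-1.0, 1.0, (2,)),  # continuous agent
        }

    def reset(self, *, seed=None, options=None):
        obs = {aid: self.observation_spaces[aid].sample() for aid in self.agents}
        return obs, {}

    def step(self, action_dict):
        # Placeholder dynamics: random obs, zero reward, episode ends at once.
        obs = {aid: self.observation_spaces[aid].sample() for aid in self.agents}
        rewards = {aid: 0.0 for aid in action_dict}
        return obs, rewards, {"__all__": True}, {"__all__": False}, {}
```

The point of the sketch is only the shape of the spaces dicts; the reward and transition logic would of course come from your problem.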

Would you like a code snippet or further explanation of how to adapt this example to your use case?

@RunLLM I intend to implement a centralized training and decentralized execution (CTDE) framework. Does MASAC support this framework? Is there a single centralized critic (plus its twin) shared by all the agents?

MASAC in RLlib does not natively implement a single centralized critic (plus its twin) shared across all agents for centralized training and decentralized execution; instead, each policy maintains its own critic(s). To get a truly centralized critic shared by all agents, you would need to customize the model and training logic, as described in the centralized-critic examples and documentation. RLlib provides examples and strategies for centralized critics, but MASAC does not offer a single shared critic network for all agents out-of-the-box.

Would you like more detail on how to implement a single centralized critic in RLlib?

@RunLLM Yes,

I would like more detail on how to implement a single centralized critic in RLlib.

To implement a single centralized critic (plus its twin) for all agents in RLlib, you need to customize both the model and the training logic. RLlib’s standard multi-agent setup does not provide a shared critic out-of-the-box, but you can achieve this by following the centralized critic pattern used in the PPO examples and adapting it for SAC/MASAC.

Step-by-step approach:

  1. Custom Model:
    Create a custom model that takes as input the global state (i.e., observations and actions of all agents) and outputs a value estimate. You can refer to the centralized critic models in centralized_critic_models.py for structure. For SAC, you’ll need to implement both Q-networks (twin critics).

  2. Trajectory Postprocessing:
    Use a custom postprocess_fn in your policy to collect and concatenate all agents’ observations and actions into the batch, so the centralized critic has access to the full state and joint actions during training. See the centralized critic PPO example for how to do this: centralized_critic.py.

  3. Custom Loss:
    Override the loss function to use the centralized critic for all agents. The loss should compute Q-values using the shared critic, and all agents’ experiences should be used to update this critic.

  4. Policy Mapping:
    In your multi-agent config, map all agents to policies that use the shared critic model. You may need to use a single policy for all agents or ensure all policies share the same critic weights.

  5. Decentralized Execution:
    During inference (action selection), ensure that only the agent’s own observation is used for the actor, while the critic uses the global state only during training.
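
The shape of steps 1–3 can be sketched with a toy, dependency-free stand-in. Plain-Python linear functions replace real Q-networks here, and every name (one_hot, CentralizedTwinQ, make_linear_q) is illustrative rather than an RLlib API; the point is only the data flow into the shared twin critic:

```python
import random

def one_hot(index, n):
    """Encode a discrete action so it can sit in the same joint
    vector as the continuous agents' actions."""
    return [1.0 if j == index else 0.0 for j in range(n)]

def make_linear_q(dim, seed):
    """One Q-head: a fixed random linear map (toy stand-in for a Q-network)."""
    rng = random.Random(seed)
    weights = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    def q(joint):
        return sum(w * x for w, x in zip(weights, joint))
    return q

class CentralizedTwinQ:
    """A single shared critic plus its twin, fed the GLOBAL state:
    every agent's observation and action concatenated together."""
    def __init__(self, joint_dim):
        self.q1 = make_linear_q(joint_dim, seed=1)
        self.q2 = make_linear_q(joint_dim, seed=2)
    def min_q(self, joint):
        # Clipped double-Q, as in SAC's target computation.
        return min(self.q1(joint), self.q2(joint))

# Joint input: both agents' observations, a one-hot discrete action,
# and a continuous action, all concatenated (step 2's postprocessing).
observations = {"agent_1": [0.1, 0.2], "agent_2": [0.3, 0.4]}
actions = {"agent_1": one_hot(2, 3), "agent_2": [0.5]}
joint = [x for v in observations.values() for x in v] + \
        [x for v in actions.values() for x in v]

critic = CentralizedTwinQ(len(joint))
value = critic.min_q(joint)
```

In a real implementation the two heads would be trained torch networks and the concatenation would happen in the trajectory postprocessing step, but the interface is the same: the actors see only their own observations, while this critic sees everything, during training only.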

References and Examples:

This approach requires moderate customization, especially for SAC, but is supported by RLlib’s flexible model and policy APIs.
