AssertionError: Discrete(33) | MASAC with continuous and discrete agents

1. Severity of the issue: (select one)
High: Completely blocks me.

2. Environment:

  • Ray version: 2.48.0
  • Python version:
  • OS:
  • Cloud/Infrastructure:
  • Other libs/tools (if relevant):

3. What happened vs. what you expected:

  • Expected:
  • Actual:

I am working on a problem with 4 agents and want to use MASAC for this purpose. Agent1 has a discrete action space; Agent2 to Agent4 have continuous action spaces. I am facing the “AssertionError: Discrete(33)” error.

I am wondering if I can have agents with separate discrete and continuous action spaces under MASAC? If not, what would be the ideal workaround?

As of recent updates, RLlib’s MASAC (multi-agent SAC) supports both discrete and continuous action spaces, including in mixed multi-agent environments. You can assign a different action space to each agent by specifying it in the policies dict of the multi-agent config, and RLlib will infer and handle the spaces accordingly. Example scripts such as multi_agent_pendulum_sac.py and different_spaces_for_agents.py demonstrate this setup, where agents can have different action and observation spaces, including a mix of discrete and continuous actions.
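
A minimal config sketch along those lines might look as follows. The environment name and all spaces here are hypothetical placeholders for your own setup (register your env first); the per-agent policies dict is what lets each agent carry its own action space:

```python
import gymnasium as gym
from ray.rllib.algorithms.sac import SACConfig
from ray.rllib.policy.policy import PolicySpec

# Hypothetical per-agent spaces: Agent1 is discrete, the rest continuous.
spaces = {
    "agent_1": (gym.spaces.Box(-1.0, 1.0, (8,)), gym.spaces.Discrete(33)),
    "agent_2": (gym.spaces.Box(-1.0, 1.0, (8,)), gym.spaces.Box(-1.0, 1.0, (2,))),
    "agent_3": (gym.spaces.Box(-1.0, 1.0, (8,)), gym.spaces.Box(-1.0, 1.0, (2,))),
    "agent_4": (gym.spaces.Box(-1.0, 1.0, (8,)), gym.spaces.Box(-1.0, 1.0, (2,))),
}

config = (
    SACConfig()
    .environment("my_multi_agent_env")  # placeholder: your registered env
    .multi_agent(
        # One policy per agent, so each policy gets its own action space.
        policies={
            aid: PolicySpec(observation_space=obs, action_space=act)
            for aid, (obs, act) in spaces.items()
        },
        policy_mapping_fn=lambda agent_id, *args, **kwargs: agent_id,
    )
)
```

This is a sketch of the configuration pattern, not a verified end-to-end script; whether MASAC trains cleanly with it still depends on your RLlib version supporting discrete SAC.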

If you encounter an AssertionError: Discrete(33), ensure your environment and policy mapping are correctly set up, and that each policy’s action space matches the agent’s. RLlib should handle this automatically if you provide the correct spaces per agent in the config. If issues persist, double-check that your Ray version is up-to-date and that you are using the new API stack, as older versions or misconfigurations may not support mixed action spaces in MASAC properly. Would you like a step-by-step example configuration?

Sources:

Hint: Mention @RunLLM in the post for followups.

@RunLLM I can’t find any example where some agents have discrete while others have continuous action space. Please share the relevant example if any.

You can find an official RLlib example of a multi-agent environment in which agents have different action spaces in different_spaces_for_agents.py. In this script, agent0 uses a discrete action space while agent1 uses a different space (which can be continuous), demonstrating how to configure and run such mixed-space setups.
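
If you would rather build your own environment than adapt that example, a minimal sketch of a mixed-space MultiAgentEnv could look like the following. The env logic is a do-nothing placeholder, and the attribute names (observation_spaces / action_spaces dicts, agents / possible_agents) assume a recent RLlib version; older releases expect the spaces to be passed via the policies config instead:

```python
import gymnasium as gym
from ray.rllib.env.multi_agent_env import MultiAgentEnv

class MixedSpacesEnv(MultiAgentEnv):
    """Toy sketch: one agent acts discretely, the other continuously."""

    def __init__(self, config=None):
        super().__init__()
        self.agents = self.possible_agents = ["agent_1", "agent_2"]
        # Per-agent spaces keyed by agent ID.
        self.observation_spaces = {
            aid: gym.spaces.Box(-1.0, 1.0, (4,)) for aid in self.agents
        }
        self.action_spaces = {
            "agent_1": gym.spaces.Discrete(33),          # discrete agent
            "agent_2": gym.spaces.Box(-1.0, 1.0, (2,)),  # continuous agent
        }

    def reset(self, *, seed=None, options=None):
        obs = {aid: self.observation_spaces[aid].sample() for aid in self.agents}
        return obs, {}

    def step(self, action_dict):
        # Placeholder dynamics: random obs, zero reward, episode ends at once.
        obs = {aid: self.observation_spaces[aid].sample() for aid in self.agents}
        rewards = {aid: 0.0 for aid in action_dict}
        return obs, rewards, {"__all__": True}, {"__all__": False}, {}
```

The point of the sketch is only the shape of the spaces dicts; the reward and transition logic would of course come from your problem.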

Would you like a code snippet or further explanation of how to adapt this example to your use case?

@RunLLM I intend to implement a centralized training and decentralized execution (CTDE) framework. Does MASAC support this framework? Is there a single centralized critic (plus its twin) shared by all the agents?

MASAC in RLlib does not natively implement a single centralized critic (plus its twin) shared across all agents for centralized training and decentralized execution; instead, each policy maintains its own critic(s). To get a truly centralized critic shared by all agents, you would need to customize the model and training logic, as described in the centralized-critic examples and documentation. RLlib provides examples and strategies for centralized critics, but MASAC does not offer a single shared critic network for all agents out-of-the-box.

Would you like more detail on how to implement a single centralized critic in RLlib?

@RunLLM Yes,

I would like more detail on how to implement a single centralized critic in RLlib.

To implement a single centralized critic (plus its twin) for all agents in RLlib, you need to customize both the model and the training logic. RLlib’s standard multi-agent setup does not provide a shared critic out-of-the-box, but you can achieve this by following the centralized critic pattern used in the PPO examples and adapting it for SAC/MASAC.

Step-by-step approach:

  1. Custom Model:
    Create a custom model that takes as input the global state (i.e., observations and actions of all agents) and outputs a value estimate. You can refer to the centralized critic models in centralized_critic_models.py for structure. For SAC, you’ll need to implement both Q-networks (twin critics).

  2. Trajectory Postprocessing:
    Use a custom postprocess_fn in your policy to collect and concatenate all agents’ observations and actions into the batch, so the centralized critic has access to the full state and joint actions during training. See the centralized critic PPO example for how to do this: centralized_critic.py.

  3. Custom Loss:
    Override the loss function to use the centralized critic for all agents. The loss should compute Q-values using the shared critic, and all agents’ experiences should be used to update this critic.

  4. Policy Mapping:
    In your multi-agent config, map all agents to policies that use the shared critic model. You may need to use a single policy for all agents or ensure all policies share the same critic weights.

  5. Decentralized Execution:
    During inference (action selection), ensure that only the agent’s own observation is used for the actor, while the critic uses the global state only during training.
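
The shape of steps 1–3 can be sketched with a toy, dependency-free stand-in. Plain-Python linear functions replace real Q-networks here, and every name (one_hot, CentralizedTwinQ, make_linear_q) is illustrative rather than an RLlib API; the point is only the data flow into the shared twin critic:

```python
import random

def one_hot(index, n):
    """Encode a discrete action so it can sit in the same joint
    vector as the continuous agents' actions."""
    return [1.0 if j == index else 0.0 for j in range(n)]

def make_linear_q(dim, seed):
    """One Q-head: a fixed random linear map (toy stand-in for a Q-network)."""
    rng = random.Random(seed)
    weights = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    def q(joint):
        return sum(w * x for w, x in zip(weights, joint))
    return q

class CentralizedTwinQ:
    """A single shared critic plus its twin, fed the GLOBAL state:
    every agent's observation and action concatenated together."""
    def __init__(self, joint_dim):
        self.q1 = make_linear_q(joint_dim, seed=1)
        self.q2 = make_linear_q(joint_dim, seed=2)
    def min_q(self, joint):
        # Clipped double-Q, as in SAC's target computation.
        return min(self.q1(joint), self.q2(joint))

# Joint input: both agents' observations, a one-hot discrete action,
# and a continuous action, all concatenated (step 2's postprocessing).
observations = {"agent_1": [0.1, 0.2], "agent_2": [0.3, 0.4]}
actions = {"agent_1": one_hot(2, 3), "agent_2": [0.5]}
joint = [x for v in observations.values() for x in v] + \
        [x for v in actions.values() for x in v]

critic = CentralizedTwinQ(len(joint))
value = critic.min_q(joint)
```

In a real implementation the two heads would be trained torch networks and the concatenation would happen in the trajectory postprocessing step, but the interface is the same: the actors see only their own observations, while this critic sees everything, during training only.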

References and Examples:

This approach requires moderate customization, especially for SAC, but is supported by RLlib’s flexible model and policy APIs.
