Where get_action_dist() is getting called?

Hossein_Moghaddam · October 28, 2024, 3:53am

How severe does this issue affect your experience of using Ray?

High: It blocks me to complete my task.

I have created my custom multi-agent environment beside a custom model for each agent’s policy network that I have registered using

ray.tune.registry.register_env()

and

ModelCatalog.register_custom_model()

The action space of my environment is multi-discrete. I am receiving the following error from within get_action_dist():

NotImplementedError: Unsupported args: MultiDiscrete([490 490 490 490 490]) None

I have put a breakpoint at the beginning of get_action_dist() to go through it and find what is wrong with my code but it is never reached and I’m confused about where it is called when I call config.build() or PPO(config = config)(I’m using PPO). None here refers to the dist_type, but it should be MultiDiscrete. I’m suspicious it is because of the custom environment or policy networks, but I need first to verify what is passed through get_action_dist() to correct my config, my environment, my policies, or any combination of them.

Lars_Simon_Zehnder · October 28, 2024, 4:32pm

@Hossein_Moghaddam, thanks for posting and welcome to the forum.

First of all, I kindly suggest to you to switch to our new API stack that provides better and simpler APIs and will replace the old stack in the near future. You can find a migration guide here.

The get_action_dist() in the old stack is called either when training in the PPO loss or when in exploration/inference in compute_single_action.

The fact that your debugger cannot break at your breakpoints is due to the fact that you possibly run Ray in cluster mode.

Either you turn to local mode, i.e. ray.init(local_mode=True)
Or you turn to the Ray debugger that allows you to debug in cluster mode (see here)

Hossein_Moghaddam · October 28, 2024, 10:32pm

Thanks for your prompt reply.
I used ray.init(local_mode=True) to run Ray in local mode but it doesn’t solve the issue. I noticed it is called within build() before I start training. I also could not succeed in breaking the code execution of the cartpole_ppo example inside get_action_dist(), I know the reason could be exploiting the new API stack.
Currently, I prefer to stick to the old API stack as I’m using custom policy models and the RL module is experimental.
Here, I’m sharing a piece of code that throws the error that I mentioned. It doesn’t start training and crash before that.

RLlib 2.38 and python 3.12.7.

import gym
from gym.spaces import MultiDiscrete, Box
import numpy as np
from ray.rllib.env import EnvContext, MultiAgentEnv


class MultiDiscreteEnv(MultiAgentEnv):
    def __init__(self, config: EnvContext):
        super().__init__()
        
        # Define a MultiDiscrete action space with two components
        # First component ranges from 0-4, and second component from 0-2
        self.action_space = MultiDiscrete([5, 3])
        
        # Observation space: an array of size 3 with continuous values
        self.observation_space = Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32)
        
        # Set an initial state for demonstration
        self.state = np.zeros(3, dtype=np.float32)
        self.step_count = 0

    def reset(self):
        # Reset the state and step count
        self.state = np.zeros(3, dtype=np.float32)
        self.step_count = 0
        return self.state

    def step(self, action):
        # Action is a tuple (e.g., (3, 1)) of actions in each discrete component
        action_0, action_1 = action
        
        # Simplified reward calculation
        reward = 1.0 if action_0 == 3 and action_1 == 1 else 0.0
        
        # Update the state as a function of the action (for demonstration)
        self.state = np.array([action_0 / 5.0, action_1 / 2.0, 0.5], dtype=np.float32)
        
        # Increment the step count
        self.step_count += 1
        done = self.step_count >= 10  # Terminate episode after 10 steps
        
        return self.state, reward, done, {}

# Register this environment with gym
import gym
from ray.tune.registry import register_env

def env_creator(env_config):
    return MultiDiscreteEnv(env_config)

register_env("MultiDiscreteEnv-v0", env_creator)

import ray
from ray import tune
from ray.rllib.algorithms.ppo import PPO

# Initialize Ray
ray.init(ignore_reinit_error=True, local_mode=True)

# Define the PPO configuration
config = {
    "env": "MultiDiscreteEnv-v0",
    "framework": "torch",  # Use TensorFlow or change to "torch" for PyTorch
    "num_gpus": 0,      # Change to 1 if you have a GPU available
    "num_workers": 1,   # Parallelism, set to >1 to use multiple CPU cores
    "env_config": {},   # Any environment-specific parameters go here
    "model": {
        "fcnet_hiddens": [64, 64],  # Fully connected layers of the policy network
        "fcnet_activation": "relu", # Activation function for each layer
    },
    "multiagent": {
        "policies": {
            "default_policy": (None, MultiDiscreteEnv({}).observation_space, MultiDiscreteEnv({}).action_space, {})
        },
        "policy_mapping_fn": lambda agent_id: "default_policy",
    },
    "rollout_fragment_length": 200,
    "train_batch_size": 4000,
    "sgd_minibatch_size": 128,
    "num_sgd_iter": 10,
    "lr": 5e-4,  # Learning rate
    "gamma": 0.99,  # Discount factor
}
trainer = PPO(config=config)
# Run the PPO Trainer
tune.run(
    trainer,
    config=config,
    stop={"episode_reward_mean": 5},  # Stop criteria for demonstration
    local_dir="./ray_results",  # Directory to save results
)

mannyv · October 29, 2024, 3:41am

Hi @Hossein_Moghaddam,

Which gym package are you using? RLlib is only supporting gymnasium in version 2.38.

Also if you want to do the debugging with local_mode=True you also need to set num_workers=0.

Hossein_Moghaddam · October 30, 2024, 12:25am

Thanks a lot @mannyv. I was using gym mistakenly. Changing it to gymnasium solved the issue.
It could be a good idea to raise an error to distinguish between them when executing the code. I couldn’t imagine it was the source of the problem by the error message I was recieving.

Topic		Replies	Views
Passing 'custom_action_dist' RLlib	5	398	April 10, 2023
How to choose the action dist for a custom model with a Tuple action space? RLlib	5	825	May 15, 2022
Actions created by Policy being modified before input to environment RLlib	4	289	March 15, 2023
Simple multi agent setup with action masking problems RLlib	0	193	September 16, 2024
Questions and Confusion: Getting started with RLlib Configure Algorithm, Training, Evaluation, Scaling	0	41	February 19, 2025

Where get_action_dist() is getting called?

Related topics