Right way to use tuple action space

Ofir_Abu · September 22, 2021, 1:40pm

Hi there, I’m using a custom environment with a tuple (gym space) action space.
TL;DR - I’m having trouble working out how to construct the model’s output in the forward function.

Specifically, my action space is defined as:

Tuple((DiscreteWithDType(9, dtype=np.uint8), DiscreteWithDType(9, dtype=np.uint8)))

I don’t know what the forward pass should output for this space. Is there an example I can look at?

Ofir_Abu · September 23, 2021, 8:26am

I know it hasn’t been long since my last post, but time is of the essence.

I just need an example of the shape and type of the model’s forward output for this kind of action space (any Tuple action space will do).

mannyv · September 23, 2021, 12:48pm

Hi @Ofir_Abu,

Perhaps looking through this example will help you figure out what you need.

github.com/ray-project/ray/blob/698b4eeed3e2699f3181baeadecf966702a55eaf/rllib/examples/two_step_game.py

"""The two-step game from QMIX: https://arxiv.org/pdf/1803.11485.pdf

Configurations you can try:
    - normal policy gradients (PG)
    - contrib/MADDPG
    - QMIX

See also: centralized_critic.py for centralized critic PPO on this game.
"""

import argparse
from gym.spaces import Dict, Discrete, Tuple, MultiDiscrete
import os

import ray
from ray import tune
from ray.tune import register_env, grid_search
from ray.rllib.env.multi_agent_env import ENV_STATE
from ray.rllib.examples.env.two_step_game import TwoStepGame
from ray.rllib.policy.policy import PolicySpec

This file has been truncated.

mannyv · September 23, 2021, 12:54pm

@Ofir_Abu

This function here should also help:

github.com/ray-project/ray/blob/698b4eeed3e2699f3181baeadecf966702a55eaf/rllib/models/catalog.py#L295-L329

def get_action_shape(action_space: gym.Space,
                     framework: str = "tf") -> (np.dtype, List[int]):
    """Returns action tensor dtype and shape for the action space.

    Args:
        action_space (Space): Action space of the target gym env.
        framework (str): The framework identifier. One of "tf" or "torch".

    Returns:
        (dtype, shape): Dtype and shape of the actions tensor.
    """
    dl_lib = torch if framework == "torch" else tf

    if isinstance(action_space, Discrete):
        return action_space.dtype, (None, )
    elif isinstance(action_space, (Box, Simplex)):
        return dl_lib.float32, (None, ) + action_space.shape
    elif isinstance(action_space, MultiDiscrete):
        return action_space.dtype, (None, ) + action_space.shape
    elif isinstance(action_space, (Tuple, Dict)):

This file has been truncated.

Ofir_Abu · September 23, 2021, 1:15pm

Thank you! The first example helps, but my specific problem is with a custom tf model: I don’t know what type and shape the forward pass should return.

I guess I will debug a simple case with a non-custom model to understand it, but if someone has a reference, that would be a great help.

rusu24edward · September 23, 2021, 3:43pm

@Ofir_Abu what is DiscreteWithDType? I’m not familiar with this. I don’t believe it’s a gym space…

Ofir_Abu · September 23, 2021, 4:04pm

Correct, but it’s a compatible wrapper around gym’s official Discrete space; it basically adds some extra dtype-casting functions.

Does someone have an example of the forward function’s output in a similar case?

Ofir_Abu · September 23, 2021, 5:43pm

Thanks again @mannyv!
Is there an easy way to see how the output of the forward pass is constructed? Specifically, how do I debug it?

sven1977 · September 24, 2021, 10:06am

Hey @Ofir_Abu, your Tuple space causes RLlib to choose a MultiActionDistribution for the model’s output (the model parameterizes this distribution type and outputs a corresponding number of nodes). The model’s output values are then split inside this distribution according to the individual sub-spaces (2x DiscreteWithDType), and actions are sampled from these two spaces individually using the logits produced by your model.

You can debug into your forward pass by setting a breakpoint in e.g. rllib/models/torch/torch_action_dist::TorchMultiActionDistribution::sample() (or the respective tf version) AND setting local_mode=True in your call to ray.init().
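
The splitting described above can be sketched in plain NumPy. This is only an illustration of the idea, not RLlib's actual code: it assumes each DiscreteWithDType(9) sub-space needs 9 logits like gym's Discrete(9), so the model's forward pass would return one flat float tensor of shape [batch, 18]. The helper name split_and_sample and the variable names are hypothetical.

```python
import numpy as np

# Assumption: each sub-space behaves like Discrete(9) and needs 9 logits.
sub_space_sizes = [9, 9]
num_outputs = sum(sub_space_sizes)  # flat model output width: 18


def split_and_sample(logits, sizes, rng):
    """Split flat logits [B, sum(sizes)] per sub-space and sample one action each.

    Hypothetical helper that mimics what a multi-action distribution does.
    """
    split_points = np.cumsum(sizes)[:-1]
    chunks = np.split(logits, split_points, axis=1)
    actions = []
    for chunk in chunks:
        # Softmax over this sub-space's logits (float64 for a clean sum to 1).
        chunk = chunk.astype(np.float64)
        shifted = chunk - chunk.max(axis=1, keepdims=True)
        probs = np.exp(shifted)
        probs /= probs.sum(axis=1, keepdims=True)
        # One categorical sample per batch row.
        sampled = np.array(
            [rng.choice(len(p), p=p) for p in probs], dtype=np.uint8
        )
        actions.append(sampled)
    return tuple(actions)


rng = np.random.default_rng(0)
batch_size = 4
# Stand-in for the model's forward output: float32 logits of shape [B, 18].
logits = rng.normal(size=(batch_size, num_outputs)).astype(np.float32)
actions = split_and_sample(logits, sub_space_sizes, rng)
```

Under these assumptions, the custom model itself never returns a tuple: it returns the flat [batch, 18] logits tensor, and the tuple of two integer actions only appears after the distribution has split and sampled.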

Ofir_Abu · September 24, 2021, 11:30am

Sounds great, thank you for the explanation about the two distributions.
I will try to debug it now and edit this message with the results.
