Sampling from flattened space

rusu24edward · October 31, 2022, 6:51pm

How severely does this issue affect your experience of using Ray?

None: Just asking a question out of curiosity

Question about how actions are sampled

While implementing an RL algorithm that uses gym’s flatten function, I ran into an error that I don’t get when using RLlib, so I would like to know how RLlib handles this problem. The flatten wrapper converts Discrete to Box as a one-hot encoding. For example, if the original space is Discrete(3), then:

0 maps to [1, 0, 0]
1 maps to [0, 1, 0]
2 maps to [0, 0, 1]
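A minimal sketch of the mapping above, assuming a Discrete(n) space that starts at 0. `flatten_discrete` and `unflatten_discrete` are hypothetical helpers written to mirror the one-hot encoding and the `np.nonzero(x)[0][0]` decoding discussed in this thread; they are not gym’s own functions.

```python
import numpy as np


def flatten_discrete(k: int, n: int) -> np.ndarray:
    """Hypothetical helper: one-hot encode action k of a Discrete(n) space (start assumed to be 0)."""
    x = np.zeros(n, dtype=np.int64)
    x[k] = 1
    return x


def unflatten_discrete(x: np.ndarray) -> int:
    """Hypothetical helper: decode by taking the index of the first nonzero entry."""
    return int(np.nonzero(x)[0][0])


# Every valid action round-trips: 0 -> [1, 0, 0], 1 -> [0, 1, 0], 2 -> [0, 0, 1]
for k in range(3):
    x = flatten_discrete(k, 3)
    print(k, x.tolist(), unflatten_discrete(x))
```

For valid one-hot vectors the two helpers are exact inverses; the trouble described next only appears for vectors that are not one-hot.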

When we sample the flattened action space for random actions, we are sampling the Box, which can produce any of the eight combinations of 0s and 1s in a three-element array, namely:

[0, 0, 0],
[0, 0, 1], *
[0, 1, 0], *
[0, 1, 1],
[1, 0, 0], *
[1, 0, 1],
[1, 1, 0],
[1, 1, 1]

Only the three of these eight that I’ve starred are usable in the strict sense of the mapping. The unflatten function for a Discrete space uses np.nonzero(x)[0][0], and here’s a table of what each of the above arrays maps to:

+ ------------------ + ---------------- + --------------------------------------------- +
| In Flattened Space | np.nonzero(x)[0] | np.nonzero(x)[0][0] (aka discrete equivalent) |
+ ------------------ + ---------------- + --------------------------------------------- +
| 0, 0, 0            | Error            | Error                                         |
| 0, 0, 1            | [2]              | 2                                             |
| 0, 1, 0            | [1]              | 1                                             |
| 0, 1, 1            | [1, 2]           | 1                                             |
| 1, 0, 0            | [0]              | 0                                             |
| 1, 0, 1            | [0, 2]           | 0                                             |
| 1, 1, 0            | [0, 1]           | 0                                             |
| 1, 1, 1            | [0, 1, 2]        | 0                                             |
+ ------------------ + ---------------- + --------------------------------------------- +
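The table can be reproduced by enumerating every point of the flattened Box and applying the same first-nonzero rule. This sketch assumes the unflatten rule is exactly `np.nonzero(x)[0][0]` as quoted above; `decode` is a hypothetical helper, and `None` stands in for the error row.

```python
import itertools

import numpy as np


def decode(x: np.ndarray):
    """Hypothetical stand-in for unflattening a Discrete: index of the first nonzero entry."""
    nonzero = np.nonzero(x)[0]
    return int(nonzero[0]) if nonzero.size > 0 else None  # None marks the error case


table = {}
for bits in itertools.product([0, 1], repeat=3):  # all 2**3 = 8 points of the Box
    x = np.array(bits)
    is_one_hot = x.sum() == 1  # only the starred rows are valid one-hot encodings
    table[bits] = decode(x)
    print(bits, np.nonzero(x)[0].tolist(), table[bits], "*" if is_one_hot else "")
```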

Implications

Obviously, [0, 0, 0] will fail because it has no nonzero entry.
Importantly, only one eighth of the random samples will map to 2. One fourth will map to 1, and one half will map to 0. This has important implications for exploration, especially if action 2 is the “correct action” throughout much of the simulation. I’m very curious why I have not seen this come up before. This kind of skew in random sampling can strongly affect how the algorithm explores and learns, and the problem gets worse for Discrete(n) when n is large: the last action is decoded only when every earlier entry is 0, so it is chosen with probability 1/2^n.
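Assuming each entry of the flattened Box is sampled independently and uniformly from {0, 1}, the first-nonzero rule decodes to k only when entries 0..k-1 are all 0 and entry k is 1. That gives P(k) = 2^-(k+1) and P(error) = 2^-n. The sketch below checks this closed form by exhaustive enumeration; `decoded_distribution` is a hypothetical helper.

```python
import itertools
from collections import Counter
from fractions import Fraction


def decoded_distribution(n: int) -> dict:
    """Exact distribution of first-nonzero decoding over all 2**n equally likely bit vectors."""
    counts = Counter()
    for bits in itertools.product([0, 1], repeat=n):
        # Index of the first 1, or "error" when the vector is all zeros
        first = next((i for i, b in enumerate(bits) if b == 1), "error")
        counts[first] += 1
    total = 2 ** n
    return {key: Fraction(c, total) for key, c in counts.items()}


for n in (3, 5):
    dist = decoded_distribution(n)
    # Closed form: P(k) = 2^-(k+1) for k < n, P(error) = 2^-n
    assert all(dist[k] == Fraction(1, 2 ** (k + 1)) for k in range(n))
    assert dist["error"] == Fraction(1, 2 ** n)
    print(n, {k: str(v) for k, v in dist.items()})
```

For example, with Discrete(10) the last action would be sampled only once in 1024 draws under this scheme.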

I never see this error when running with RLlib, so it seems to do something smarter than sampling directly from the flattened action space. Can someone point me to more information on how RLlib randomly samples the spaces?

rusu24edward · November 10, 2022, 6:40pm

Not a high priority question, but it would be nice if someone has some free time to get eyes on this.

arturn · November 15, 2022, 9:16am

Importantly, only one eighth of the random samples will map to 2. One fourth will map to 1, and one half will map to 0

Where does the other eighth map to? The random samples don’t appear to be random. Can you set up a reproduction script?

I never see this error when running with RLlib, so it seems to do something smarter than a raw sampling from the flattened action space.

We sample from spaces in many places. I’m not sure which one you are referring to here.
In a one-hot encoding, though, many of the cases you list are not valid encodings.
For example, [0, 1, 1] is not one-hot but “two-hot”.

Cheers

rusu24edward · November 17, 2022, 4:46pm

Thanks for the response, @arturn

When a gym space is flattened, it is converted to a Box. The flattened space does not store information about what it was before being flattened; in particular, it doesn’t retain the fact that it is a one-hot encoding. Flattening Discrete(3) produces Box(0, 1, (3,), int). In Discrete(3), there are three possible points: 0, 1, or 2. On the other hand, Box(0, 1, (3,), int) has 2^3 = 8 possible points:

[0, 0, 0],
[0, 0, 1], *
[0, 1, 0], *
[0, 1, 1],
[1, 0, 0], *
[1, 0, 1],
[1, 1, 0],
[1, 1, 1]

Only the 3 that I’ve starred are valid. Thus, if random samples are generated from the flattened space instead of the original space, then there will be invalid points, as you said above. If you look in the table I created above, you can see how each of the points from the Box space will map to the Discrete space during the unflatten process. The distribution is skewed, and one of the points will fail completely.

All this to say, gym spaces do not guarantee that flatten -> sample -> unflatten produces a valid point of the original space, let alone with the original distribution.

Here’s an example script:

from gym.spaces import Discrete
from gym.spaces.utils import flatten_space, unflatten

def to_string(array):
    # Turn a flattened sample such as array([0, 1, 1]) into the key '011'
    return ''.join(str(i) for i in array)

# Sample the original Discrete(3) space directly
discrete_space = Discrete(3)
discrete_sample_counter = {0: 0, 1: 0, 2: 0}
for _ in range(10000):
    sample = discrete_space.sample()
    discrete_sample_counter[sample] += 1

print(discrete_sample_counter)

# Sample the flattened space (a Box of three integers in [0, 1]) and unflatten each sample
flattened_space = flatten_space(discrete_space)
flattened_sample_counter = {
    '000': 0,
    '001': 0,
    '010': 0,
    '011': 0,
    '100': 0,
    '101': 0,
    '110': 0,
    '111': 0
}
unflattened_sample_counter = {0: 0, 1: 0, 2: 0, 'error': 0}
for _ in range(10000):
    sample = flattened_space.sample()
    flattened_sample_counter[to_string(sample)] += 1

    try:
        unflattened_sample = unflatten(discrete_space, sample)
        unflattened_sample_counter[unflattened_sample] += 1
    except (IndexError, ValueError):
        # gym raises IndexError for [0, 0, 0]; stricter versions may raise ValueError instead
        unflattened_sample_counter['error'] += 1

print(flattened_sample_counter)
print(unflattened_sample_counter)

# Output:
# discrete_sample_counter    -> {0: 3462, 1: 3333, 2: 3205}
# ^ Notice the equal distribution among the three options; each is chosen about 1/3 of the time
# flattened_sample_counter   -> {'000': 1241, '001': 1266, '010': 1211, '011': 1245, '100': 1251, '101': 1279, '110': 1284, '111': 1223}
# ^ Notice the equal distribution among the eight options; each is chosen about 1/8 of the time
# unflattened_sample_counter -> {0: 5037, 1: 2456, 2: 1266, 'error': 1241}
# ^ Notice the skewed distribution among the three options. As in the table I showed above, 1/2 of the samples are 0, 1/4 are 1, 1/8 are 2, and the other 1/8 are errors.

I haven’t seen the IndexError while working with RLlib, so I’m curious to know more about how RLlib generates samples. It doesn’t seem to be sampling directly from the flattened space. Does it sample the original space before the preprocessor flattens it? I’m concerned about the skewing problem too.

FYI, I’ve raised this with the gym maintainers, and you can see the discussion here.

arturn · November 30, 2022, 7:00pm

The line that we use to one-hot encode observations reads gym.spaces.utils.flatten(self._obs_space, observation).astype(np.float32).
When sampling from a Categorical distribution (or a Discrete space, for that matter), you can simply take a random sample and use argmax.
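One way to read the argmax suggestion, sketched under the assumption that the “random sample” is a continuous random vector over the flattened dimensions (here i.i.d. uniform noise): argmax always returns exactly one index, so every draw decodes to a valid action, and because the entries are i.i.d. each index is equally likely. `sample_via_argmax` is a hypothetical helper, not RLlib code.

```python
import numpy as np


def sample_via_argmax(n: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper: draw i.i.d. uniform noise over n slots and one-hot encode its argmax."""
    noise = rng.random(n)          # continuous values, so ties essentially never occur
    one_hot = np.zeros(n, dtype=np.int64)
    one_hot[np.argmax(noise)] = 1  # exactly one entry is set, so decoding never fails
    return one_hot


rng = np.random.default_rng(0)
counts = np.zeros(3, dtype=np.int64)
for _ in range(30_000):
    counts += sample_via_argmax(3, rng)
print(counts / counts.sum())  # each action close to 1/3
```

If the actions should follow given logits instead of being uniform, adding Gumbel noise to the logits before the argmax (the Gumbel-max trick) is the standard generalization of the same idea.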

rusu24edward · December 5, 2022, 8:23pm

Thanks for the response and for the links. Flattening a point is fairly straightforward; what I’m bringing up relates to unflattening a sample. I dug around the code a bit, and it looks like the original structure of the space is kept and used for unflattening, so that’s good.
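A sketch of the pattern described here, under the assumption that the wrapper keeps the original Discrete(n) space around: sampling happens in the original space and only the result is flattened, so flatten -> unflatten always round-trips and the distribution stays uniform. The class `KeepOriginalDiscrete` and its methods are hypothetical, not RLlib’s internals.

```python
import numpy as np


class KeepOriginalDiscrete:
    """Hypothetical wrapper that remembers the original Discrete(n) space (start assumed to be 0)."""

    def __init__(self, n: int, seed: int = 0):
        self.n = n
        self.rng = np.random.default_rng(seed)

    def sample(self) -> int:
        # Sample in the ORIGINAL space: uniform over {0, ..., n - 1}
        return int(self.rng.integers(self.n))

    def flatten(self, k: int) -> np.ndarray:
        # One-hot as float32, matching the observation-flattening line quoted earlier in the thread
        x = np.zeros(self.n, dtype=np.float32)
        x[k] = 1.0
        return x

    def unflatten(self, x: np.ndarray) -> int:
        return int(np.nonzero(x)[0][0])


space = KeepOriginalDiscrete(3)
counts = [0, 0, 0]
for _ in range(30_000):
    k = space.sample()
    assert space.unflatten(space.flatten(k)) == k  # round-trip always succeeds
    counts[k] += 1
print(counts)  # roughly 10,000 each
```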

quester · June 1, 2023, 8:18pm

@rusu24edward sorry for necroposting, but I’m running into this exact same issue and wanted to ask if you ever got to the bottom of it. As you say, and as several of the older Gym issues show, gym really does not guarantee flatten -> sample -> unflatten!

rusu24edward · June 28, 2023, 7:51pm

I had this discussion with the Gymnasium team. Warnings and errors were added to the Gymnasium code. Nothing more we can do about it.
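For readers hitting the same issue, a minimal sketch of the kind of guard being described: reject anything that is not exactly one-hot instead of silently decoding it with the first-nonzero rule. This is an illustration under that assumption, not Gymnasium’s actual implementation; `strict_unflatten_discrete` is a hypothetical name.

```python
import numpy as np


def strict_unflatten_discrete(x, n: int) -> int:
    """Hypothetical strict decoder for a flattened Discrete(n): x must be exactly one-hot."""
    x = np.asarray(x)
    if x.shape != (n,):
        raise ValueError(f"expected shape ({n},), got {x.shape}")
    if not np.isin(x, (0, 1)).all() or x.sum() != 1:
        raise ValueError(f"{x.tolist()} is not a valid one-hot encoding")
    return int(np.argmax(x))


print(strict_unflatten_discrete([0, 0, 1], 3))  # 2
for bad in ([0, 0, 0], [0, 1, 1]):
    try:
        strict_unflatten_discrete(bad, 3)
    except ValueError as err:
        print("rejected:", err)
```

A strict check like this turns the silent skew into a loud error, which is why it pairs naturally with sampling in the original space rather than in the flattened one.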
