[Question] Flatten Discrete to Box is problematic #3139

rusu24edward · 2022-10-28T00:27:48Z

Question

Flattening a Discrete space to a Box space may be problematic. The flatten wrapper converts Discrete to Box as a one-hot encoding. Suppose the original space is Discrete(3). Then:

0 maps to [1, 0, 0]
1 maps to [0, 1, 0]
2 maps to [0, 0, 1]

When we sample the action space for random actions, it samples the Box, which can produce any of the eight combinations of 0s and 1s in a three-element array, namely:

[0, 0, 0],
[0, 0, 1], *
[0, 1, 0], *
[0, 1, 1],
[1, 0, 0], *
[1, 0, 1],
[1, 1, 0],
[1, 1, 1]

Only the three of these eight that I’ve starred are usable in the strict sense of the mapping. The unflatten function for a Discrete space uses np.nonzero(x)[0][0], and here’s a table of what the above arrays map to:

+ ------------------ + ---------------- + --------------------------------------------- +
| In Flattened Space | np.nonzero(x)[0] | np.nonzero(x)[0][0] (aka discrete equivalent) |
+ ------------------ + ---------------- + --------------------------------------------- +
| 0, 0, 0            | Error            | Error                                         |
| 0, 0, 1            | [2]              | 2                                             |
| 0, 1, 0            | [1]              | 1                                             |
| 0, 1, 1            | [1, 2]           | 1                                             |
| 1, 0, 0            | [0]              | 0                                             |
| 1, 0, 1            | [0, 2]           | 0                                             |
| 1, 1, 0            | [0, 1]           | 0                                             |
| 1, 1, 1            | [0, 1, 2]        | 0                                             |
+ ------------------ + ---------------- + --------------------------------------------- +
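
The table can be reproduced with a short script. This is only a sketch: `unflatten_discrete` and `decode_all` are names I made up to mimic the np.nonzero(x)[0][0] decoding described above, and it uses plain NumPy rather than calling into Gym.

```python
import itertools

import numpy as np


def unflatten_discrete(x):
    """Hypothetical stand-in for the decoding above: index of the first nonzero entry."""
    return int(np.nonzero(x)[0][0])


def decode_all(n):
    """Decode every 0/1 vector of length n; None marks a vector that cannot be decoded."""
    results = {}
    for bits in itertools.product([0, 1], repeat=n):
        try:
            results[bits] = unflatten_discrete(np.array(bits))
        except IndexError:
            # np.nonzero returns an empty index array for the all-zero vector.
            results[bits] = None
    return results


table = decode_all(3)
for bits, action in table.items():
    print(bits, "->", "Error" if action is None else action)
```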

Implications

Obviously, [0, 0, 0] will fail because there is no nonzero element.
Importantly, only one eighth of the random samples will map to 2. One fourth will map to 1, and one half will map to 0 (the remaining eighth is the failing [0, 0, 0]). This has some important implications for exploration, especially if action 2 is the “correct action” throughout much of the simulation. I'm very curious why I have not seen this come up before. This type of skew in the random sampling can have major implications for the way the algorithm explores and learns, and the problem is exacerbated when n in Discrete(n) is large. Am I missing something here?
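
To put numbers on the skew, here is a minimal sketch that counts exactly how many 0/1 vectors decode to each action. It assumes each of the 2^n vectors is equally likely when sampling the Box, and `decoded_distribution` is a hypothetical helper, not something from Gym.

```python
import itertools
from fractions import Fraction

import numpy as np


def decoded_distribution(n):
    """Exact fraction of 0/1 vectors of length n that decode to each action.

    Assumes every vector is equally likely and that decoding takes the
    first nonzero index, as described above.
    """
    counts = {k: 0 for k in range(n)}
    counts["error"] = 0
    for bits in itertools.product([0, 1], repeat=n):
        idx = np.nonzero(np.array(bits))[0]
        if idx.size == 0:
            counts["error"] += 1  # the all-zero vector has no first nonzero
        else:
            counts[int(idx[0])] += 1
    total = 2 ** n
    return {k: Fraction(c, total) for k, c in counts.items()}


dist3 = decoded_distribution(3)
dist10 = decoded_distribution(10)
print(dist3)
print("last action of Discrete(10):", dist10[9])
```

Under that assumption, action k gets a share of 1/2^(k+1), so the last action of Discrete(10) comes up only once in 1024 samples.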

Solution

This is unique to Discrete spaces. Instead of mapping to a one-hot encoding we could just map to a box of a single element with the appropriate range. Discrete(n) maps to Box(0, n-1, (1,), int) instead of Box(0, 1, (n,), int).
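
A rough sketch of what I mean, again in plain NumPy. The function names are hypothetical, and `rng.integers(0, n, ...)` is only my stand-in for sampling Box(0, n-1, (1,), int); I have not checked this against the actual Box sampler.

```python
import numpy as np


def flatten_discrete_scalar(action):
    """Hypothetical flatten: Discrete(n) action -> single-element integer array."""
    return np.array([action], dtype=np.int64)


def unflatten_discrete_scalar(x):
    """Hypothetical unflatten: read the action back out of the single element."""
    return int(x[0])


n = 3
rng = np.random.default_rng(0)

# Stand-in for sampling Box(0, n-1, (1,), int): integers in [0, n) with shape (1,).
samples = rng.integers(0, n, size=(10_000, 1))
decoded = [unflatten_discrete_scalar(x) for x in samples]

# Every sample decodes to a valid action, and the counts come out roughly equal.
counts = np.bincount(decoded, minlength=n)
print(counts / counts.sum())
```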

pseudo-rnd-thoughts · 2022-10-28T08:55:57Z

Hey, we just launched gymnasium, a fork of Gym by the maintainers of Gym for the past 18 months where all maintenance and improvements will happen moving forward. Could you please move this over to the new repo?

If you'd like to read more about the backstory behind this and our plans going forward, click here.

rusu24edward · 2022-10-31T18:43:21Z

Moved it here

rusu24edward closed this as completed Oct 31, 2022
