[Question] Flatten Discrete box potentially problematic #102
Comments
Thanks for moving the PR to Gymnasium. My initial idea for solving this was changing the flattened space. Your solution doesn't seem to change anything; it just changes the parameterization of the Box. There are two options to solve this.

Do you have any other ideas?
As for context, many RL algorithms rely on randomness, such as filling a buffer with data from random actions at the beginning of training, or taking random actions to explore. Some implementations use the action space's sample method for this.

To clarify, my solution is not to just change Discrete to Box; that is what flatten already does. My suggestion is to set up the Box differently. Right now, it's set up like so:

Box(0, 1, (n,), int)

I suggest changing it to this:

Box(0, n - 1, (1,), int)
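As a minimal sketch of the difference (using plain numpy draws rather than gym itself, so the `rng.integers` calls here are stand-ins for the two Box definitions' `sample()` methods):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3

# Current behaviour described in the issue: Discrete(n) flattens to
# Box(0, 1, (n,), int), whose sample() draws each element independently,
# producing any of the 2**n binary vectors -- most are not valid one-hots.
current_sample = rng.integers(0, 2, size=n)

# Suggested alternative: Box(0, n - 1, (1,), int), whose sample() is a
# single uniform draw over {0, ..., n - 1} -- the same support as Discrete(n).
suggested_sample = rng.integers(0, n, size=1)

assert current_sample.shape == (n,)
assert 0 <= suggested_sample[0] < n
```

Under the suggested parameterization, every sample of the flattened space corresponds to exactly one Discrete value, so sampling and unflattening cannot skew the distribution.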
Why do this at all instead of just keeping it as a Discrete space? Frankly, I think that's a great question, and it's not clear to me why the current implementation flattens to an n-element array instead of using Discrete as is. It seems that the purpose is to represent it as a one-hot encoded vector, and if so, then that needs to be enforced in the resulting space. Otherwise, flattening Discrete should not do anything. I think this argument extends to MultiDiscrete and MultiBinary spaces. As you say, it may not make sense to have a new space for a flattened Discrete (and MultiDiscrete) space since it's only used in this one case. It may be okay not to guarantee that sampling from the flattened space is "path-independent", but it does present a "gotcha" that some developers (such as myself) may fall into when designing our RL frameworks.
My suggestion would be to require users to flatten their environments such that observations / actions are np.ndarray / int.
Mark, thank you for the thought you have given to this ticket so far. For the software I develop, I take the mindset that anything I provide to my users should work as expected. When something does not work as expected, I see it as either a bug or an opportunity to clarify expectations.

I believe the question I have raised reveals a potential flaw in the design of gym's spaces. Based on the language used in the flatten wrapper, it seems that flattening a Discrete (and also MultiDiscrete) space should result in a one-hot-encoded vector, but the one-hot encoding is not enforced in the way that object is then used (since it has just been turned into a Box). Personally, I would classify this as a bug, and I would either fix it with some clever redesign or update the documentation to make it abundantly clear that the flatten wrapper is limited.

Telling my users to pre-flatten their own spaces is not really an option for me. I like all the gym spaces, I want to support all of them, and I don't want to burden my users with implementation details that make their environments messier. This discussion has helped inform me on how I should change my own flatten wrapper to work with gym spaces more cleanly. I hope that the Gymnasium team can see it as an opportunity to look at the gym spaces from a new perspective and make changes as needed.
Thanks for the response. I could imagine adding a new space for this, but the best idea I have is flattening Discrete to MultiBinary. What do you think of this solution? If you are happy with it and @RedTachyon agrees (he might take a couple of days due to being at a conference currently), we can make a PR and include the change in v27.
This sounds good to me. You may need to flatten to Box instead of MultiBinary in order to handle nested spaces. In fact, one could make the argument that Discrete itself really is just MultiBinary with one-hot encoding turned on.
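To illustrate why the flattened container generally ends up as a Box, here is a hypothetical nested-space example; the space named in the comment and the hand-built vectors are illustrative, not gym's actual flatten output:

```python
import numpy as np

# A nested space like Tuple(Discrete(3), Box(low=0.0, high=1.0, shape=(2,)))
# flattens to one fixed-size vector. The Box part contributes real-valued
# entries, so the container for the concatenation must itself be a float Box;
# a MultiBinary space could only represent the one-hot part.
one_hot = np.zeros(3, dtype=np.float32)
one_hot[1] = 1.0                                   # Discrete sample 1, one-hot encoded
box_part = np.array([0.25, 0.75], dtype=np.float32)  # sample from the inner Box
flat = np.concatenate([one_hot, box_part])

assert flat.shape == (5,)
assert not np.all(np.isin(flat, [0.0, 1.0]))  # not representable as MultiBinary
```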
I'm not yet completely sure what we want to do here. I'd definitely be opposed to adding an extra space just for the right flattening, unless we completely change our philosophy with regards to the spaces. If this one change makes it so that we can always ensure the reversibility of flattening, that might be worth it.

At the same time, the whole idea of flattening is a bit fuzzy. My interpretation of this functionality is more or less "I don't care what this fancy space is, I want a fixed-size float32 vector for my NN", in which case any reversibility fails due to the dtypes. But we don't actually do any automatic type conversion.

I also feel like flattening into one-hot is much better on the algorithmic side. If you want to "flatten" a discrete action, you definitely don't want to keep the ordinal structure -- i.e. action 1 isn't between action 0 and action 2 in any meaningful sense, so a one-hot embedding is likely to work much better.
@rusu24edward Would you be able to make a test that checks the basic idea of unflattening a flattened sample?
I don't understand this; can you say a bit more about it? @pseudo-rnd-thoughts I will take a look.
Some issues right off the bat:
Totally agree, and yet, is this even tenable? It's curious that this issue has not come up sooner in gym's history. Several RL libraries make heavy use of gym's spaces; perhaps they're just sampling more intelligently? I bring this up because I am curious how many libraries make use of gym's flatten. I asked RLlib specifically to see what they did. This issue should appear any time you unflatten the action output from a NN. Are users designing their NNs with a softmax on some part of the output? Is anyone actually using the flatten and unflatten features? How are they getting around this problem? I'm posting these questions since I think y'all would have a better idea of the context for flatten, since you engage with the users regularly.
Let's say you have an unknown space; it might be anything. The moment you cast to float32, you lose reversibility if you try to invert the flattening.
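A small numpy check of the dtype point: an int64 value just past float32's integer precision limit (2^24) cannot survive the round trip:

```python
import numpy as np

# 2**24 + 1 is the smallest positive integer that float32 cannot represent
# exactly; it rounds to 2**24 when cast down, so casting back does not
# recover the original value.
x = np.int64(2**24 + 1)
y = np.float32(x)        # lossy cast, as when flattening to a float32 vector
assert np.int64(y) != x  # reversibility is lost
assert np.int64(y) == 2**24
```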
I was not aware that this is something that people would do, probably mainly because complex action spaces that would require it are rather rare. And if you use a space like this, you're probably better off writing your own utilities and definitely not using the generic flatten.
Good point, that definitely complicates the situation... It seems like flatten was not meant to be followed up with a sample. One should sample the space, then flatten the sample. I'm not sure how this is maintained on the other side of the NN when the action comes out and is unflattened, but as you said, this is something that users would enforce themselves. Maybe the solution here is to add some documentation guiding the users to sample before flattening and warning them about the shortcomings of unflattening?
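A sketch of the sample-then-flatten ordering, with hypothetical helpers mimicking gym.spaces.utils.flatten/unflatten for a Discrete space (the function names are illustrative): flattening a genuine sample always round-trips, unlike sampling the flattened space.

```python
import numpy as np

def flatten_discrete(n, x):
    """One-hot encode a Discrete(n) sample, mirroring gym's flatten."""
    out = np.zeros(n, dtype=np.int64)
    out[x] = 1
    return out

def unflatten_discrete(n, arr):
    """Invert the one-hot encoding via the first nonzero index,
    mirroring gym's unflatten for Discrete."""
    return int(np.nonzero(arr)[0][0])

# Sampling first and then flattening always yields a valid one-hot vector,
# so the round trip recovers the original sample exactly.
n = 5
for x in range(n):
    assert unflatten_discrete(n, flatten_discrete(n, x)) == x
```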
It seems that it is not possible to unflatten a flattened sample. |
Yes, I can do this. I didn't want the extra nuance, but it's fine.
If I'm just sampling, then yes, I can do everything in the original space. However, if it's output from a NN or some other function and it's in the flattened state, then no, I will need to unflatten it directly. And in this case, I'll need to design the NN or function to enforce the structure.
Just for conciseness, here's an example script:

```python
from gym.spaces import Discrete
from gym.spaces.utils import flatten_space, unflatten


def to_string(array):
    return ''.join([str(i) for i in array])


discrete_space = Discrete(3)
discrete_sample_counter = {0: 0, 1: 0, 2: 0}
for _ in range(10000):
    sample = discrete_space.sample()
    discrete_sample_counter[sample] += 1
print(discrete_sample_counter)

flattened_space = flatten_space(discrete_space)
flattened_sample_counter = {
    '000': 0, '001': 0, '010': 0, '011': 0,
    '100': 0, '101': 0, '110': 0, '111': 0,
}
unflattened_sample_counter = {0: 0, 1: 0, 2: 0, 'error': 0}
for _ in range(10000):
    sample = flattened_space.sample()
    flattened_sample_counter[to_string(sample)] += 1
    try:
        unflattened_sample = unflatten(Discrete(3), sample)
        unflattened_sample_counter[unflattened_sample] += 1
    except IndexError:
        unflattened_sample_counter['error'] += 1
print(flattened_sample_counter)
print(unflattened_sample_counter)
```

Output:

```text
discrete_sample_counter    >> {0: 3462, 1: 3333, 2: 3205}
# ^ Roughly uniform: each of the three options is chosen about 1/3 of the time
flattened_sample_counter   >> {'000': 1241, '001': 1266, '010': 1211, '011': 1245, '100': 1251, '101': 1279, '110': 1284, '111': 1223}
# ^ Roughly uniform over the eight options: each is chosen about 1/8 of the time
unflattened_sample_counter >> {0: 5037, 1: 2456, 2: 1266, 'error': 1241}
# ^ Skewed: as in the table above, 1/2 of the samples map to 0, 1/4 to 1, 1/8 to 2, and the remaining 1/8 error out
```
@rusu24edward As we discussed above, it doesn't seem possible to accurately unflatten a flattened space's sample.
@pseudo-rnd-thoughts Yup, just wanted to provide an example script in case anyone else comes by this post. Sure, I can add that |
Question
Flattening a Discrete space to a Box space may be problematic. The flatten wrapper converts Discrete to Box as a one-hot encoding. Suppose the original space is Discrete(3); then the flattened space is Box(0, 1, (3,), int).

When we sample the action space for random actions, it samples the Box, which can produce any of the eight combinations of 0s and 1s in a three-element array, namely:

[0, 0, 0]
[0, 0, 1] *
[0, 1, 0] *
[0, 1, 1]
[1, 0, 0] *
[1, 0, 1]
[1, 1, 0]
[1, 1, 1]

Only the three of these eight that I've starred are usable in the strict sense of the mapping. The unflatten function for a Discrete space uses np.nonzero(x)[0][0], and here's a table of what the above arrays map to:

| Flattened sample | Unflattens to |
| --- | --- |
| [0, 0, 0] | error |
| [0, 0, 1] | 2 |
| [0, 1, 0] | 1 |
| [0, 1, 1] | 1 |
| [1, 0, 0] | 0 |
| [1, 0, 1] | 0 |
| [1, 1, 0] | 0 |
| [1, 1, 1] | 0 |

Implications
Obviously, [0, 0, 0] will fail because there is no nonzero.
Importantly, only one eighth of the random samples will map to 2; one fourth will map to 1, and one half will map to 0. This has some important implications for exploration, especially if action 2 is the "correct" action throughout much of the simulation. I'm very curious why I have not seen this come up before. This type of skewing in the random sampling can have major implications for the way the algorithm explores and learns, and the problem is exacerbated when n is large in Discrete(n). Am I missing something here?

Solution
This is unique to Discrete spaces. Instead of mapping to a one-hot encoding, we could just map to a Box with a single element and the appropriate range: Discrete(n) maps to Box(0, n-1, (1,), int) instead of Box(0, 1, (n,), int).
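A quick numpy sketch of the proposed mapping, with plain uniform integer draws standing in for Box(0, n-1, (1,), int).sample(); under this mapping the "unflattened" distribution stays uniform:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3

# Each sample of the proposed flattened space is a one-element array whose
# value is drawn uniformly from {0, ..., n - 1}; "unflattening" is just
# reading that element, so the Discrete(n) distribution is preserved.
samples = rng.integers(0, n, size=(10000, 1))
counts = np.bincount(samples[:, 0], minlength=n)

assert counts.sum() == 10000
assert all(abs(c / 10000 - 1 / n) < 0.05 for c in counts)  # roughly uniform
```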