Surprising behaviors in array initialization #17
Comments
I don't think the behavior with There is an issue with passing in your own arrays to |
maybe |
Interesting! This behavior is undocumented though. Here's what I found by looking at discussions in Julia issues:
So what about replacing |
I think the behavior with
I agree that would be safer. You could also consider that if the sentinel isn't passed explicitly, then the caller doesn't know it, so you should check whether it's in the data, and if so choose a different random sentinel. OTC when the sentinel is explicitly passed, then the caller knows that such values will be treated as missing if they appear in the data. Why is |
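The suggestion above can be sketched in plain Julia, without relying on anything from SentinelArrays. The helper name `choose_sentinel` and its keyword arguments are hypothetical, and the default of -1 is taken from the issue description. This is one possible reading of the proposal, not the package's implementation.

```julia
# Hypothetical helper, not part of SentinelArrays.
# If the caller passes `sentinel` explicitly, it is used as-is: the caller
# knows that matching values in `data` will be treated as missing.
# Otherwise start from `default` and re-draw at random until the candidate
# does not occur in `data`.
function choose_sentinel(data::AbstractVector{Int};
                         sentinel::Union{Int,Nothing}=nothing,
                         default::Int=-1)
    sentinel === nothing || return sentinel
    candidate = default
    while candidate in data
        candidate = rand(Int)
    end
    return candidate
end

choose_sentinel([1, 2, 3])                  # -1, the default is not in the data
choose_sentinel([-1, -1, -1])               # some Int other than -1
choose_sentinel([-1, -1, -1]; sentinel=-1)  # -1, the explicit choice is honored
```

The membership check costs one pass over the data per candidate, which is the price of not silently turning real values into missing ones.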
So we can initialize the SentinelArray faster via Yeah, I like the idea of doing it differently if the user is passing the sentinel or not. Let me look into what that change looks like. |
It's not terrible, but it seems wasteful to conflate the I also see little value in having two ways to do the same thing, one being the "official" way of the core language, and another way that encourages to use an anti-pattern: abusing the It seems clear that the current design is due to historical reasons (there was no |
I don't think that's the case at all. AFAICT the reason why The same applies to CategoricalArrays, where we also need to fill reference codes with a valid value (otherwise we would have to check that they are in the expected range all the time which would require two comparisons instead of one). |
I confirm that this is also my understanding.
I agree 100% (by "current design" I meant that of SentinelArrays, not Base). I'm not arguing against the behavior in Base. And I see now that the behavior of SentinelArrays for |
Your comments are valuable. We just try very hard to maintain consistency with Base where possible, so it is better to discuss things in detail.
The documentation says:
But the constructor uses a constant default value. For example, the default sentinel for `Int` is -1, so `SentinelArray([-1, -1, -1])` creates an array with three missing values. From the documentation I expected an array of three `-1`, and a random sentinel different from -1.

The behavior with `undef` is also surprising: the documentation refers to the "standard undef pattern". As I understand it, `undef` is meant to create uninitialized arrays, for cases where initialization is useless and undesired for performance reasons. But `SentinelVector{Int}(undef, 3)` actually initializes the array with 3 missing values. I would suggest removing the `undef` argument from the constructors (so a vector of 3 missing values can be created with `SentinelVector{Int}(3)`). A later version could re-introduce the undef pattern with the correct semantics...
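For comparison, here is how the two intents look with plain Base arrays. This is a minimal sketch that uses only Base constructors. It shows the distinction the issue draws between asking for uninitialized storage and asking for an all-missing vector. It makes no claim about what SentinelArrays does internally.

```julia
# Base: `undef` allocates storage without initializing it. For a bits type
# such as Int the contents are arbitrary and must be written before reading.
a = Vector{Int}(undef, 3)
fill!(a, 0)  # explicit initialization by the caller

# Base: an all-missing vector is requested by passing `missing` as the
# initializer instead of `undef`.
b = Vector{Union{Missing,Int}}(missing, 3)
all(ismissing, b)  # true
```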