-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Nullables as collections #16961
RFC: Nullables as collections #16961
Conversation
elseif !any(isnull, xs) | ||
Nullable(map(f, map(unsafe_getindex, xs)...)) | ||
else | ||
throw(DimensionMismatch("expected all null or all nonnull")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a standard definition of map on options? I expected any(isnull, xs) && return Nullable()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Among languages I know of (or could find on Google) with multi-argument map...
OCaml: option type does not support multi-argument map
(most) lisps: no real option type, but cons
used in its place often. behaves like current implementation
Python: no option type
Haskell: zipWith does not apply to options
The intersection of multi-argument map and option types is quite small. This makes some degree of sense, as a lot of languages with option types are either OO (where multi-argument map often doesn't make sense) or curried.
In absence of a real precedent, I think it's a better idea to copy the behaviour of actual collections, rather than introduce special cases:
julia> map(+, Int[], [1])
ERROR: DimensionMismatch("dimensions must match")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your argument about consistency convinced me, but I think the error message should also say something like "use broadcast instead". Basically, map
shouldn't be used for lifting Nullable
, it's mostly implemented for completeness (and maybe as a way to ensure fail-fast when you don't want a null to propagate).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would tend to think like @hayd on this one. In Haskell, zipWith
on lists has no problem with one list shorter than the other, it makes a lot of sense and I don't see why our map don't act similarly, both for arrays and nullables. Also, when lifting via map an operation on two or more nullables, I would naturally interpret getting a isnull back as an "error": the computation couldn't be done because of missing values. Having an error thrown when the inputs are not all isnull is redundant, as you have now to different ways to be notified of a failed computation, and you have to check both for an isnull result and for a possible thrown exception. Maybe
in Haskell can be used to encode an error at the type level, and lifting f
on Maybe
would be naturally implemented as
map2 f mx my = do
x <- mx
y <- my
return $ f x y
which returns Nothing
if one or more inputs are Nothing (that said, applying map2
on lists doesn't correspond to our multiple-list-arguments map).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have broadcast
for what you describe. I think there's been discussion somewhere about replacing map
with it. But let's not make this PR even more complex by opening this debate: the goal here is to be consistent with arrays, whether that behavior is good or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah indeed I hadn't notice the broadcast behavior. But still, I don't see why map
should be defined for Nullable
with the view that they are containers. For example NaN+1
doesn't throw an error. I'd then rather not define map
until real cases has shown what should be the semantics (or has it happened already?) Anyway, I'm not competent so won't argue more. I would just love one day some docs on map
vs broadcast
(the concept of broadcast is still relatively alien to me).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is really no harm in making mixed map an error for now. If needed, it can always be relaxed... though I don't like broadcasting in map.
Thanks. I agree it's OK to start with an unoptimized version, and add a fast path for safe I'll leave it to people more familiar with the |
|
||
s = 0 | ||
for x in Nullable{Int}() | ||
s += x |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this does not seem desirable to me at all
Why should nullables be iterable? This seems like going further down making scalars iterable, but with more conflating of nullability with container emptiness. I think it's clearer if the concepts are kept separate. |
The main feature is to give |
Should be able to opt in to the syntactic sugar without making them iterable. This would lead to far more methods which aren't expecting to deal with nullables trying to operate on them, which I imagine is the intent here but I see that going wrong often and would rather prefer being explicit. Iteration would suddenly discard the possibility of null values, or silently do nothing instead of visibly flagging that a null was present. Doing nothing on a null value isn't always appropriate, encoding it as part of the behavior strikes me as wrong. |
I have no strong opinion on implementing the more general interface, but I figure e.g. Scala and Rust had good reasons to do that. @johnmyleswhite? |
I intentionally left Now that that interpretation has been dispensed with, it could be worth reconsidering whether |
@tkelman The Julia documentation is pretty explicit that Nullable types are container-like:
This was why I filed #16889—being thought of as a container type is useless if they don't behave as containers. The container property is important, and is what (functionally) distinguishes |
There are effectively four interpretations of a
The current behaviour is interpretation 2. I would argue that interpretation 3 makes more sense for Julia. |
I'm not convinced. Why? We have plenty of "container" wrapper types that don't act like iterable collections. |
Nullable started as a very minimal type "with a very minimal interface with the hope that this will encourage you to resolve the uncertainty of whether a value is missing as soon as possible" to cite @johnmyleswhite . As some point it was even discussed that |
There are a few more interpretation of
At one point, Julia could begin to offer different types with different behaviour. There could be an Here, instead of adding more container-ness to An What hasn't happened so far is to make a strong case for these types, and to prototype them in a package. I'd expect some of these to become widely used, others to remain obscure. |
"Counter-productive" sounds quite strong. Do you have any evidence of a case where different interpretations really conflict?
Precisely, a lot of work has gone into NullableArrays, and we're seeing the current limitations of |
Thanks again for making this happen, @TotalVerb! |
We should have checked the appveyor (edit: travis too!) log more carefully. This somehow didn't cause the tests to register as failing, but something strange is up here:
|
Oh dear. Let me see if I can reproduce this locally. |
I get a failure that bisects to here both on Windows and Linux from just running |
The failure isn't in the travis logs for #19745, so it could be the known typo in this PR substituting |
That seems like a good explanation to me. There's a test system bug that it was able to continue and be marked as a success here then. Manifests only when tests are run in parallel. @yuyichao any idea? |
@johnmyleswhite I recall a discussion where we agreed that nullables are not containers. However, that aspect of this --- e.g. defining But I'm very much against the
So, at the very least I find the title of this PR misleading. I think it's unacceptable for |
If I understand the current thinking correctly, I believe treating |
|
||
_broadcast_type(f, T::Type, As...) = Base._return_type(f, typestuple(T, As...)) | ||
_broadcast_type(f, A, Bs...) = Base._default_eltype(Base.Generator{ziptype(A, Bs...), ftype(f, A, Bs...)}) | ||
# nullables need to be treated like scalars sometimes and like containers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The occurrence of "sometimes" here definitely raises a red flag.
I have a PR to address @JeffBezanson's concerns. I can submit it within the next hour. |
Thanks @JeffBezanson for the comments. Indeed your concerns were noted and discussed previously around #16961 (comment) and the (weak) consensus seemed to be that "nullables go in arrays, and not the other way around" as a justification for this special case, but I admit that it is not the prettiest solution. Let's see if @pabloferz has a better one. |
FWIW, I'm ok with either approach. This "taking apart" is the behavior that Erik Meijer and Eric Lippert were encouraging us to adopt, but that's because their vision of a dot operator for C# always involved flattening for all containers-of-containers and not just containers-of-nullables. For now I think we can remove this behavior since it's not very important for most use cases. |
We can get rid of this behavior for now, but that seems to imply we'll need to keep a special array type like |
@nalimilan I agree that we will need to keep Related to this point, @ScottPJones had the following comment to make;
I think future progress on this will be on the package side. |
First off, much thanks for your great work in / persistence with this pull request @TotalVerb! :)
In brief: For better extensibility and maintainability of Specifically, this pull request introduced additional type-specific functionality into Not certain whether I will have time to comment in detail prior to travel tomorrow morning. Most specific concerns I have are happily addressed by #19745 and #19787. Cross refs: Best! |
No, AFAIK @vtjnash's plan is to optimize |
This implements
map
,filter
, andbroadcast
forNullable
arguments.cc: @nalimilan @johnmyleswhite @hayd @vchuravy