Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The predicate problem #7

Closed
davidanthoff opened this issue May 14, 2017 · 8 comments
Closed

The predicate problem #7

davidanthoff opened this issue May 14, 2017 · 8 comments

Comments

@davidanthoff
Copy link

davidanthoff commented May 14, 2017

One of the general issues we discovered in discussing the whole lifting story over the last months was how to deal with predicates. I thing broadly speaking @nalimilan argued that lifted versions of predicates should return Nullable{Bool}, whereas I argued they should return Bool. I think at the end of the day I'm convinced that both cases have merit, and that it depends on the context of the predicate use which semantic is better.

I think this problem is pretty much solved with the approach I've taken in DataValues.jl: if one uses the call-site lifting operator . one always gets a return value that is wrapped in a DataValue, i.e. .== returns DataValue{Bool}. This essentially covers the use-case that @nalimilan had in mind. If there is a useful way to implement a predicate on DataValues that returns a pure Bool, that semantics can just be implemented as another method for the predicate. For example, I did that for all the comparison operators. So == will return a Bool. That covers the use-case that I had originally in mind.

Is there a plan to handle both of these use-cases with the Union{T,Null} proposal?

@nalimilan
Copy link
Member

We haven't really considered this yet. I think one solution would be to have query frameworks to automatically turn == into a custom operator returning null when one of the operands is null. AFAICT this is more or less equivalent to what Query.jl does.

The option of using .== for that has some appeal, but there's still the big downside that it doesn't work when applied to arrays.

@quinnj
Copy link
Member

quinnj commented May 15, 2017

I guess maybe I need to go back and re-read one of the discussions, but I can't really see where this is a practical issue. Right now, == returns a Bool, true for null == null and false for any other combination of value & null. What are the cases where you really want 3-valued boolean logic and it makes things simpler than just using == and isnull?

@davidanthoff
Copy link
Author

We haven't really considered this yet. I think one solution would be to have query frameworks to automatically turn == into a custom operator returning null when one of the operands is null. AFAICT this is more or less equivalent to what Query.jl does.

Query.jl right now has zero special casing for == (or any other function one might use in a @select or any other statement), it just relies entirely on my DataValue implementation, so there is no special lifting behavior inside a query that would differ in any way from operating on two scalar DataValues outside of a query. I feel that preserving that kind of behavior is really important for the overall consistency of Query.jl, so we would have to find a solution that doesn't require Query.jl to replace operators inside queries.

The option of using .== for that has some appeal, but there's still the big downside that it doesn't work when applied to arrays.

There is a proposal on the table that would solve this: JuliaLang/julia#20502.

Right now, == returns a Bool, true for null == null and false for any other combination of value & null.

If we had to pick one semantics, I would go with those, but I would prefer to expose both semantics to users. Note also that the current implementation of < etc. returns null (introduced here), and this discussion essentially applies to any predicate, i.e. including <. But that of course could also be change to return Bool.

Here is the example that shows why one might want to have both:

q = @from i in df begin
    @where i.a == 4
    @select i.a == 4
    @collect
end

Essentially we would like the comparison in the @where clause to return a Bool, but the comparison in the @select clause should return a DataValue{Bool}. I have a branch for Query.jl where that is possible using the DataValue type, the query would look like this:

q = @from i in df begin
    @where i.a == 4
    @select i.a .== 4
    @collect
end

This is overall based on some pretty simple principles: whenever you apply the call-site lifting operator ., you get null-promoting lifting, no exception. And then functions can add custom methods that work with missing values, if they can come up with a sensible semantic for that. For example, all the comparison operators have methods that work with DataValue arguments that return Bool.

@nalimilan
Copy link
Member

Query.jl right now has zero special casing for == (or any other function one might use in a @select or any other statement), it just relies entirely on my DataValue implementation, so there is no special lifting behavior inside a query that would differ in any way from operating on two scalar DataValues outside of a query. I feel that preserving that kind of behavior is really important for the overall consistency of Query.jl, so we would have to find a solution that doesn't require Query.jl to replace operators inside queries.

Yeah, I meant that replacing Union{T, Null} with DataValue{T} would be equivalent to replacing Nullable{T} with DataValue{T} as you do now. Not ideal, but it works and it is consistent.

There is a proposal on the table that would solve this: JuliaLang/julia#20502.

By using ..==, which isn't exactly the nicest operator and wasn't greeted with a lot of enthusiasm AFAICT.

I'm not strongly opposed to supporting .==, but I don't find it super logical: why would broadcast be related to returning Union{Null, Bool}, since missingness is no longer represented via a container? Also it's quite a subtle difference in the example you showed. Maybe we could use a macro, like @null(i.a == 4)? Or just a custom function. The most logical operator for this would be ?==, but that's probably not worth it.

Regarding < and >, I guess we could follow the behavior of NaN and always return false in the presence of a null. Maybe in practice that's what how we would expect these operators to behave.

@davidanthoff
Copy link
Author

Yeah, I meant that replacing Union{T, Null} with DataValue{T} would be equivalent to replacing Nullable{T} with DataValue{T} as you do now. Not ideal, but it works and it is consistent.

Ah, ok. It would be really nice if I could get rid of that conversion, though... I kind of feel we should shoot for a general way to represent missing values that works for both the array of missing values case and Query.jl (without my own custom type like DataValue).

By using ..==, which isn't exactly the nicest operator and wasn't greeted with a lot of enthusiasm AFAICT.

I'm not strongly opposed to supporting .==, but I don't find it super logical: why would broadcast be related to returning Union{Null, Bool}, since missingness is no longer represented via a container? Also it's quite a subtle difference in the example you showed. Maybe we could use a macro, like @null(i.a == 4)? Or just a custom function. The most logical operator for this would be ?==, but that's probably not worth it.

Yeah, I'm not sure this makes sense if we go with the Union approach. I guess I'm still not entirely convinced that the container approach is not easier overall and this was more in the spirit of a solution if we stick with the container approach.

If all the comparions operators returned Bool always, things would work for Query.jl. One could always resort to isnull(i.a) ? null : i.a==4 if one wants the null propagating behavior. My position on this is pretty simple: I really need the versions that return Bool, but I feel a lot less strong about versions that return Nullable{Bool}.

Regarding < and >, I guess we could follow the behavior of NaN and always return false in the presence of a null. Maybe in practice that's what how we would expect these operators to behave.

Yes, that is the C# behavior. I don't have that in DataValues.jl right now, but maybe I should...

@ararslan
Copy link
Member

ararslan commented May 15, 2017

I'm pretty strongly opposed to continuing to use map and broadcast to perform any kind of special operations on null values. They should apply functions to arrays elementwise, that's it.

@nalimilan
Copy link
Member

If all the comparions operators returned Bool always, things would work for Query.jl. One could always resort to isnull(i.a) ? null : i.a==4 if one wants the null propagating behavior. My position on this is pretty simple: I really need the versions that return Bool, but I feel a lot less strong about versions that return Nullable{Bool}.

Regarding < and >, I guess we could follow the behavior of NaN and always return false in the presence of a null. Maybe in practice that's what how we would expect these operators to behave.

Yes, that is the C# behavior. I don't have that in DataValues.jl right now, but maybe I should...

Let's change < and > to return false in the presence of nulls then.

@nalimilan
Copy link
Member

Closing since the definitions have been changed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants