Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementing order functions where missings is the smallest value #144

Merged
merged 16 commits into from
Apr 6, 2024

Conversation

alonsoC1s
Copy link
Contributor

As discussed in issue #142, this PR adds a variant of the standard isless function where the smallest possible value is missing. Furthermore, this PR adds a partial order function missingsmallest that takes a partial order function f and modifies the order such that missing is always less than the other argument.

This PR aims to provide support to resolve issue JuliaData/DataFrames.jl#2267 at DataFrames.jl

src/Missings.jl Outdated Show resolved Hide resolved
src/Missings.jl Outdated Show resolved Hide resolved
src/Missings.jl Outdated Show resolved Hide resolved
src/Missings.jl Outdated Show resolved Hide resolved
src/Missings.jl Outdated Show resolved Hide resolved
@bkamins
Copy link
Member

bkamins commented Jul 19, 2023

I will ask users on slack for opinion about naming.

@jariji
Copy link

jariji commented Jul 19, 2023

missingsleast would be more consistent with the existing "less" convention. Perhaps that usage ("missing is the least of all the values") is uncommon/unfamiliar, however.

@bkamins
Copy link
Member

bkamins commented Jul 20, 2023

A good comment by Alexander Plavin on Slack:

julia> sort([1, missing, 3, 2], by=x->(!ismissing(x), x))
4-element Vector{Union{Missing, Int64}}:
  missing
 1
 2
 3

Maybe it is easy enough (and just should be documented)

The only downside is that in interactive use it triggers recompilation each time.

@bkamins bkamins requested a review from nalimilan July 22, 2023 21:36
@nalimilan
Copy link
Member

Maybe it is easy enough (and just should be documented)

IMO it's too hard to discover and remember. At least missingsless seems useful to simplify the syntax.

Let's continue the discussion regarding names at JuliaData/DataFrames.jl#2267 to avoid spreading it over several issues, and discuss the implementation here?

README.md Outdated Show resolved Hide resolved
src/Missings.jl Outdated Show resolved Hide resolved
src/Missings.jl Outdated Show resolved Hide resolved
src/Missings.jl Outdated Show resolved Hide resolved
src/Missings.jl Outdated Show resolved Hide resolved
test/runtests.jl Show resolved Hide resolved
test/runtests.jl Outdated Show resolved Hide resolved
src/Missings.jl Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
src/Missings.jl Outdated Show resolved Hide resolved
@alonsoC1s
Copy link
Contributor Author

While thinking of new test sets I came across something interesting. The current implementation only works for comparisons of the same "semantics" as isless. i.e comparisons where the first argument is tested as smallest, which is not the case for other comparison functions. For instance, Base.isgreater. Currently, we would expect Inf > missing to be true always. But since we rely on the order of the arguments, and missing is the second argument, this returns false by default.

I think this is either okay if we only expect users to compare using comparison functions that behave like isless (for instance, the "is much smaller" function) or not okay if we want this to work for any arbitrary function of two arguments that returns bool.

@nalimilan
Copy link
Member

While thinking of new test sets I came across something interesting. The current implementation only works for comparisons of the same "semantics" as isless. i.e comparisons where the first argument is tested as smallest, which is not the case for other comparison functions. For instance, Base.isgreater. Currently, we would expect Inf > missing to be true always. But since we rely on the order of the arguments, and missing is the second argument, this returns false by default.

Could you show examples? I'm a bit confused about what you mean.

@alonsoC1s
Copy link
Contributor Author

Sorry, I think I could have explained it a bit better. What I mean is, the fact that we rely on the order of the arguments to place missing as the smallest value limits the usage of missingsmallest to ordering functions that are equivalent to < (or isless) and excludes functions equivalent to > (or Base.isgreater)

More concretely.

  • missing < x should be true for any x.
  • By the same argument missing > x should be false for any x

But since the current implementation always returns true if the first argument is missing, missing > x would return true; there is no way to modify the order > such that missing is the smallest possible value.

As I said I am not sure if this could be a problem. In the concrete example of Base.isgreater, missing is already the smallest possible value as mentioned in their docstring. So this might not require a fix

@bkamins
Copy link
Member

bkamins commented Aug 1, 2023

Your comment is good. Maybe we do not need to add anything but instead recommend using the following:

julia> sort([1, missing, 3, 2])
4-element Vector{Union{Missing, Int64}}:
 1
 2
 3
  missing

julia> sort([1, missing, 3, 2], rev=true)
4-element Vector{Union{Missing, Int64}}:
  missing
 3
 2
 1

julia> sort([1, missing, 3, 2], lt=Base.isgreater)
4-element Vector{Union{Missing, Int64}}:
 3
 2
 1
  missing

julia> sort([1, missing, 3, 2], lt=Base.isgreater, rev=true)
4-element Vector{Union{Missing, Int64}}:
  missing
 1
 2
 3

and this is enough. Then the only question is why Base.isgreater is not exported by Base. Maybe we could export it in Missings.jl with the guarantee that if Base.isgreater is removed from Base we would provide an implementation.

Base.isgreater was added in JuliaLang/julia#35316. @StefanKarpinski - you merged this change. So maybe you can comment on why Base.isgreater is not exported? Thank you!

@alonsoC1s
Copy link
Contributor Author

I think that would work. My only concern would be, what if the user wants to use something that behaves like isgreater(x, y) but not exactly that. For instance, the ismuchgreater function that returns true if x > 100 * y. That case can't be solved by using missingsmallest, nor Base.isgreater.

@bkamins
Copy link
Member

bkamins commented Sep 4, 2023

@alonsoC1s - so we have a conclusion regarding the API in JuliaData/DataFrames.jl#2267.
Do you have time to finalize this PR?

@alonsoC1s
Copy link
Contributor Author

@bkamins sure, I'll work on this on the weekend so we can hopefully have it resolved by next week

@alonsoC1s
Copy link
Contributor Author

@bkamins I implemented the version suggested by @LilithHafner without the the changes that would need to be upstreamed to Base

src/Missings.jl Outdated Show resolved Hide resolved
src/Missings.jl Outdated Show resolved Hide resolved
src/Missings.jl Outdated Show resolved Hide resolved
@bkamins
Copy link
Member

bkamins commented Oct 9, 2023

@alonsoC1s - is it clear what should be corrected?

@alonsoC1s
Copy link
Contributor Author

I believe so. I'll get to work on the latests comments :)

@LilithHafner
Copy link

Thanks!

@LilithHafner
Copy link

Nightly failures are unrelated.

@pdeffebach
Copy link
Contributor

Should we also add missingslargest ? I know thats the default behavior but it's nice to have obvious syntax to describe what's going on.

@LilithHafner
Copy link

My inclination is to make the documentation of missingsmallest explicit about isless serving that role rather than adding missinglargest because I'm worried that the existence of missinglargest could be confusing, but I could see it going either way.

@LilithHafner
Copy link

@bkamins, should this be merged?

@bkamins
Copy link
Member

bkamins commented Nov 8, 2023

I approved it already some time ago. We are waiting for @nalimilan before merging.

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry. I've found a pair of remaining issues in doctests (as unfortunately we don't test them since we don't build a manual).

src/Missings.jl Outdated Show resolved Hide resolved
src/Missings.jl Show resolved Hide resolved
src/Missings.jl Outdated Show resolved Hide resolved
@pdeffebach
Copy link
Contributor

@LilithHafner Bumping this. The final changes seem pretty small before merging.

@LilithHafner
Copy link

Sorry about the delay. I recommend accepting @nalimilan's suggestions and then then merging into main if CI still passes.

I don't have commit bit here. cc @bkamins, @nalimilan.

@pdeffebach
Copy link
Contributor

Ah sorry! I thought you were the one making the changes! @alonsoC1s can you make the final changes?

@bkamins
Copy link
Member

bkamins commented Mar 3, 2024

@alonsoC1s - when you finalize the PR can you also update the tests so that they pass both on the current release and on nightly? (there is an issue I think that depending on Julia version a different exception type is thrown in some cases). Thank you!

Co-authored-by: Milan Bouchet-Valat <[email protected]>
@bkamins
Copy link
Member

bkamins commented Mar 4, 2024

I have commited the suggestions.

@alonsoC1s
Copy link
Contributor Author

@bkamins sure, I'll be happy to. This week is a bit busy, but I will wrap everything up as soon as I can get some free time

@alonsoC1s
Copy link
Contributor Author

@bkamins I'm taking a look at the reasons for failure on different Julia versions and I think you are right, different versions throw different Errors. What is the best practice for dealing with this version-specific behaviour?

@bkamins
Copy link
Member

bkamins commented Mar 27, 2024

I typically test for an abstract error that is a supertype of the errors thrown.
Alternative is to write version-specific tests, but this is more cumbersome.

@alonsoC1s
Copy link
Contributor Author

Agreed, I'll test for more general Exceptions

@bkamins
Copy link
Member

bkamins commented Apr 4, 2024

@nalimilan - OK to merge?

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

src/Missings.jl Outdated Show resolved Hide resolved
@bkamins bkamins merged commit cdeb5a7 into JuliaData:main Apr 6, 2024
6 checks passed
@bkamins
Copy link
Member

bkamins commented Apr 6, 2024

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add isless variant where missing is smallest
6 participants