Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add indmin/indmax #41339

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

Conversation

Seelengrab
Copy link
Contributor

Adds indmin and indmax methods which return the index into a collection, minimizing/maximizing a given function (or identity if none is given).

This came out of the discussion surrounding #39203.

@Seelengrab
Copy link
Contributor Author

Seelengrab commented Jun 23, 2021

Why git doesn't strip trailing whitespace by default escapes me. Should be final now.

@Seelengrab
Copy link
Contributor Author

Uh.. CI failures are not originating in the code of this PR..? 😬

@simeonschaub
Copy link
Member

It is related:

LoadError("sysimg.jl", 19, LoadError("/usr/home/julia/worker/package_freebsd64/build/usr/share/julia/stdlib/v1.8/Dates/src/Dates.jl", 3, LoadError("/usr/home/julia/worker/package_freebsd64/build/usr/share/julia/stdlib/v1.8/Dates/src/io.jl", 448, MethodError(argmin, ((1,),), 0x0000000000005bc9))))

Likely because the Dates stdlib uses argmin.

@Seelengrab
Copy link
Contributor Author

Uh - I didn't touch argmin though 🤔 If Dates uses argmin, that should be failing before this PR as well.

@Seelengrab
Copy link
Contributor Author

Oooh, I see what's happening!

@Seelengrab Seelengrab force-pushed the add_indmin_indmax branch 3 times, most recently from 6b4ab5b to 331c612 Compare June 23, 2021 20:09
@Seelengrab
Copy link
Contributor Author

Ok, there we go - I really shouldn't write PRs without eating dinner..

@JeffBezanson
Copy link
Member

For reference: #25654
There may be no hope of consensus here...

@Seelengrab
Copy link
Contributor Author

The funny thing is (as far as I understand), argmin/max(A) == indmin/max(A) and since argmin(f, A) is much rarer to encounter in other languages (at least I never heard of other languages having that), generalising indmax only makes sense if you have the two arg version as a different thing. After thinking everything through, I prefer having that "duplicated" functionality since they're conceptually different and having a different function name serves as some kind of inline documentation about what was intended.

So I've arrived at "let's keep both", which I haven't seen written out from other people yet 🤷‍♂️

@Seelengrab
Copy link
Contributor Author

Though it's probably best in that case if we make the distinction between argmin and indmin very clear in the single argument case of argmin/max.

@goretkin
Copy link
Contributor

(I am aware that there's been a lot of noise around this. Sorry...)

"key" and "index" are used a bit interchangeably. a[i] lowers to getindex, but keys works on arrays and dicts. I am tempted to suggest keymin/keymax. Furthermore, I wonder about the choice of defining it as

keymax(itr) = argmax(k -> a[k], keys(itr))
keymax(f, itr) = argmax(k -> f(a[k]), keys(itr))

instead of

keymax(itr) = findmax(itr)[2]
keymax(f, itr) = findmax(f, itr)[2]

I do think when all you want is the key/index, that findmax(itr)[2] is an underwhelming spelling, because it requires remembering the position of the value/key. But I wonder about the benefit of introducing new method name for what can already be written using findmax/ findmin.

Comparison to wrong value, misplaced findmin instead of findmax
@Seelengrab
Copy link
Contributor Author

Seelengrab commented Jun 23, 2021

There slipped another one through - next failure will be fixed tomorrow 😭

findmin and findmax already call pairs and handle all that with their mapfoldl, as far as I understand - to me, giving those to argmax and specifying the anonymous function manually only makes the definition of indmax more complicated.

I personally prefer indmax since it's getindex, not getkey. That's it for me as far as bikeshedding goes though, it's mostly more convenient for me because I tend to use arrays more often than other indexables.

@JeffBezanson
Copy link
Member

JeffBezanson commented Jun 23, 2021

This might have been proposed before but maybe findmax should return a named tuple? Then you could do findmax(a).index (or key)

@Seelengrab
Copy link
Contributor Author

That makes pipeability harder though and broadcasting nigh impossible.

@Seelengrab
Copy link
Contributor Author

Error in testset LibSSH2_jll:

I think it's safe to say that the failure truly is unrelated now.

@goretkin
Copy link
Contributor

That makes pipeability harder though and broadcasting nigh impossible.

You mean like getfield.(findmax.(itr_of_itr), :index) ? Something like #24990 would improve the situation.

@Seelengrab
Copy link
Contributor Author

Seelengrab commented Jun 24, 2021

Right - or just indmax.(itr_of_itr) :)

As for piping:

itr_of_itr .|> indmax

vs.

itr_of_itr .|> x -> findmax(x).index

and

itr |> indmax

vs.

itr |> x -> findmax(x).index

or

itr |> findmax |> x -> getindex(x, :index)

In either case, I prefer the short version. Though the _ anonymous function PR would help with indmax(non_identity, _) as well.

@nalimilan
Copy link
Member

For reference: #25654
There may be no hope of consensus here...

To develop a bit, the root of the issue IMO is that findmin and findmax would be more appropriate names for these operations (consistent with findfirst/findlast/findnext/findprev/etc.), but these names are already used for something slightly different. So when trying to make search functions consistent at #24865 we opted for the simpler solution of calling this argmin and argmax. And now we're trapped in this situation...

@nalimilan nalimilan added the search & find The find* family of functions label Jun 24, 2021
@Seelengrab
Copy link
Contributor Author

I don't think we're necessarily trapped - I think just adding indmin and indmax as aliases for what people could already do would help alleviate the confusion about which to use when looking for indices of things.

@jariji
Copy link
Contributor

jariji commented May 30, 2023

I don't know if Julia has a canonical abbreviation of "index" but in Pandas the name is idxmax rather than indmax.

@Seelengrab
Copy link
Contributor Author

So when trying to make search functions consistent at #24865 we opted for the simpler solution of calling this argmin and argmax. And now we're trapped in this situation...

I really don't think we're trapped. Merging this and deprecating single-arg argmin/argmax in favor of idxmax/argmax ought to be perfectly viable.

I don't know if Julia has a canonical abbreviation of "index" but in Pandas the name is idxmax rather than indmax.

I don't mind changing this to idxmax/idxmin 🤷

@jariji
Copy link
Contributor

jariji commented May 31, 2023

I normally don't like abbreviations in APIs at all since then users have to remember which abbreviation it is. In that case indexmax or keymax would be preferred.

@nalimilan
Copy link
Member

I really don't think we're trapped. Merging this and deprecating single-arg argmin/argmax in favor of idxmax/argmax ought to be perfectly viable.

Sure. don't really have an opinion about that. The trap I referred to is that findmin and findmax should return indices if we want to be consistent with other find* functions -- so that these would be the same as the proposed indmin/indmax. But even if we deprecate findmin and findmax now, we can't change their behavior before 2.0, and even then it would be potentially confusing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
search & find The find* family of functions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants