Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename findmin and findmax? #24865

Closed
nalimilan opened this issue Nov 30, 2017 · 19 comments
Closed

Rename findmin and findmax? #24865

nalimilan opened this issue Nov 30, 2017 · 19 comments
Labels
search & find The find* family of functions

Comments

@nalimilan
Copy link
Member

nalimilan commented Nov 30, 2017

findmin and findmax return a tuple with the index of the maximum/minimum, and the corresponding value. This behavior is kind of inconsistent with find, findfirst, findlast, findnext and findprev, which return indices without values. On the other hand, indmin and indmax return only the indices.

It would be more consistent to rename indmin/indmax to findmin/findmax. Of course this is not possible in a single deprecation cycle. I can see two solutions:

  • Deprecate findmin and findmax in 0.7, and rename indmin and indmax to findmin and findmax in 1.0. This has the drawback of breaking things between 0.7 and 1.0.
  • Rename indmin and indmax to findmin and findmax in 0.7. This introduces breaking changes in 0.7, but at least there wouldn't be any changes in 1.0.

Finally, it's not clear to me whether we really need the current behavior of findmin/findmax, compared with getting the index and then extracting the corresponding element manually. The performance gain should only be significant for very small collections AFAICT.

@nalimilan nalimilan added search & find The find* family of functions triage This should be discussed on a triage call labels Nov 30, 2017
@dpsanders
Copy link
Contributor

Isn't this basically what is usually called argmin and argmax (although those are not great names)?

@nalimilan
Copy link
Member Author

Isn't this basically what is usually called argmin and argmax (although those are not great names)?

And indeed: #1655.

@nalimilan
Copy link
Member Author

Isn't this basically what is usually called argmin and argmax (although those are not great names)?

And indeed these were the first proposed names for these functions: #1655.

EDIT: see also discussion at #7327.

@StefanKarpinski
Copy link
Sponsor Member

I still like argmin and argmax personally.

@nalimilan
Copy link
Member Author

I still like argmin and argmax personally.

For the method returning just the index (current indmin and indmax), or for that returning both the index and the value (current findmin and findmax)? What name do you suggest for the other pair of methods?

@dpsanders
Copy link
Contributor

Do those need to be separate methods? Is there a cost to always just returning both the index and the value?

@nalimilan
Copy link
Member Author

As I noted in the OP, there's a "consistency cost" (if that's a thing) since find, findfirst, findnext, findlast and findprev all return a single index. That's not the end of the world, but IMHO it's better to use similar names for functions behaving similarly to make the language more predictable.

@StefanKarpinski
Copy link
Sponsor Member

I think we may have to hammer this one out with @JeffBezanson when he gets back as part of #10593. I hate to break the deadline, but I think this needs his input. It will be easier to work though after we've gotten all the other stuff out of the way.

@StefanKarpinski StefanKarpinski removed the triage This should be discussed on a triage call label Dec 14, 2017
@StefanKarpinski StefanKarpinski added this to the 1.0 milestone Dec 14, 2017
@JeffBezanson
Copy link
Sponsor Member

I'm ok with renaming indmin to findmin, but I would really like to keep the functionality of returning both a minimum and its index. There are plenty of structures where an extra lookup could be expensive (e.g. a database). We also have findmin! and N-d versions, so the findmin functions actually include quite a bit of functionality we don't want to drop.

Maybe we could call it minandind? That's a bit hard to read but I don't have a better idea yet.

I also think this is low-urgency and doesn't need to be done for 1.0.

@JeffBezanson JeffBezanson added the triage This should be discussed on a triage call label Dec 31, 2017
@nalimilan
Copy link
Member Author

FWIW, we have a series of functions with "and" in StatsBase: mean_and_std, mean_and_var and mean_and_cov (not sure about the underscores). Though they are a bit different since they compute two quantities rather than returning the index.

We could also find a different term to indicate the action of returning both the value and the index: fetch, pick, take, select?

@vtjnash
Copy link
Sponsor Member

vtjnash commented Dec 31, 2017

Or minpair, to borrow terminology from the usage of Pairs elsewhere.

@andreasnoack
Copy link
Member

but I would really like to keep the functionality of returning both a minimum and its index.

I'd prefer that we general move in that direction and return the value in the find family of functions. Returning the value together with the index should be really cheap in general.

@nalimilan
Copy link
Member Author

That's a possibility, but that's going to be much more breaking that what I already have as part of #10593. That would also introduce an inconsistency with find (for which returning an array of values would be quite costly), and an inconvenience in the relatively frequent case where you don't need the value (in particular when using the equalto predicate, for which it's kind of ridiculous to return the value).

I think it would be simpler to have the find* return only indices as they do now (except for findmin/findmax), and introduce later new functions which would return both the index and value. We could then reorganize the internal implementation as appropriate if needed.

@JeffBezanson JeffBezanson removed the triage This should be discussed on a triage call label Jan 4, 2018
@JeffBezanson JeffBezanson removed this from the 1.0 milestone Jan 4, 2018
@JeffBezanson
Copy link
Sponsor Member

From triage: does not need to be in 1.0.

@nalimilan
Copy link
Member Author

nalimilan commented Jan 20, 2018

#25654 renames indmin and indmax to argmin and argmax.

@SimonEnsemble
Copy link

Julia user suggestion: please save others time by mentioning in docs for argmin/argmax the ind2sub function.

A = [1 2; 3 5]
indmin(A) # 1
# but I want the rows and columns!
ind2sub(A, indmin(A)) # (1, 1) what I wanted

Something like "For multi-dimensional arrays, the ind2sub() function will convert the index to a tuple."

@JeffBezanson
Copy link
Sponsor Member

argmin now returns a CartesianIndex in this case, so that behavior has basically become the default.

@StefanKarpinski
Copy link
Sponsor Member

Note that ind2sub(dims, ind) has been deprecated to (CartesianIndices(dims))[ind] (which has the advantages that CartesianIndices(dims) is an immutable object which can be created once instead of for each scalar value of ind and that it can be used in indexing expressions).

@vtjnash
Copy link
Sponsor Member

vtjnash commented Jun 26, 2019

Seems like we settled on something, given that 1.0 is not accepting deprecations anymore. If someone disagrees, post a comment on the further change needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
search & find The find* family of functions
Projects
None yet
Development

No branches or pull requests

7 participants