Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changing argmax and argmin behavior to match the mathematical definition #27639

Closed
Nosferican opened this issue Jun 18, 2018 · 25 comments
Closed

Comments

@Nosferican
Copy link
Contributor

The definition of argmax from Wikipedia,
Given a collection of elements (domain), which map to an ordered set (range), what is the subset containing those elements in the domain whose image is the frontier.

Currently, argmax is actually indmax which is defined for indexed collections and returns the index of the first encountered element whose image is equal to the frontier. The generalization of keymax (discussed, but not implemented) extends the supported collections from indexed to pairs-implemented collections. This is the result of renaming indmax to argmax. A current proposal is for argmax to be defined for Generator (#27613),

argmax(f(x) for x ∈ domain)

with that change, we could fix the implementation with something like

import Base: Generator, argmax
function argmax(f::Function, domain::Generator)
    T = eltype(domain)
    tested = Set{T}()
    output = Set{T}()
    frontier = nothing
    for elem ∈ domain
        if !(elem ∈ tested)
            push!(tested, elem)
            value = f(elem)
            if (frontier === nothing) || (value > frontier)
                frontier = value
                output = Set(elem)
            elseif value == frontier
                push!(output, elem)
            end
        end
    end
    return output
end

and call it with the following syntax

argmax(abs2, elem for elem ∈ -3:3) == Set([-3, 3])
@JeffBezanson
Copy link
Sponsor Member

That definition and implementation don't depend on Generator; any iterable would work.

I agree this is useful but I don't think argmax should return a Set by default; it's quite inefficient in the common case where the answer is e.g. a single integer. This could be a new function like argmaxall or argmaxes.

@Nosferican
Copy link
Contributor Author

The answer would have form of some collection of eltype(domain). Maybe if the length of the container is one, it could return the value. I used Set in the implementation to guarantee uniqueness (no need to compute the value for an element which we already considered, but overkill since I am checking for it in the loop). Other ways could call unique on the domain as first step (but why do an extra pass?). There shouldn't be any efficiency loses from returning the full set rather than the first since the implementation only passes through each once (no ability to break early since it doesn't do two passes). output could be set to Vector{eltype(domain)} if that is more convenient (bit extra information of order which shouldn't matter, but why not). Since output is guarantee to be non-empty, people could call first on it if it is a <:AbstractVector. Any iterable should work.

@Nosferican
Copy link
Contributor Author

We should probably resurrect indmax and implement

argmax(f::Function, itr::Any, unique::Bool=false)

separately.

@nalimilan
Copy link
Member

See also #25654 and #24865.

@StefanKarpinski
Copy link
Sponsor Member

What is the unique flag meant to do?

@Nosferican
Copy link
Contributor Author

Nosferican commented Jun 22, 2018

The mathematical definition defines the solution as a set, but for computational reasons it might be best to provide a Vector{eltype(domain)}. In certain applications, the solution is guaranteed to be a single element which would return a Vector{eltype(domain)} of length one. For convenience we could offer the unique flag to perform a first(output) to return a eltype(domain). I am not married to it, as users could just call first on it directly (also an advantage of using <:AbstractVector over <:Dict).

@nalimilan, thanks for linking those issues (I had hunting for those in my task list for today). I would like to know your thoughts on the issue. I concur indmax was not the best name and changing it would be beneficial, but I believe argmax was not the right name as it represents a different concept/operation. My ideal solution would be to offer the current behavior of argmax renamed with something better and use argmax for the mathematical concept described here (that would also be a great to expedite and simplify the work on a PR to implement it).

@nalimilan
Copy link
Member

Well, if we wanted to be fully consistent argmax would have to be renamed to findfirstmax/findmaxfirst, and then the function returning the indices of all values giving the maximum would be findallmax/findmaxall. In practice these names aren't too appealing and argmax is a well-known concept, so I'd rather keep the current behavior of argmax, but (as @JeffBezanson said) provide another function or an argument to get a vector of indices rather than just the first index.

Something I would really have liked to change is renaming findmax to something else for consistency with other find* functions (#24865).

@Nosferican
Copy link
Contributor Author

Nosferican commented Jun 22, 2018

On the decision of returning the indices or subset of domain,
R uses which.min and which.max with a keyword for first or every index.
Python uses values.index(max(values)) similar to findall(isequal(maximum(obj)), obj)
Matlab uses max with the 2-nargout option (similar to last(findmax(obj))

The only implementation of this functionality using argmax as the name is numpy.

My number one concern is to avoid having a misnamed concept similarly to the numpy case. I believe it would be nice to have an efficient and convenient function for the argmax concept and one for the indices (especially one which doesn't require materializing the whole image), but not to have the one returning the indices named argmax.

@StefanKarpinski
Copy link
Sponsor Member

StefanKarpinski commented Jun 22, 2018

The mathematical meaning of argmax(f, itr) is clear: find an element of itr which maximizes f. Having argmax(v) return indices seems wrong if you think of f = identity and v = itr. However, it makes perfect sense if you think of f as v—or more precisely f = i -> v[i] and itr = keys(v). As long as we explain that, it does not seem like there's anything wrong with using the name argmax for that quite useful operation. Moreover, the alternative possible meaning of argmax(v) as argmax(identity, v) is silly because maximum(v) already does that.

The only question in my mind is whether it would be better to leave argmax(v) undefined and have a separate indmax(f, v) function which returns an index in v at which the function f is maximized and then have indmax(v) = indmax(identity, v) which is what our argmax currently does (not coincidentally, it was previously called indmax). But I can't see much need for indmax(f, v) since we can just write argmax(i->f(v[i]), keys(v)) or if some version of #24990 gets merged, argmax(f∘v[_], keys(v)) which doesn't seem so bad in cases that you want indices instead of values.

Bottom line: if you're interested in indices, you can always use the indices as your collection and make indexing part of the function you're maximizing. You can kind of go in the other direction since you can always use an index to extract an element back out of an indexed collection, but if we're choosing between indmax and argmax, argmax wins since it makes sense on any iterable, not just indexable collections. So having argmax without indmax is fine but having indmax without argmax is not—you still have no way of doing what argmax(f, itr) does in general. Of course, we don't currently have that method, but we should.

@Nosferican
Copy link
Contributor Author

Good points. The edge cases seem to suggest the best design would be a version of
argmax(f::Function, itr::Any; first::Bool = false) as it seems the most versatile and useful.

@StefanKarpinski
Copy link
Sponsor Member

Is that first keyword intended to control whether to get only the first maximizing value or all of the? If so, I'd make a few comments: I do not think that returning all maximizing elements is more standard or more useful so it should not be the default. Sure, the formal definition of argmax is that it returns a set but if you look at typical usages in math writing this is usually glossed over and it is assumed that argmax gives you a single maximizing value.

@Nosferican
Copy link
Contributor Author

I am perfectly happy with having that option default to a single value (i.e., first::Bool = true), especially if argmax will be used for querying indices/keys. Another option would be to default to return a Vector{eltype(domain)} and then people can call first(argmax(f, domain)) as it will always be a non-empty <:AbstractVector. Either solution seems good.

@StefanKarpinski
Copy link
Sponsor Member

Either solution seems good.

Doing first(argmax(f, domain)) is wildly inefficient.

@Nosferican
Copy link
Contributor Author

Nosferican commented Jun 22, 2018

Using the implementation at the start of this of this post it wouldn't be costly at all. However, I now think having the keyword argument would be best since the implementation could be adapted to only update upon finding a new frontier as opposed to expanding the solution set when new images are at the frontier (i.e., the efficiency gains would come from knowing the desired output at the start). With a bit of guidance I could prep a PR for these changes after deciding on the specifics.

@StefanKarpinski
Copy link
Sponsor Member

StefanKarpinski commented Jun 22, 2018

Using the implementation at the start of this of this post it wouldn't be costly at all.

I think we have different notions of what "costly" means. Your implementation allocates two Set objects (why two? only one seems necessary). The scalar version of argmax(f, itr) shouldn't allocate anything at all.

@StefanKarpinski
Copy link
Sponsor Member

StefanKarpinski commented Jun 22, 2018

Here's a sketch implementation of the two-arg argmax(f, itr):

function argmax(f, itr)
    r = iterate(itr)
    r === nothing && error("empty collection")
    m, state = r
    f_m = f(m)
    while true
        r = iterate(itr, state)
        r === nothing && break
        x, state = r
        f_x = f(x)
        isless(f_m, f_x) || continue
        m, f_m = x, f_x
    end
    return m
end

The iterable itr is iterated only once and f is applied to the yielded values exactly once per item. It allocates nothing unless f or the iteration of itr does.

@Nosferican
Copy link
Contributor Author

The reason for the two collections are different: (1) keeps track of whether the image is known (it skips the computation of f on elements it has tested which is best suited for costly mapping functions or a high number of duplicated elements in the domain) and (2) it would only allocate it when doing the full subset, for the unique solution one it wouldn't allocate a collection and just keep track of the frontier and best guess.
I originally used Set, but since I keep track of duplicated it can just be a Vector.

@StefanKarpinski
Copy link
Sponsor Member

Set versus Vector doesn't much matter, they're both big, expensive data structures.

@Nosferican
Copy link
Contributor Author

Nosferican commented Jun 22, 2018

We could just ask the users to pass argmax(f, unique(domain)), if it is a costly mapping or many duplicates, no?

@StefanKarpinski
Copy link
Sponsor Member

StefanKarpinski commented Jun 22, 2018

unique is an expensive allocating operation itself, so that doesn't help. Perhaps you're missing something about what I'm saying: I'm not worried about the number of elements in in the collection that's returned, my problem is with returning a collection at all. Especially given that the scalar version of this that people generally actually want can be implemented (as above), in a completely efficient generic way that does no allocation whatsoever. For example, if I have a definition like this:

g(a, b) = argmax(x -> -abs(x-3), min(a,b):max(a,b))

then the native code for a call like g(-10, 10) is really efficient:

	cmpq	%rdi, %rsi
	movq	%rdi, %r8
	cmovleq	%rsi, %r8
	cmovlq	%rdi, %rsi
	cmpq	%rsi, %r8
	jne	L23
	movq	%r8, %rax
	retq
L23:
	leaq	-3(%r8), %rax
	movl	$3, %ecx
	subq	%r8, %rcx
	testq	%rax, %rax
	cmovnsq	%rax, %rcx
	negq	%rcx
	movq	%r8, %rax
L48:
	movl	$2, %edi
	subq	%r8, %rdi
	leaq	-2(%r8), %r9
	leaq	1(%r8), %rdx
	testq	%r9, %r9
	cmovnsq	%r9, %rdi
	negq	%rdi
	cmpq	%rdi, %rcx
	cmovlq	%rdx, %rax
	cmovlq	%rdi, %rcx
	movq	%rdx, %r8
	cmpq	%rsi, %rdx
	jne	L48
	retq

It's all just simple integer instructions, comparisons and jumps. If there was any allocation whatsoever in the definition of argmax then this would be totally different and not so incredibly simple and fast.

@StefanKarpinski
Copy link
Sponsor Member

I think we can close this after discussion at #27612 (comment) which leaves us in the situation that the only actionable change is adding the two-argument argmax method, for which there's already a specific issue open: #27613.

@nalimilan
Copy link
Member

The mathematical meaning of argmax(f, itr) is clear: find an element of itr which maximizes f. Having argmax(v) return indices seems wrong if you think of f = identity and v = itr. However, it makes perfect sense if you think of f as v—or more precisely f = i -> v[i] and itr = keys(v). As long as we explain that, it does not seem like there's anything wrong with using the name argmax for that quite useful operation. Moreover, the alternative possible meaning of argmax(v) as argmax(identity, v) is silly because maximum(v) already does that.

Technically there's no ambiguity, but am I the only one to find it weird that argmax(identity, x) would be completely different from argmax(x)? None of our reduction functions does that AFAICT.

I'm starting to regret having renamed indmax to argmax now that we want to add that two-argument method with an inconsistent meaning...

@StefanKarpinski
Copy link
Sponsor Member

StefanKarpinski commented Jun 27, 2018

It's a little unexpected but having argmax(v) mean argmax(i -> v[i], keys(v)) seems to be intuitive to many people as well. As I argued above, indmax isn't really necessary as a function at all.

@martinholters
Copy link
Member

I, too, find it weird. Also, there is no f such that argmax(f, x) == argmax(x), i.e. f cannot be seen as an optional argument in non-final position here. Doesn't mean it's no-go, but it's unusual and probably would give me a pause if I came across it without being aware of the background in this issue and related PRs.

@davidanthoff
Copy link
Contributor

I concur with @nalimilan here. I find argmax(v) as a shortcut for argmax(i -> v[i], keys(v)) really weird, it seems a pattern that I haven't seen anywhere else in base. On the flipside, the pattern foo(v)==foo(identity, v) seems really common, so that is certainly what I would have expected.

I think fundamentally things would be easier if argmax always returned an element out of the original source, and never an index. Getting and index seems kind of surprising to me, and as @nalimilan pointed out also not something often seen in other languages.

A separate issue seems to be whether it should return a single value, a list or a set. On that I concur with @StefanKarpinski: a single element seems the most common and useful case.

cmcaine added a commit to cmcaine/julia that referenced this issue Mar 30, 2020
cmcaine added a commit to cmcaine/julia that referenced this issue Mar 30, 2020
cmcaine added a commit to cmcaine/julia that referenced this issue Mar 30, 2020
cmcaine added a commit to cmcaine/julia that referenced this issue Nov 21, 2020
cmcaine added a commit to cmcaine/julia that referenced this issue Dec 17, 2020
StefanKarpinski pushed a commit that referenced this issue Dec 18, 2020
Defines a descending total order, `isgreater` (not exported), where unordered values like NaNs and missing are last. This makes defining min, argmin, etc, simpler and more consistent. Also adds 2-arg versions of findmax/min, argmax/min. Defines and exports the `isunordered` predicate for testing whether a value is unordered like NaN and missing. Fixes #27613. Related: #27639, #27612, #34674.

Thanks to @tkf, @StefanKarpinski and @drewrobson for their assistance with this PR.

Co-authored-by: Jameson Nash <[email protected]>
Co-authored-by: Takafumi Arakaki <[email protected]>
ElOceanografo pushed a commit to ElOceanografo/julia that referenced this issue May 4, 2021
Defines a descending total order, `isgreater` (not exported), where unordered values like NaNs and missing are last. This makes defining min, argmin, etc, simpler and more consistent. Also adds 2-arg versions of findmax/min, argmax/min. Defines and exports the `isunordered` predicate for testing whether a value is unordered like NaN and missing. Fixes JuliaLang#27613. Related: JuliaLang#27639, JuliaLang#27612, JuliaLang#34674.

Thanks to @tkf, @StefanKarpinski and @drewrobson for their assistance with this PR.

Co-authored-by: Jameson Nash <[email protected]>
Co-authored-by: Takafumi Arakaki <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants