Optimize float min/max with constant propagation #48487

@mikmoore

For any y except -0.0 or NaN, it holds that min(x,y) === min(y,x) === ifelse(y<x,y,x).
For any y except +0.0 or NaN, it holds that min(x,y) === min(y,x) === ifelse(y<=x,y,x).
For any y except -0.0 or NaN, it holds that max(x,y) === max(y,x) === ifelse(y>=x,y,x).
For any y except +0.0 or NaN, it holds that max(x,y) === max(y,x) === ifelse(y>x,y,x).
When isnan(y), it holds that isequal(min(x,y),y) and isequal(max(x,y),y). Note that === is not guaranteed (except when x is not NaN), so this may not be a desirable substitution; the next line gives a better one.
When x and/or y is NaN, it holds that min(x,y) === max(x,y) === x-y (this may change with the implementation of min/max).
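
The identities above can be checked numerically. The following is a minimal sketch, not part of the original report: the helper names (min_lt, min_le, max_ge, max_gt, agrees) and the sample values in xs are illustrative assumptions. It compares with isequal rather than === so that the result does not depend on NaN payload bits, while still distinguishing -0.0 from +0.0.

```julia
# Sample values for x, including signed zeros, infinities, and NaN.
xs = [-Inf, -1.0, -0.0, 0.0, 1.0, Inf, NaN]

# Ternary candidates from the list above (hypothetical helper names).
min_lt(x, y) = ifelse(y <  x, y, x)   # claimed valid unless y is -0.0 or NaN
min_le(x, y) = ifelse(y <= x, y, x)   # claimed valid unless y is +0.0 or NaN
max_ge(x, y) = ifelse(y >= x, y, x)   # claimed valid unless y is -0.0 or NaN
max_gt(x, y) = ifelse(y >  x, y, x)   # claimed valid unless y is +0.0 or NaN

# True when f(x, y) matches both ref(x, y) and ref(y, x) for every x in xs.
agrees(f, ref, y) =
    all(x -> isequal(f(x, y), ref(x, y)) && isequal(f(x, y), ref(y, x)), xs)

for y in xs
    isnan(y) && continue
    if !isequal(y, -0.0)
        @assert agrees(min_lt, min, y)
        @assert agrees(max_ge, max, y)
    end
    if !isequal(y, 0.0)
        @assert agrees(min_le, min, y)
        @assert agrees(max_gt, max, y)
    end
end

# NaN case: min and max both yield NaN, as does the subtraction x - y.
for x in xs
    @assert isnan(min(x, NaN)) && isnan(max(x, NaN))
    @assert isnan(x - NaN) && isnan(NaN - x)
end
```

The excluded zero cases do fail the same check, for example agrees(min_lt, min, -0.0) is false because min_lt(0.0, -0.0) returns 0.0 while min(0.0, -0.0) returns -0.0.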

In other words, when any min/max operation involves a known value (or a value known not to be one of a small set of values), it is possible to use the ternary definitions of these functions (or subtraction, if the value is known to be NaN). On x86, the ternary versions require just 1 instruction (< or >) or 2 instructions (<= or >=), while the general versions require roughly 8 (see #41709). As of #47814, aarch64 does not suffer this drawback, as it has native support for our desired min/max semantics.

This means that common operations such as max(x,c) can be made much faster (on x86) than they currently are when c is known (for example, a compile-time constant). However, on aarch64 we might want to avoid this substitution as it might not be better than the natively supported general version.

One can make these substitutions manually, but it'd be pretty slick if the compiler could do this automatically. I don't know what it takes to implement this sort of optimization. This might be LLVM territory? I see this as similar to sqrt(abs(x)) eliding the check-and-error for negative arguments to sqrt, but that similarity may be superficial.
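
A manual substitution might look like the sketch below. max_const is a hypothetical helper, not an existing Base function; it picks a ternary form that is valid for the given c according to the rules listed earlier. Whether the branches on c actually fold away when c is a literal depends on inlining and constant propagation, which this sketch assumes but does not guarantee.

```julia
# Hypothetical helper: max(x, c) written so that each branch on c selects a
# ternary form that is valid for that value of c.
@inline function max_const(x::Float64, c::Float64)
    isnan(c) && return c                            # max(x, NaN) is NaN
    isequal(c, 0.0) && return ifelse(c >= x, c, x)  # >= form: valid unless c is -0.0
    return ifelse(c > x, c, x)                      # >  form: valid unless c is +0.0
end

# Example use with a literal constant, mirroring max(1.0, x).
at_least_one(x::Float64) = max_const(x, 1.0)
```

In the REPL, code_native(at_least_one, Tuple{Float64}; debuginfo=:none) can be used to see what the compiler emits for this on a given machine.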

For example, on x86, compare the following equivalent functions:

julia> code_native(x->max(1.0,x),Tuple{Float64};debuginfo=:none)
# shortened for brevity
        movabs  rax, offset .LCPI0_0
        vmovsd  xmm2, qword ptr [rax]           # xmm2 = mem[0],zero
        vsubsd  xmm1, xmm2, xmm0
        vmovq   rax, xmm1
        test    rax, rax
        jns     .LBB0_2
# %bb.1:                                # %top
        vmovapd xmm2, xmm0
.LBB0_2:                                # %top
        vcmpordsd       xmm0, xmm0, xmm0
        vblendvpd       xmm0, xmm1, xmm2, xmm0
        ret

julia> code_native(x->ifelse(1.0>x,1.0,x),Tuple{Float64};debuginfo=:none)
# shortened for brevity
        movabs  rax, offset .LCPI0_0
        vmovsd  xmm1, qword ptr [rax]           # xmm1 = mem[0],zero
        vmaxsd  xmm0, xmm1, xmm0
        ret

Labels: compiler:llvm, maths, performance, upstream
