Optimize float `min`/`max` with constant propagation

For any `y` except `-0.0` or `NaN`, it holds that `min(x,y) === min(y,x) === ifelse(y<x,y,x)`.
For any `y` except `+0.0` or `NaN`, it holds that `min(x,y) === min(y,x) === ifelse(y<=x,y,x)`.
For any `y` except `-0.0` or `NaN`, it holds that `max(x,y) === max(y,x) === ifelse(y>=x,y,x)`.
For any `y` except `+0.0` or `NaN`, it holds that `max(x,y) === max(y,x) === ifelse(y>x,y,x)`.
~~For `isnan(y)` it holds that `isequal(min(x,y),y)` and `isequal(max(x,y),y)`. Note the lack of guaranteed `===` (except when `x` is not-NaN) so this may not be a desirable substitution.~~ See the next line for a better substitution.
When `x` and/or `y` is `NaN`, it holds that `min(x,y) === max(x,y) === x-y` (this may change with the implementation of `min`/`max`).
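
The identities above can be spot-checked with a small script. This is only a sketch over a hand-picked set of values, not a proof. The names `min_lt`, `min_le`, `max_ge`, `max_gt`, and `holds` are made up for illustration. It compares with `isequal`, which distinguishes `-0.0` from `0.0` but does not compare NaN payloads, so it is slightly weaker than the `===` claims.

```julia
# Sample values covering signed zeros, infinities, NaN, and ordinary numbers.
const vals = [-Inf, -1.5, -0.0, 0.0, 1.0, 2.5, Inf, NaN]

# Ternary candidates (hypothetical names), one per comparison operator.
min_lt(x, y) = ifelse(y <  x, y, x)   # claimed valid unless y is -0.0 or NaN
min_le(x, y) = ifelse(y <= x, y, x)   # claimed valid unless y is +0.0 or NaN
max_ge(x, y) = ifelse(y >= x, y, x)   # claimed valid unless y is -0.0 or NaN
max_gt(x, y) = ifelse(y >  x, y, x)   # claimed valid unless y is +0.0 or NaN

isnegzero(y) = y === -0.0
isposzero(y) = y === 0.0

# Check f(x, y) against ref(x, y) and ref(y, x) for every sampled x and
# every sampled y that is neither NaN nor the excluded zero.
function holds(f, ref, excluded)
    for x in vals, y in vals
        (isnan(y) || excluded(y)) && continue
        isequal(f(x, y), ref(x, y)) || return false
        isequal(f(x, y), ref(y, x)) || return false
    end
    return true
end

@show holds(min_lt, min, isnegzero)
@show holds(min_le, min, isposzero)
@show holds(max_ge, max, isnegzero)
@show holds(max_gt, max, isposzero)

# An excluded case really does differ: the ternary returns 0.0, `min` returns -0.0.
@show min_lt(0.0, -0.0), min(0.0, -0.0)
```

The last line shows why the zero exclusions matter: `-0.0 < 0.0` is false, so the `<` form returns `x` even though `min` prefers the negative zero.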

In other words, when any `min`/`max` operation involves a known value (or a value known *not* to be one of a small set of values), it is possible to use the ternary definitions of these functions (or subtraction if an argument is known to be `NaN`). On x86, the ternary versions require just 1 instruction (`<` or `>`) or 2 instructions (`<=` or `>=`) while the general versions require roughly 8 (see #41709). As of #47814, aarch64 does not suffer this drawback as it has native support for our desired `min`/`max` semantics.

This means that common operations such as `max(x,c)` can be made much faster (on x86) than they currently are when `c` is known (for example, a compile-time constant). However, on aarch64 we might want to avoid this substitution as it might not be better than the natively supported general version.

One can make these substitutions manually, but it'd be pretty slick if the compiler could do this automatically. I don't know what it takes to impart this sort of optimization. This might be LLVM territory? I see this as similar to `sqrt(abs(x))` eliding the check-and-error for negative arguments to `sqrt`, but that similarity may be superficial.
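
For reference, a manual substitution might look like the sketch below. `max_known` and `clamp_low` are hypothetical names, not anything in Base. The helper assumes the caller guarantees that `c` is neither `+0.0` nor `NaN`, and the `Sys.ARCH` gate is just one possible way to keep the general `max` on non-x86 targets. Whether it is actually faster will depend on the Julia version and the hardware.

```julia
# Hypothetical helper, not part of Base. The caller promises that `c` is
# neither +0.0 nor NaN; under that assumption this matches `max(x, c)`.
@inline max_known(x::T, c::T) where {T<:AbstractFloat} = ifelse(c > x, c, x)

# Use the ternary form only on x86, where it is reported to be cheaper;
# elsewhere (e.g. aarch64) fall back to the general `max`.
@static if Sys.ARCH in (:x86_64, :i686)
    clamp_low(x::Float64) = max_known(x, 1.0)
else
    clamp_low(x::Float64) = max(x, 1.0)
end

@show clamp_low(0.5)    # below the constant, so the constant is returned
@show clamp_low(2.0)    # above the constant, so x is returned
@show clamp_low(NaN)    # NaN propagates: `1.0 > NaN` is false, so x is returned
```

The drawback is that the "not `+0.0`, not `NaN`" precondition lives only in a comment, which is exactly the kind of bookkeeping a compiler pass could do reliably.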

For example, on x86, compare the following equivalent functions:
```
julia> code_native(x->max(1.0,x),Tuple{Float64};debuginfo=:none)
# shortened for brevity
        movabs  rax, offset .LCPI0_0
        vmovsd  xmm2, qword ptr [rax]           # xmm2 = mem[0],zero
        vsubsd  xmm1, xmm2, xmm0
        vmovq   rax, xmm1
        test    rax, rax
        jns     .LBB0_2
# %bb.1:                                # %top
        vmovapd xmm2, xmm0
.LBB0_2:                                # %top
        vcmpordsd       xmm0, xmm0, xmm0
        vblendvpd       xmm0, xmm1, xmm2, xmm0
        ret

julia> code_native(x->ifelse(1.0>x,1.0,x),Tuple{Float64};debuginfo=:none)
# shortened for brevity
        movabs  rax, offset .LCPI0_0
        vmovsd  xmm1, qword ptr [rax]           # xmm1 = mem[0],zero
        vmaxsd  xmm0, xmm1, xmm0
        ret
```


Optimize float `min`/`max` with constant propagation #48487
