WebAssembly: Suboptimal "promotion" of wasm_f32x4_convert_i32x4 into f32x4.convert_i32x4_u #149457

@zeux

When the wasm_f32x4_convert_i32x4 intrinsic takes its input from an instruction that clears the top bit of each lane, the conversion is compiled to f32x4.convert_i32x4_u instead of the f32x4.convert_i32x4_s variant; for example:

#include <wasm_simd128.h>

v128_t plsno(v128_t x)
{
    // u32x4 here changes the convert instruction; it's a problem because u32->f32 is way slower on pre-AVX512 HW
    x = wasm_u32x4_shr(x, 1);
    return wasm_f32x4_convert_i32x4(x);
}

With -msimd128 -O2, this compiles to:

        local.get       0
        i32.const       1
        i32x4.shr_u
        f32x4.convert_i32x4_u
        end_function

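This is a problem because on x64 hardware, f32x4.convert_i32x4_u is lowered to a long multi-instruction sequence unless the browser implements an AVX-512 code path and the hardware supports it. The substitution is valid, since the shift clears the top bit of every lane and both conversions then produce the same result, but it needlessly slows down otherwise efficient SIMD kernels.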
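
For illustration, here is a minimal scalar sketch of why either convert instruction gives the same answer in the example above: after a logical right shift by one, the top bit of the lane is zero, so its signed and unsigned readings are the same number. The helper names (convert_lane_signed, convert_lane_unsigned, conversions_agree_after_shift, check_samples) are hypothetical and only model a single lane in plain C; this is not how LLVM or any browser engine implements the conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one lane of f32x4.convert_i32x4_s.
   Converting a value >= 2^31 to int32_t is implementation-defined in C;
   common compilers wrap it to the two's-complement value. */
float convert_lane_signed(uint32_t bits) {
    return (float)(int32_t)bits;
}

/* Hypothetical scalar model of one lane of f32x4.convert_i32x4_u. */
float convert_lane_unsigned(uint32_t bits) {
    return (float)bits;
}

/* Returns 1 when both conversions agree on (bits >> 1), that is, once the
   top bit has been cleared by a logical shift like i32x4.shr_u by 1. */
int conversions_agree_after_shift(uint32_t bits) {
    uint32_t shifted = bits >> 1;
    return convert_lane_signed(shifted) == convert_lane_unsigned(shifted);
}

/* Checks a handful of edge-case lane values. */
void check_samples(void) {
    const uint32_t samples[] = {
        0u, 1u, 2u, 0x7FFFFFFFu, 0x80000000u, 0x80000001u,
        0xDEADBEEFu, 0xFFFFFFFEu, 0xFFFFFFFFu
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; ++i) {
        assert(conversions_agree_after_shift(samples[i]));
    }
}
```

Calling check_samples() from a main function exercises the edge cases. Since the results match whenever the top bit is known to be zero, the choice between the two instructions is purely one of cost, and the signed variant is the cheaper one on the hardware described above.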
