[API Proposal]: Expose remaining AVX512-VBMI2 hardware instructions #88946

MadProbe · 2023-07-15T11:24:15Z

Background and motivation

AVX512-VBMI2 Compress and Expand intrinsics have been approved and will soon be added as part of the new vector mask proposal. There is little reason not to also expose the remaining instructions of the AVX512-VBMI2 instruction set.
Notice:

LeftLower and RightUpper variants don't exist as instructions and aren't needed: they can be replaced by lower << count and upper >> count respectively, which produce the same result because the other operand does not contribute to it.
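To make the note above concrete, here is a minimal scalar sketch of a single 16-bit lane. It assumes the documented VPSHLDW/VPSHRDW semantics (concatenate upper:lower into 32 bits, shift, keep one half; the count is taken modulo the element width). All helper names are hypothetical and exist only for illustration; they are not part of the proposed API.

```csharp
using System;

// Hypothetical helpers modelling one 16-bit lane; not part of the proposal.

// VPSHLDW: concatenate upper:lower into 32 bits, shift left, keep the upper 16 bits.
static ushort ShiftLeftUpper(ushort upper, ushort lower, int count)
{
    count &= 15; // the count is taken modulo the element width
    uint concat = ((uint)upper << 16) | lower;
    return (ushort)((concat << count) >> 16);
}

// VPSHRDW: concatenate upper:lower into 32 bits, shift right, keep the lower 16 bits.
static ushort ShiftRightLower(ushort upper, ushort lower, int count)
{
    count &= 15;
    uint concat = ((uint)upper << 16) | lower;
    return (ushort)(concat >> count);
}

// The "missing" LeftLower variant: keep the lower 16 bits of the left shift.
static ushort ShiftLeftLower(ushort upper, ushort lower, int count)
{
    count &= 15;
    uint concat = ((uint)upper << 16) | lower;
    return (ushort)(concat << count); // bits of upper never land in the kept half
}

// The "missing" RightUpper variant: keep the upper 16 bits of the right shift.
static ushort ShiftRightUpper(ushort upper, ushort lower, int count)
{
    count &= 15;
    uint concat = ((uint)upper << 16) | lower;
    return (ushort)((concat >> count) >> 16); // bits of lower never land in the kept half
}

ushort u = 0xABCD, l = 0x1234;
Console.WriteLine(ShiftLeftUpper(u, l, 4).ToString("X4"));  // BCD1
Console.WriteLine(ShiftRightLower(u, l, 4).ToString("X4")); // D123

// The two extra variants collapse to plain shifts for every count.
for (int c = 0; c < 16; c++)
{
    bool same = ShiftLeftLower(u, l, c) == (ushort)(l << c)
             && ShiftRightUpper(u, l, c) == (ushort)(u >> c);
    if (!same) throw new InvalidOperationException($"mismatch at count {c}");
}
Console.WriteLine("LeftLower == lower << count, RightUpper == upper >> count");
```

The loop is the whole argument of the note in executable form: the half that LeftLower or RightUpper would keep never receives bits from the other operand.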

API Proposal

namespace System.Runtime.Intrinsics.X86
{
    [Intrinsic]
    public abstract class Avx512Vbmi2 : Avx512BW
    {
        public static new bool IsSupported { get => IsSupported; }

        [Intrinsic]
        public new abstract class X64 : Avx512BW.X64
        {
            public static new bool IsSupported { get => IsSupported; }
        }

        [Intrinsic]
        public new abstract class VL : Avx512BW.VL
        {
            public static new bool IsSupported { get => IsSupported; }

            /// <summary>
            /// __m128i _mm_shldi_epi16 (__m128i a, __m128i b, int imm8)
            /// VPSHLDW xmm, xmm, xmm/m128, imm8
            /// </summary>
            public static Vector128<ushort> ConcatenateShiftLeftUpper(Vector128<ushort> upper, Vector128<ushort> lower, [ConstantExpected] byte count) => ConcatenateShiftLeftUpper(upper, lower, count);

            /// <summary>
            /// __m256i _mm256_shldi_epi16 (__m256i a, __m256i b, int imm8)
            /// VPSHLDW ymm, ymm, ymm/m256, imm8
            /// </summary>
            public static Vector256<ushort> ConcatenateShiftLeftUpper(Vector256<ushort> upper, Vector256<ushort> lower, [ConstantExpected] byte count) => ConcatenateShiftLeftUpper(upper, lower, count);

            /// <summary>
            /// __m128i _mm_shldi_epi32 (__m128i a, __m128i b, int imm8)
            /// VPSHLDD xmm, xmm, xmm/m128, imm8
            /// </summary>
            public static Vector128<uint> ConcatenateShiftLeftUpper(Vector128<uint> upper, Vector128<uint> lower, [ConstantExpected] byte count) => ConcatenateShiftLeftUpper(upper, lower, count);

            /// <summary>
            /// __m256i _mm256_shldi_epi32 (__m256i a, __m256i b, int imm8)
            /// VPSHLDD ymm, ymm, ymm/m256, imm8
            /// </summary>
            public static Vector256<uint> ConcatenateShiftLeftUpper(Vector256<uint> upper, Vector256<uint> lower, [ConstantExpected] byte count) => ConcatenateShiftLeftUpper(upper, lower, count);

            /// <summary>
            /// __m128i _mm_shldi_epi64 (__m128i a, __m128i b, int imm8)
            /// VPSHLDQ xmm, xmm, xmm/m128, imm8
            /// </summary>
            public static Vector128<ulong> ConcatenateShiftLeftUpper(Vector128<ulong> upper, Vector128<ulong> lower, [ConstantExpected] byte count) => ConcatenateShiftLeftUpper(upper, lower, count);

            /// <summary>
            /// __m256i _mm256_shldi_epi64 (__m256i a, __m256i b, int imm8)
            /// VPSHLDQ ymm, ymm, ymm/m256, imm8
            /// </summary>
            public static Vector256<ulong> ConcatenateShiftLeftUpper(Vector256<ulong> upper, Vector256<ulong> lower, [ConstantExpected] byte count) => ConcatenateShiftLeftUpper(upper, lower, count);

            /// <summary>
            /// __m128i _mm_shrdi_epi16 (__m128i a, __m128i b, int imm8)
            /// VPSHRDW xmm, xmm, xmm/m128, imm8
            /// </summary>
            public static Vector128<ushort> ConcatenateShiftRightLower(Vector128<ushort> upper, Vector128<ushort> lower, [ConstantExpected] byte count) => ConcatenateShiftRightLower(upper, lower, count);

            /// <summary>
            /// __m256i _mm256_shrdi_epi16 (__m256i a, __m256i b, int imm8)
            /// VPSHRDW ymm, ymm, ymm/m256, imm8
            /// </summary>
            public static Vector256<ushort> ConcatenateShiftRightLower(Vector256<ushort> upper, Vector256<ushort> lower, [ConstantExpected] byte count) => ConcatenateShiftRightLower(upper, lower, count);

            /// <summary>
            /// __m128i _mm_shrdi_epi32 (__m128i a, __m128i b, int imm8)
            /// VPSHRDD xmm, xmm, xmm/m128, imm8
            /// </summary>
            public static Vector128<uint> ConcatenateShiftRightLower(Vector128<uint> upper, Vector128<uint> lower, [ConstantExpected] byte count) => ConcatenateShiftRightLower(upper, lower, count);

            /// <summary>
            /// __m256i _mm256_shrdi_epi32 (__m256i a, __m256i b, int imm8)
            /// VPSHRDD ymm, ymm, ymm/m256, imm8
            /// </summary>
            public static Vector256<uint> ConcatenateShiftRightLower(Vector256<uint> upper, Vector256<uint> lower, [ConstantExpected] byte count) => ConcatenateShiftRightLower(upper, lower, count);

            /// <summary>
            /// __m128i _mm_shrdi_epi64 (__m128i a, __m128i b, int imm8)
            /// VPSHRDQ xmm, xmm, xmm/m128, imm8
            /// </summary>
            public static Vector128<ulong> ConcatenateShiftRightLower(Vector128<ulong> upper, Vector128<ulong> lower, [ConstantExpected] byte count) => ConcatenateShiftRightLower(upper, lower, count);

            /// <summary>
            /// __m256i _mm256_shrdi_epi64 (__m256i a, __m256i b, int imm8)
            /// VPSHRDQ ymm, ymm, ymm/m256, imm8
            /// </summary>
            public static Vector256<ulong> ConcatenateShiftRightLower(Vector256<ulong> upper, Vector256<ulong> lower, [ConstantExpected] byte count) => ConcatenateShiftRightLower(upper, lower, count);

            /// <summary>
            /// __m128i _mm_shldv_epi16 (__m128i a, __m128i b, __m128i c)
            /// VPSHLDVW xmm, xmm, xmm/m128
            /// </summary>
            public static Vector128<ushort> ConcatenateShiftLeftUpperVariable(Vector128<ushort> upper, Vector128<ushort> lower, Vector128<ushort> count) => ConcatenateShiftLeftUpperVariable(upper, lower, count);

            /// <summary>
            /// __m256i _mm256_shldv_epi16 (__m256i a, __m256i b, __m256i c)
            /// VPSHLDVW ymm, ymm, ymm/m256
            /// </summary>
            public static Vector256<ushort> ConcatenateShiftLeftUpperVariable(Vector256<ushort> upper, Vector256<ushort> lower, Vector256<ushort> count) => ConcatenateShiftLeftUpperVariable(upper, lower, count);

            /// <summary>
            /// __m128i _mm_shldv_epi32 (__m128i a, __m128i b, __m128i c)
            /// VPSHLDVD xmm, xmm, xmm/m128
            /// </summary>
            public static Vector128<uint> ConcatenateShiftLeftUpperVariable(Vector128<uint> upper, Vector128<uint> lower, Vector128<uint> count) => ConcatenateShiftLeftUpperVariable(upper, lower, count);

            /// <summary>
            /// __m256i _mm256_shldv_epi32 (__m256i a, __m256i b, __m256i c)
            /// VPSHLDVD ymm, ymm, ymm/m256
            /// </summary>
            public static Vector256<uint> ConcatenateShiftLeftUpperVariable(Vector256<uint> upper, Vector256<uint> lower, Vector256<uint> count) => ConcatenateShiftLeftUpperVariable(upper, lower, count);

            /// <summary>
            /// __m128i _mm_shldv_epi64 (__m128i a, __m128i b, __m128i c)
            /// VPSHLDVQ xmm, xmm, xmm/m128
            /// </summary>
            public static Vector128<ulong> ConcatenateShiftLeftUpperVariable(Vector128<ulong> upper, Vector128<ulong> lower, Vector128<ulong> count) => ConcatenateShiftLeftUpperVariable(upper, lower, count);

            /// <summary>
            /// __m256i _mm256_shldv_epi64 (__m256i a, __m256i b, __m256i c)
            /// VPSHLDVQ ymm, ymm, ymm/m256
            /// </summary>
            public static Vector256<ulong> ConcatenateShiftLeftUpperVariable(Vector256<ulong> upper, Vector256<ulong> lower, Vector256<ulong> count) => ConcatenateShiftLeftUpperVariable(upper, lower, count);

            /// <summary>
            /// __m128i _mm_shrdv_epi16 (__m128i a, __m128i b, __m128i c)
            /// VPSHRDVW xmm, xmm, xmm/m128
            /// </summary>
            public static Vector128<ushort> ConcatenateShiftRightLowerVariable(Vector128<ushort> upper, Vector128<ushort> lower, Vector128<ushort> count) => ConcatenateShiftRightLowerVariable(upper, lower, count);

            /// <summary>
            /// __m256i _mm256_shrdv_epi16 (__m256i a, __m256i b, __m256i c)
            /// VPSHRDVW ymm, ymm, ymm/m256
            /// </summary>
            public static Vector256<ushort> ConcatenateShiftRightLowerVariable(Vector256<ushort> upper, Vector256<ushort> lower, Vector256<ushort> count) => ConcatenateShiftRightLowerVariable(upper, lower, count);

            /// <summary>
            /// __m128i _mm_shrdv_epi32 (__m128i a, __m128i b, __m128i c)
            /// VPSHRDVD xmm, xmm, xmm/m128
            /// </summary>
            public static Vector128<uint> ConcatenateShiftRightLowerVariable(Vector128<uint> upper, Vector128<uint> lower, Vector128<uint> count) => ConcatenateShiftRightLowerVariable(upper, lower, count);

            /// <summary>
            /// __m256i _mm256_shrdv_epi32 (__m256i a, __m256i b, __m256i c)
            /// VPSHRDVD ymm, ymm, ymm/m256
            /// </summary>
            public static Vector256<uint> ConcatenateShiftRightLowerVariable(Vector256<uint> upper, Vector256<uint> lower, Vector256<uint> count) => ConcatenateShiftRightLowerVariable(upper, lower, count);

            /// <summary>
            /// __m128i _mm_shrdv_epi64 (__m128i a, __m128i b, __m128i c)
            /// VPSHRDVQ xmm, xmm, xmm/m128
            /// </summary>
            public static Vector128<ulong> ConcatenateShiftRightLowerVariable(Vector128<ulong> upper, Vector128<ulong> lower, Vector128<ulong> count) => ConcatenateShiftRightLowerVariable(upper, lower, count);

            /// <summary>
            /// __m256i _mm256_shrdv_epi64 (__m256i a, __m256i b, __m256i c)
            /// VPSHRDVQ ymm, ymm, ymm/m256
            /// </summary>
            public static Vector256<ulong> ConcatenateShiftRightLowerVariable(Vector256<ulong> upper, Vector256<ulong> lower, Vector256<ulong> count) => ConcatenateShiftRightLowerVariable(upper, lower, count);
        }

        /// <summary>
        /// __m512i _mm512_shldi_epi16 (__m512i a, __m512i b, int imm8)
        /// VPSHLDW zmm, zmm, zmm/m512, imm8
        /// </summary>
        public static Vector512<ushort> ConcatenateShiftLeftUpper(Vector512<ushort> upper, Vector512<ushort> lower, [ConstantExpected] byte count) => ConcatenateShiftLeftUpper(upper, lower, count);

        /// <summary>
        /// __m512i _mm512_shldi_epi32 (__m512i a, __m512i b, int imm8)
        /// VPSHLDD zmm, zmm, zmm/m512, imm8
        /// </summary>
        public static Vector512<uint> ConcatenateShiftLeftUpper(Vector512<uint> upper, Vector512<uint> lower, [ConstantExpected] byte count) => ConcatenateShiftLeftUpper(upper, lower, count);

        /// <summary>
        /// __m512i _mm512_shldi_epi64 (__m512i a, __m512i b, int imm8)
        /// VPSHLDQ zmm, zmm, zmm/m512, imm8
        /// </summary>
        public static Vector512<ulong> ConcatenateShiftLeftUpper(Vector512<ulong> upper, Vector512<ulong> lower, [ConstantExpected] byte count) => ConcatenateShiftLeftUpper(upper, lower, count);

        /// <summary>
        /// __m512i _mm512_shrdi_epi16 (__m512i a, __m512i b, int imm8)
        /// VPSHRDW zmm, zmm, zmm/m512, imm8
        /// </summary>
        public static Vector512<ushort> ConcatenateShiftRightLower(Vector512<ushort> upper, Vector512<ushort> lower, [ConstantExpected] byte count) => ConcatenateShiftRightLower(upper, lower, count);

        /// <summary>
        /// __m512i _mm512_shrdi_epi32 (__m512i a, __m512i b, int imm8)
        /// VPSHRDD zmm, zmm, zmm/m512, imm8
        /// </summary>
        public static Vector512<uint> ConcatenateShiftRightLower(Vector512<uint> upper, Vector512<uint> lower, [ConstantExpected] byte count) => ConcatenateShiftRightLower(upper, lower, count);

        /// <summary>
        /// __m512i _mm512_shrdi_epi64 (__m512i a, __m512i b, int imm8)
        /// VPSHRDQ zmm, zmm, zmm/m512, imm8
        /// </summary>
        public static Vector512<ulong> ConcatenateShiftRightLower(Vector512<ulong> upper, Vector512<ulong> lower, [ConstantExpected] byte count) => ConcatenateShiftRightLower(upper, lower, count);

        /// <summary>
        /// __m512i _mm512_shldv_epi16 (__m512i a, __m512i b, __m512i c)
        /// VPSHLDVW zmm, zmm, zmm/m512
        /// </summary>
        public static Vector512<ushort> ConcatenateShiftLeftUpperVariable(Vector512<ushort> upper, Vector512<ushort> lower, Vector512<ushort> count) => ConcatenateShiftLeftUpperVariable(upper, lower, count);

        /// <summary>
        /// __m512i _mm512_shldv_epi32 (__m512i a, __m512i b, __m512i c)
        /// VPSHLDVD zmm, zmm, zmm/m512
        /// </summary>
        public static Vector512<uint> ConcatenateShiftLeftUpperVariable(Vector512<uint> upper, Vector512<uint> lower, Vector512<uint> count) => ConcatenateShiftLeftUpperVariable(upper, lower, count);

        /// <summary>
        /// __m512i _mm512_shldv_epi64 (__m512i a, __m512i b, __m512i c)
        /// VPSHLDVQ zmm, zmm, zmm/m512
        /// </summary>
        public static Vector512<ulong> ConcatenateShiftLeftUpperVariable(Vector512<ulong> upper, Vector512<ulong> lower, Vector512<ulong> count) => ConcatenateShiftLeftUpperVariable(upper, lower, count);

        /// <summary>
        /// __m512i _mm512_shrdv_epi16 (__m512i a, __m512i b, __m512i c)
        /// VPSHRDVW zmm, zmm, zmm/m512
        /// </summary>
        public static Vector512<ushort> ConcatenateShiftRightLowerVariable(Vector512<ushort> upper, Vector512<ushort> lower, Vector512<ushort> count) => ConcatenateShiftRightLowerVariable(upper, lower, count);

        /// <summary>
        /// __m512i _mm512_shrdv_epi32 (__m512i a, __m512i b, __m512i c)
        /// VPSHRDVD zmm, zmm, zmm/m512
        /// </summary>
        public static Vector512<uint> ConcatenateShiftRightLowerVariable(Vector512<uint> upper, Vector512<uint> lower, Vector512<uint> count) => ConcatenateShiftRightLowerVariable(upper, lower, count);

        /// <summary>
        /// __m512i _mm512_shrdv_epi64 (__m512i a, __m512i b, __m512i c)
        /// VPSHRDVQ zmm, zmm, zmm/m512
        /// </summary>
        public static Vector512<ulong> ConcatenateShiftRightLowerVariable(Vector512<ulong> upper, Vector512<ulong> lower, Vector512<ulong> count) => ConcatenateShiftRightLowerVariable(upper, lower, count);
    }
}
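The *Variable overloads differ from the immediate ones only in that every lane carries its own shift count. A minimal scalar sketch for ushort lanes, assuming the documented VPSHLDVW/VPSHRDVW behaviour of taking each count modulo 16; arrays stand in for vectors and the function names are hypothetical:

```csharp
using System;

// Hypothetical lane-by-lane models of the *Variable overloads for ushort lanes.
// Each lane uses its own count, taken modulo 16 (VPSHLDVW / VPSHRDVW).
static ushort[] ShiftLeftUpperVariable(ushort[] upper, ushort[] lower, ushort[] count)
{
    var result = new ushort[upper.Length];
    for (int i = 0; i < result.Length; i++)
    {
        int c = count[i] & 15;
        uint concat = ((uint)upper[i] << 16) | lower[i];
        result[i] = (ushort)((concat << c) >> 16); // keep the upper 16 bits
    }
    return result;
}

static ushort[] ShiftRightLowerVariable(ushort[] upper, ushort[] lower, ushort[] count)
{
    var result = new ushort[upper.Length];
    for (int i = 0; i < result.Length; i++)
    {
        int c = count[i] & 15;
        uint concat = ((uint)upper[i] << 16) | lower[i];
        result[i] = (ushort)(concat >> c); // keep the lower 16 bits
    }
    return result;
}

static string Hex(ushort[] v) => string.Join(" ", Array.ConvertAll(v, x => x.ToString("X4")));

ushort[] upperLanes = { 0xABCD, 0xABCD, 0xABCD, 0xABCD, 0xABCD };
ushort[] lowerLanes = { 0x1234, 0x1234, 0x1234, 0x1234, 0x1234 };
ushort[] countLanes = { 0, 4, 8, 16, 20 };

Console.WriteLine(Hex(ShiftLeftUpperVariable(upperLanes, lowerLanes, countLanes)));  // ABCD BCD1 CD12 ABCD BCD1
Console.WriteLine(Hex(ShiftRightLowerVariable(upperLanes, lowerLanes, countLanes))); // 1234 D123 CD12 1234 D123
```

For 16-bit lanes, a left shift by c and a right shift by 16 - c select the same middle bits of the concatenation, which is why both results show CD12 at count 8; counts of 16 and 20 behave like 0 and 4 because of the modulo.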

API Usage

// Avx512Vbmi2.ConcatenateShift(LeftUpper/RightLower)(...) example
Vector512<ushort> upper_data = GetData(), lower_data = GetData();
// With count = 8, each ushort of the result holds the lower 8 bits of upper_data in its upper byte and the upper 8 bits of lower_data in its lower byte
// In general, each result element holds the lower 16 - count bits of upper_data followed by the upper count bits of lower_data
var result = Avx512Vbmi2.ConcatenateShiftLeftUpper(upper_data, lower_data, 8);
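For comparison, the same per-lane result can be expressed with today's cross-platform vector API as (upper << count) | (lower >> (16 - count)). The sketch below assumes .NET 8 or later (where Vector512 with ShiftLeft, ShiftRightLogical and the | operator is available); the fallback function is hypothetical and only illustrates what the proposed intrinsic would collapse into a single VPSHLDW.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical portable fallback (not part of the proposal) for ushort lanes:
// (upper << count) | (lower >> (16 - count)), with the count taken modulo 16.
static Vector512<ushort> ConcatShiftLeftUpperFallback(Vector512<ushort> upper, Vector512<ushort> lower, int count)
{
    count &= 15; // mirror the instruction's modulo-16 count
    if (count == 0)
        return upper; // avoids relying on how a shift by 16 behaves
    return Vector512.ShiftLeft(upper, count) | Vector512.ShiftRightLogical(lower, 16 - count);
}

var upperData = Vector512.Create((ushort)0xABCD);
var lowerData = Vector512.Create((ushort)0x1234);

// count = 8: upper byte from the low byte of upperData, lower byte from the high byte of lowerData.
Console.WriteLine(ConcatShiftLeftUpperFallback(upperData, lowerData, 8).GetElement(0).ToString("X4")); // CD12
// count = 4: the low 12 bits of upperData, followed by the high 4 bits of lowerData.
Console.WriteLine(ConcatShiftLeftUpperFallback(upperData, lowerData, 4).GetElement(0).ToString("X4")); // BCD1
```

The fallback needs three vector operations per call, which is part of the case for exposing the single-instruction form.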

Alternative Designs

N/A

Risks

N/A


ghost · 2023-07-15T11:24:25Z

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

AVX512-VBMI PermuteVar64x8(x2) intrinsics are already available, and AVX512-VBMI2 Compress and Expand intrinsics have been approved and will soon be added as part of the new vector mask proposal. There is little reason not to also expose the remaining instructions from these instruction sets.
Notice:

LeftLower and RightUpper variants don't exist as instructions and aren't needed: they can be replaced by lower << count and upper >> count respectively, which produce the same result because the other operand does not contribute to it.
I saw MultipleShift discussed in an API review stream, but I can't find it anywhere. Am I not searching hard enough, or has it been lost?

API Proposal

namespace System.Runtime.Intrinsics.X86
{
    [Intrinsic]
    public abstract class Avx512Vbmi : Avx512BW
    {
        public static new bool IsSupported { get => IsSupported; }

        [Intrinsic]
        public new abstract class X64 : Avx512BW.X64
        {
            public static new bool IsSupported { get => IsSupported; }
        }

        [Intrinsic]
        public new abstract class VL : Avx512BW.VL
        {
            public static new bool IsSupported { get => IsSupported; }

            /// <summary>
            /// __m128i _mm_multishift_epi64_epi8 (__m128i a, __m128i b)
            /// VPMULTISHIFTQB xmm, xmm, xmm
            /// </summary>
            public static Vector128<byte> MultipleShift(Vector128<byte> control, Vector128<byte> source) => MultipleShift(control, source);

            /// <summary>
            /// __m256i _mm256_multishift_epi64_epi8 (__m256i a, __m256i b)
            /// VPMULTISHIFTQB ymm, ymm, ymm
            /// </summary>
            public static Vector256<byte> MultipleShift(Vector256<byte> control, Vector256<byte> source) => MultipleShift(control, source);
        }

        /// <summary>
        /// __m512i _mm512_multishift_epi64_epi8 (__m512i a, __m512i b)
        /// VPMULTISHIFTQB zmm, zmm, zmm
        /// </summary>
        public static Vector512<byte> MultipleShift(Vector512<byte> control, Vector512<byte> source) => MultipleShift(control, source);
    }
    [Intrinsic]
    public abstract class Avx512Vbmi2 : Avx512BW
    {
        public static new bool IsSupported { get => IsSupported; }

        [Intrinsic]
        public new abstract class X64 : Avx512BW.X64
        {
            public static new bool IsSupported { get => IsSupported; }
        }

        [Intrinsic]
        public new abstract class VL : Avx512BW.VL
        {
            public static new bool IsSupported { get => IsSupported; }

            /// <summary>
            /// __m128i _mm_shldi_epi16 (__m128i a, __m128i b, int imm8)
            /// VPSHLDW xmm, xmm, xmm, imm8
            /// </summary>
            public static Vector128<ushort> ConcatenateShiftLeftUpper(Vector128<ushort> upper, Vector128<ushort> lower, [ConstantExpected] byte count) => ConcatenateShiftLeftUpper(upper, lower, count);

            /// <summary>
            /// __m256i _mm256_shldi_epi16 (__m256i a, __m256i b, int imm8)
            /// VPSHLDW ymm, ymm, ymm, imm8
            /// </summary>
            public static Vector256<ushort> ConcatenateShiftLeftUpper(Vector256<ushort> upper, Vector256<ushort> lower, [ConstantExpected] byte count) => ConcatenateShiftLeftUpper(upper, lower, count);

            /// <summary>
            /// __m128i _mm_shldi_epi32 (__m128i a, __m128i b, int imm8)
            /// VPSHLDD xmm, xmm, xmm, imm8
            /// </summary>
            public static Vector128<uint> ConcatenateShiftLeftUpper(Vector128<uint> upper, Vector128<uint> lower, [ConstantExpected] byte count) => ConcatenateShiftLeftUpper(upper, lower, count);

            /// <summary>
            /// __m256i _mm256_shldi_epi32 (__m256i a, __m256i b, int imm8)
            /// VPSHLDD ymm, ymm, ymm, imm8
            /// </summary>
            public static Vector256<uint> ConcatenateShiftLeftUpper(Vector256<uint> upper, Vector256<uint> lower, [ConstantExpected] byte count) => ConcatenateShiftLeftUpper(upper, lower, count);

            /// <summary>
            /// __m128i _mm_shldi_epi64 (__m128i a, __m128i b, int imm8)
            /// VPSHLDQ xmm, xmm, xmm, imm8
            /// </summary>
            public static Vector128<ulong> ConcatenateShiftLeftUpper(Vector128<ulong> upper, Vector128<ulong> lower, [ConstantExpected] byte count) => ConcatenateShiftLeftUpper(upper, lower, count);

            /// <summary>
            /// __m256i _mm256_shldi_epi64 (__m256i a, __m256i b, int imm8)
            /// VPSHLDQ ymm, ymm, ymm, imm8
            /// </summary>
            public static Vector256<ulong> ConcatenateShiftLeftUpper(Vector256<ulong> upper, Vector256<ulong> lower, [ConstantExpected] byte count) => ConcatenateShiftLeftUpper(upper, lower, count);

            /// <summary>
            /// __m128i _mm_shrdi_epi16 (__m128i a, __m128i b, int imm8)
            /// VPSHRDW xmm, xmm, xmm, imm8
            /// </summary>
            public static Vector128<ushort> ConcatenateShiftRightLower(Vector128<ushort> upper, Vector128<ushort> lower, [ConstantExpected] byte count) => ConcatenateShiftRightLower(upper, lower, count);

            /// <summary>
            /// __m256i _mm256_shrdi_epi16 (__m256i a, __m256i b, int imm8)
            /// VPSHRDW ymm, ymm, ymm, imm8
            /// </summary>
            public static Vector256<ushort> ConcatenateShiftRightLower(Vector256<ushort> upper, Vector256<ushort> lower, [ConstantExpected] byte count) => ConcatenateShiftRightLower(upper, lower, count);

            /// <summary>
            /// __m128i _mm_shrdi_epi32 (__m128i a, __m128i b, int imm8)
            /// VPSHRDD xmm, xmm, xmm, imm8
            /// </summary>
            public static Vector128<uint> ConcatenateShiftRightLower(Vector128<uint> upper, Vector128<uint> lower, [ConstantExpected] byte count) => ConcatenateShiftRightLower(upper, lower, count);

            /// <summary>
            /// __m256i _mm256_shrdi_epi32 (__m256i a, __m256i b, int imm8)
            /// VPSHRDD ymm, ymm, ymm, imm8
            /// </summary>
            public static Vector256<uint> ConcatenateShiftRightLower(Vector256<uint> upper, Vector256<uint> lower, [ConstantExpected] byte count) => ConcatenateShiftRightLower(upper, lower, count);

            /// <summary>
            /// __m128i _mm_shrdi_epi64 (__m128i a, __m128i b, int imm8)
            /// VPSHRDQ xmm, xmm, xmm, imm8
            /// </summary>
            public static Vector128<ulong> ConcatenateShiftRightLower(Vector128<ulong> upper, Vector128<ulong> lower, [ConstantExpected] byte count) => ConcatenateShiftRightLower(upper, lower, count);

            /// <summary>
            /// __m256i _mm256_shrdi_epi64 (__m256i a, __m256i b, int imm8)
            /// VPSHRDQ ymm, ymm, ymm, imm8
            /// </summary>
            public static Vector256<ulong> ConcatenateShiftRightLower(Vector256<ulong> upper, Vector256<ulong> lower, [ConstantExpected] byte count) => ConcatenateShiftRightLower(upper, lower, count);

            /// <summary>
            /// __m128i _mm_shldv_epi16 (__m128i a, __m128i b, __m128i c)
            /// VPSHLDVW xmm, xmm, xmm
            /// </summary>
            public static Vector128<ushort> ConcatenateShiftLeftUpperVariable(Vector128<ushort> upper, Vector128<ushort> lower, Vector128<ushort> count) => ConcatenateShiftLeftUpperVariable(upper, lower, count);

            /// <summary>
            /// __m256i _mm256_shldv_epi16 (__m256i a, __m256i b, __m256i c)
            /// VPSHLDVW ymm, ymm, ymm
            /// </summary>
            public static Vector256<ushort> ConcatenateShiftLeftUpperVariable(Vector256<ushort> upper, Vector256<ushort> lower, Vector256<ushort> count) => ConcatenateShiftLeftUpperVariable(upper, lower, count);

            /// <summary>
            /// __m128i _mm_shldv_epi32 (__m128i a, __m128i b, __m128i c)
            /// VPSHLDVD xmm, xmm, xmm
            /// </summary>
            public static Vector128<uint> ConcatenateShiftLeftUpperVariable(Vector128<uint> upper, Vector128<uint> lower, Vector128<uint> count) => ConcatenateShiftLeftUpperVariable(upper, lower, count);

            /// <summary>
            /// __m256i _mm256_shldv_epi32 (__m256i a, __m256i b, __m256i c)
            /// VPSHLDVD ymm, ymm, ymm
            /// </summary>
            public static Vector256<uint> ConcatenateShiftLeftUpperVariable(Vector256<uint> upper, Vector256<uint> lower, Vector256<uint> count) => ConcatenateShiftLeftUpperVariable(upper, lower, count);

            /// <summary>
            /// __m128i _mm_shldv_epi64 (__m128i a, __m128i b, __m128i c)
            /// VPSHLDVQ xmm, xmm, xmm
            /// </summary>
            public static Vector128<ulong> ConcatenateShiftLeftUpperVariable(Vector128<ulong> upper, Vector128<ulong> lower, Vector128<ulong> count) => ConcatenateShiftLeftUpperVariable(upper, lower, count);

            /// <summary>
            /// __m256i _mm256_shldv_epi64 (__m256i a, __m256i b, __m256i c)
            /// VPSHLDVQ ymm, ymm, ymm
            /// </summary>
            public static Vector256<ulong> ConcatenateShiftLeftUpperVariable(Vector256<ulong> upper, Vector256<ulong> lower, Vector256<ulong> count) => ConcatenateShiftLeftUpperVariable(upper, lower, count);

            /// <summary>
            /// __m128i _mm_shrdv_epi16 (__m128i a, __m128i b, __m128i c)
            /// VPSHRDVW xmm, xmm, xmm
            /// </summary>
            public static Vector128<ushort> ConcatenateShiftRightLowerVariable(Vector128<ushort> upper, Vector128<ushort> lower, Vector128<ushort> count) => ConcatenateShiftRightLowerVariable(upper, lower, count);

            /// <summary>
            /// __m256i _mm256_shrdv_epi16 (__m256i a, __m256i b, __m256i c)
            /// VPSHRDVW ymm, ymm, ymm
            /// </summary>
            public static Vector256<ushort> ConcatenateShiftRightLowerVariable(Vector256<ushort> upper, Vector256<ushort> lower, Vector256<ushort> count) => ConcatenateShiftRightLowerVariable(upper, lower, count);

            /// <summary>
            /// __m128i _mm_shrdv_epi32 (__m128i a, __m128i b, __m128i c)
            /// VPSHRDVD xmm, xmm, xmm
            /// </summary>
            public static Vector128<uint> ConcatenateShiftRightLowerVariable(Vector128<uint> upper, Vector128<uint> lower, Vector128<uint> count) => ConcatenateShiftRightLowerVariable(upper, lower, count);

            /// <summary>
            /// __m256i _mm256_shrdv_epi32 (__m256i a, __m256i b, __m256i c)
            /// VPSHRDVD ymm, ymm, ymm
            /// </summary>
            public static Vector256<uint> ConcatenateShiftRightLowerVariable(Vector256<uint> upper, Vector256<uint> lower, Vector256<uint> count) => ConcatenateShiftRightLowerVariable(upper, lower, count);

            /// <summary>
            /// __m128i _mm_shrdv_epi64 (__m128i a, __m128i b, __m128i c)
            /// VPSHRDVQ xmm, xmm, xmm
            /// </summary>
            public static Vector128<ulong> ConcatenateShiftRightLowerVariable(Vector128<ulong> upper, Vector128<ulong> lower, Vector128<ulong> count) => ConcatenateShiftRightLowerVariable(upper, lower, count);

            /// <summary>
            /// __m256i _mm256_shrdv_epi64 (__m256i a, __m256i b, __m256i c)
            /// VPSHRDVQ ymm, ymm, ymm
            /// </summary>
            public static Vector256<ulong> ConcatenateShiftRightLowerVariable(Vector256<ulong> upper, Vector256<ulong> lower, Vector256<ulong> count) => ConcatenateShiftRightLowerVariable(upper, lower, count);
        }

        /// <summary>
        /// __m512i _mm512_shldi_epi16 (__m512i a, __m512i b, int imm8)
        /// VPSHLDW zmm, zmm, zmm, imm8
        /// </summary>
        public static Vector512<ushort> ConcatenateShiftLeftUpper(Vector512<ushort> upper, Vector512<ushort> lower, [ConstantExpected] byte count) => ConcatenateShiftLeftUpper(upper, lower, count);

        /// <summary>
        /// __m512i _mm512_shldi_epi32 (__m512i a, __m512i b, int imm8)
        /// VPSHLDD zmm, zmm, zmm, imm8
        /// </summary>
        public static Vector512<uint> ConcatenateShiftLeftUpper(Vector512<uint> upper, Vector512<uint> lower, [ConstantExpected] byte count) => ConcatenateShiftLeftUpper(upper, lower, count);

        /// <summary>
        /// __m512i _mm512_shldi_epi64 (__m512i a, __m512i b, int imm8)
        /// VPSHLDQ zmm, zmm, zmm, imm8
        /// </summary>
        public static Vector512<ulong> ConcatenateShiftLeftUpper(Vector512<ulong> upper, Vector512<ulong> lower, [ConstantExpected] byte count) => ConcatenateShiftLeftUpper(upper, lower, count);

        /// <summary>
        /// __m512i _mm512_shrdi_epi16 (__m512i a, __m512i b, int imm8)
        /// VPSHRDW zmm, zmm, zmm, imm8
        /// </summary>
        public static Vector512<ushort> ConcatenateShiftRightLower(Vector512<ushort> upper, Vector512<ushort> lower, [ConstantExpected] byte count) => ConcatenateShiftRightLower(upper, lower, count);

        /// <summary>
        /// __m512i _mm512_shrdi_epi32 (__m512i a, __m512i b, int imm8)
        /// VPSHRDD zmm, zmm, zmm, imm8
        /// </summary>
        public static Vector512<uint> ConcatenateShiftRightLower(Vector512<uint> upper, Vector512<uint> lower, [ConstantExpected] byte count) => ConcatenateShiftRightLower(upper, lower, count);

        /// <summary>
        /// __m512i _mm512_shrdi_epi64 (__m512i a, __m512i b, int imm8)
        /// VPSHRDQ zmm, zmm, zmm, imm8
        /// </summary>
        public static Vector512<ulong> ConcatenateShiftRightLower(Vector512<ulong> upper, Vector512<ulong> lower, [ConstantExpected] byte count) => ConcatenateShiftRightLower(upper, lower, count);

        /// <summary>
        /// __m512i _mm512_shldv_epi16 (__m512i a, __m512i b, __m512i c)
        /// VPSHLDVW zmm, zmm, zmm
        /// </summary>
        public static Vector512<ushort> ConcatenateShiftLeftUpperVariable(Vector512<ushort> upper, Vector512<ushort> lower, Vector512<ushort> count) => ConcatenateShiftLeftUpperVariable(upper, lower, count);

        /// <summary>
        /// __m512i _mm512_shldv_epi32 (__m512i a, __m512i b, __m512i c)
        /// VPSHLDVD zmm, zmm, zmm
        /// </summary>
        public static Vector512<uint> ConcatenateShiftLeftUpperVariable(Vector512<uint> upper, Vector512<uint> lower, Vector512<uint> count) => ConcatenateShiftLeftUpperVariable(upper, lower, count);

        /// <summary>
        /// __m512i _mm512_shldv_epi64 (__m512i a, __m512i b, __m512i c)
        /// VPSHLDVQ zmm, zmm, zmm
        /// </summary>
        public static Vector512<ulong> ConcatenateShiftLeftUpperVariable(Vector512<ulong> upper, Vector512<ulong> lower, Vector512<ulong> count) => ConcatenateShiftLeftUpperVariable(upper, lower, count);

        /// <summary>
        /// __m512i _mm512_shrdv_epi16 (__m512i a, __m512i b, __m512i c)
        /// VPSHRDVW zmm, zmm, zmm
        /// </summary>
        public static Vector512<ushort> ConcatenateShiftRightLowerVariable(Vector512<ushort> upper, Vector512<ushort> lower, Vector512<ushort> count) => ConcatenateShiftRightLowerVariable(upper, lower, count);

        /// <summary>
        /// __m512i _mm512_shrdv_epi32 (__m512i a, __m512i b, __m512i c)
        /// VPSHRDVD zmm, zmm, zmm
        /// </summary>
        public static Vector512<uint> ConcatenateShiftRightLowerVariable(Vector512<uint> upper, Vector512<uint> lower, Vector512<uint> count) => ConcatenateShiftRightLowerVariable(upper, lower, count);

        /// <summary>
        /// __m512i _mm512_shrdv_epi64 (__m512i a, __m512i b, __m512i c)
        /// VPSHRDVQ zmm, zmm, zmm
        /// </summary>
        public static Vector512<ulong> ConcatenateShiftRightLowerVariable(Vector512<ulong> upper, Vector512<ulong> lower, Vector512<ulong> count) => ConcatenateShiftRightLowerVariable(upper, lower, count);
    }
}

API Usage

// Avx512Vbmi2.ConcatenateShift(LeftUpper/RightLower)(...) example
Vector512<ushort> upper_data = GetData(), lower_data = GetData();
// With count = 8, each ushort of the result holds the lower 8 bits of upper_data in its upper byte and the upper 8 bits of lower_data in its lower byte
// In general, each result element holds the lower 16 - count bits of upper_data followed by the upper count bits of lower_data
var result = Avx512Vbmi2.ConcatenateShiftLeftUpper(upper_data, lower_data, 8);

// Avx512Vbmi.MultipleShift(...) example
Vector512<byte> d = GetData(), control = Vector512.Create(0x0C_1C_2C_3C_04_14_24_34UL).AsByte();
// With this control, every 64-bit lane of the result holds, from byte 0 to byte 7: (d[59:52], d[43:36], d[27:20], d[11:4], (d[3:0] << 4) | d[63:60], d[51:44], d[35:28], d[19:12])
var result = Avx512Vbmi.MultipleShift(control, d);
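The byte layout in the MultipleShift comment can be checked with a scalar model of one 64-bit lane. This sketch assumes the documented VPMULTISHIFTQB semantics: result byte j is the 8 bits of the data qword starting at bit (control byte j & 63), wrapping from bit 63 back to bit 0. Under that model the described layout needs control bytes 0x34, 0x24, 0x14, 0x04, 0x3C, 0x2C, 0x1C, 0x0C (the qword 0x0C_1C_2C_3C_04_14_24_34); with 0x0B_1B_2B_3B_04_14_24_34 the upper four bytes would each start one bit lower. The function name is hypothetical.

```csharp
using System;
using System.Numerics;

// Hypothetical scalar model of one 64-bit lane of VPMULTISHIFTQB (the proposed MultipleShift):
// result byte j = 8 bits of `data` starting at bit (control byte j & 63), wrapping around.
static ulong MultiShiftLane(ulong control, ulong data)
{
    ulong result = 0;
    for (int j = 0; j < 8; j++)
    {
        int start = (int)((control >> (8 * j)) & 63);
        ulong field = BitOperations.RotateRight(data, start) & 0xFF; // rotation models the wrap-around
        result |= field << (8 * j);
    }
    return result;
}

ulong data = 0x0123_4567_89AB_CDEFUL;
ulong ctrl = 0x0C_1C_2C_3C_04_14_24_34UL;

// Bytes 0..7 of the result: 12 56 9A DE F0 34 78 BC
Console.WriteLine(MultiShiftLane(ctrl, data).ToString("X16")); // BC7834F0DE9A5612
```

Each result byte pairs two adjacent nibbles of the input, so the chosen data value makes it easy to read off which source bits landed where.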

Alternative Designs

N/A

Risks

N/A

Author:	MadProbe
Assignees:	-
Labels:	`api-suggestion`, `area-System.Runtime.Intrinsics`
Milestone:	-

tannergooding · 2023-07-19T14:42:26Z

MultiShift is part of #86168

ConcatenateShift hasn't gone through any review yet and I need to think more on the general name, etc. It's too late in the cycle for either to land for .NET 8, however. Both will end up being .NET 9 instead.

MadProbe · 2023-07-19T20:09:14Z

I have deleted the duplicated MultiShift and changed the title and proposal description accordingly.

tannergooding · 2023-07-21T17:58:32Z

Need to give some consideration around the suggested names, but this will be a .NET 9 change regardless; so marking with needs-further-triage for the moment.

lemire · 2024-03-15T17:42:05Z

Note that AMD Zen 4 and newer, as well as Intel Ice Lake and newer, support VBMI2.

As motivation, this would make it possible to support fast Unicode transcoding functions in C#, like those in the simdutf library (which is part of the Node.js runtime).

See https://arxiv.org/pdf/2212.05098.pdf

cc @EgorBo

MadProbe added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Jul 15, 2023

dotnet-issue-labeler bot added the area-System.Runtime.Intrinsics label Jul 15, 2023

ghost added the untriaged New issue has not been triaged by the area owner label Jul 15, 2023

MadProbe changed the title from "[API Proposal]: Expose remaining AVX512-VBMI & AVX512-VBMI2 hardware instructions" to "[API Proposal]: Expose remaining AVX512-VBMI2 hardware instructions" Jul 19, 2023

tannergooding added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed untriaged New issue has not been triaged by the area owner labels Jul 21, 2023

tannergooding added this to the 9.0.0 milestone Jul 21, 2023

stephentoub modified the milestones: 9.0.0, 10.0.0 Jul 22, 2024

This was referenced Sep 10, 2024:

Decode AVX512 UTF8 simdutf/SimdBase64#34 (Open)

Consider incorporating SimdBase64 #107816 (Open)

jeffhandley removed the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Nov 9, 2024
