API Proposal: AVX-VNNI intrinsics #43780
Comments
Tagging subscribers to this area: @tannergooding, @jeffhandley |
Thanks for the proposal, these LGTM. It might be beneficial to name the operands |
@tannergooding What does the order of operations normally look like re: API review and implementation? Presumably we won't implement until we have test hardware, even if the API is approved? |
That is likely something that needs more discussion. One could imagine implementing it as an "experimental API" even without hardware. But I don't believe we would ship without being able to validate things more end to end. Of course Carol, Bruce, or others may have different opinions here 😄 |
-- I'd also like to double check what we did here for ARM, as I feel we included CC. @echesakovMSFT |
Yes, we did include runtime/src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Arm/AdvSimd.cs Lines 10153 to 10158 in 54906ea
and runtime/src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Arm/AdvSimd.cs Lines 2056 to 2061 in 54906ea
Presumably, we can name the intrinsics the same way and, as an example, the one that corresponds to Although on Arm64 "widening" always means doubling the size of the result while on Intel it would also include doubling and quadrupling ( |
Thanks for the feedback.
It looks like |
If I understand what this instruction does, it multiplies two values (bytes or shorts), widens the result to a larger type (short or int), and sums the products; if the value exceeds the positive or negative int boundaries, it saturates. In that case the name seems right. I don't think we need to specify SignedSaturation since it's clear from the resulting value type. I am curious why in MultiplyAdd(Vector128<int> source, Vector128<byte> a, Vector128<sbyte> b)
|
The operand types are from the programming reference for
and that is also how the C intrinsic is designed:
|
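To make the mixed signedness concrete, here is a minimal scalar sketch of one 32-bit lane of the byte variant (the operation behind `_mm_dpbusd_epi32` and its saturating counterpart), as I read the pseudocode in Intel's programming reference. The function and helper names are hypothetical and are not part of any proposed API; the `saturate` flag is only an illustration device for showing both variants side by side.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the low 32 bits of v as a signed value (two's-complement wraparound). */
static int32_t wrap_to_i32(int64_t v)
{
    uint32_t bits = (uint32_t)v;   /* reduction modulo 2^32 is well defined */
    int32_t out;
    memcpy(&out, &bits, sizeof out);
    return out;
}

/* Clamp v to the representable int32 range. */
static int32_t saturate_to_i32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Hypothetical scalar model of one 32-bit lane: four unsigned bytes times
 * four signed bytes, all four products added to the 32-bit addend. */
static int32_t dot4_u8s8_lane(int32_t addend, const uint8_t left[4],
                              const int8_t right[4], int saturate)
{
    int64_t sum = addend;
    for (int i = 0; i < 4; i++) {
        /* left[i] is zero-extended, right[i] is sign-extended */
        sum += (int64_t)left[i] * (int64_t)right[i];
    }
    return saturate ? saturate_to_i32(sum) : wrap_to_i32(sum);
}
```

With `left` all 255 and `right` all -1, the lane result is -1020; treating both inputs as signed would instead give 4, which is why the byte/sbyte split in the signature matters.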
I updated the original post based on suggestions. |
@tannergooding Since the HW spec is out in the public, I think the APIs are ready to be reviewed. I agree that it may not make sense to ship them until the hardware becomes available. Is there an experimental repo/branch where we can make a PR to get the implementation reviewed prior to hardware availability? |
We can discuss exposing them in I'm fine with marking this CC. @echesakovMSFT |
Looks good as proposed. One thing that was observed is one set of methods is short/short, but the other is byte/sbyte. Double check the sign bits.
namespace System.Runtime.Intrinsics.X86
{
    public abstract class AvxVnni : Avx2
    {
        internal AvxVnni() { }
        public static new bool IsSupported { [Intrinsic] get { return false; } }
        public new abstract class X64 : Avx2.X64
        {
            internal X64() { }
            public static new bool IsSupported { [Intrinsic] get { return false; } }
        }
        /// <summary>
        /// __m128i _mm_dpbusd_epi32 (__m128i src, __m128i a, __m128i b)
        /// VPDPBUSD xmm, xmm, xmm
        /// </summary>
        public static Vector128<int> MultiplyWideningAndAdd(Vector128<int> addend, Vector128<byte> left, Vector128<sbyte> right) { throw new PlatformNotSupportedException(); }
        /// <summary>
        /// __m128i _mm_dpwssd_epi32 (__m128i src, __m128i a, __m128i b)
        /// VPDPWSSD xmm, xmm, xmm
        /// </summary>
        public static Vector128<int> MultiplyWideningAndAdd(Vector128<int> addend, Vector128<short> left, Vector128<short> right) { throw new PlatformNotSupportedException(); }
        /// <summary>
        /// __m256i _mm256_dpbusd_epi32 (__m256i src, __m256i a, __m256i b)
        /// VPDPBUSD ymm, ymm, ymm
        /// </summary>
        public static Vector256<int> MultiplyWideningAndAdd(Vector256<int> addend, Vector256<byte> left, Vector256<sbyte> right) { throw new PlatformNotSupportedException(); }
        /// <summary>
        /// __m256i _mm256_dpwssd_epi32 (__m256i src, __m256i a, __m256i b)
        /// VPDPWSSD ymm, ymm, ymm
        /// </summary>
        public static Vector256<int> MultiplyWideningAndAdd(Vector256<int> addend, Vector256<short> left, Vector256<short> right) { throw new PlatformNotSupportedException(); }
        /// <summary>
        /// __m128i _mm_dpbusds_epi32 (__m128i src, __m128i a, __m128i b)
        /// VPDPBUSDS xmm, xmm, xmm
        /// </summary>
        public static Vector128<int> MultiplyWideningAndAddSaturate(Vector128<int> addend, Vector128<byte> left, Vector128<sbyte> right) { throw new PlatformNotSupportedException(); }
        /// <summary>
        /// __m128i _mm_dpwssds_epi32 (__m128i src, __m128i a, __m128i b)
        /// VPDPWSSDS xmm, xmm, xmm
        /// </summary>
        public static Vector128<int> MultiplyWideningAndAddSaturate(Vector128<int> addend, Vector128<short> left, Vector128<short> right) { throw new PlatformNotSupportedException(); }
        /// <summary>
        /// __m256i _mm256_dpbusds_epi32 (__m256i src, __m256i a, __m256i b)
        /// VPDPBUSDS ymm, ymm, ymm
        /// </summary>
        public static Vector256<int> MultiplyWideningAndAddSaturate(Vector256<int> addend, Vector256<byte> left, Vector256<sbyte> right) { throw new PlatformNotSupportedException(); }
        /// <summary>
        /// __m256i _mm256_dpwssds_epi32 (__m256i src, __m256i a, __m256i b)
        /// VPDPWSSDS ymm, ymm, ymm
        /// </summary>
        public static Vector256<int> MultiplyWideningAndAddSaturate(Vector256<int> addend, Vector256<short> left, Vector256<short> right) { throw new PlatformNotSupportedException(); }
    }
}
|
That's the way the instructions were designed. The operand types are from the programming reference for
And for
|
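For the short/short overloads, a similar scalar sketch of one 32-bit lane may help when double-checking the sign bits: both inputs are signed 16-bit values, and each lane sums two products into the addend. This is my reading of the VPDPWSSD / VPDPWSSDS pseudocode, not an authoritative model; the function names and the `saturate` flag are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Keep only the low 32 bits of v, reinterpreted as signed (wraparound). */
static int32_t low32_as_signed(int64_t v)
{
    uint32_t bits = (uint32_t)v;   /* reduction modulo 2^32 is well defined */
    int32_t out;
    memcpy(&out, &bits, sizeof out);
    return out;
}

/* Hypothetical scalar model of one 32-bit lane: two signed 16-bit values
 * times two signed 16-bit values, both products added to the addend.
 * The sum is formed in 64 bits so that saturation sees the exact value. */
static int32_t dot2_s16s16_lane(int32_t addend, const int16_t left[2],
                                const int16_t right[2], int saturate)
{
    int64_t sum = addend;
    for (int i = 0; i < 2; i++) {
        /* both operands are sign-extended before multiplying */
        sum += (int64_t)left[i] * (int64_t)right[i];
    }
    if (!saturate) {
        return low32_as_signed(sum);
    }
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}
```

The extreme case is all four inputs equal to -32768: the two products add up to 2^31, which the saturating form clamps to `INT32_MAX` while the non-saturating form wraps to `INT32_MIN`.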
This was implemented and merged. |
Background and Motivation
The upcoming Intel® Alder Lake and Sapphire Rapids processors will introduce the AVX-VNNI instruction set extension, which provides VEX-encoded versions of the Vector Neural Network Instructions (reference: https://software.intel.com/content/dam/develop/external/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf). This proposal aims to expose the AVX-VNNI instructions via intrinsics.
Proposed API
/cc @tannergooding @CarolEidt