[Arm64] Implement BitwiseSelect hardware intrinsic #472

Merged

@echesakov echesakov commented Dec 3, 2019

This implements the BitwiseSelect Arm64 hardware intrinsic.

This one is interesting: as proposed in https://github.com/dotnet/corefx/issues/26181, the C# intrinsic method can be implemented by choosing among three different instructions (bsl, bit, or bif) depending on the allocated destination register, which can be op1Reg, op2Reg, op3Reg, or some other register.
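To see why the three instructions are interchangeable, here is a minimal scalar sketch of their semantics on a single 64-bit lane. The class and method names are hypothetical and are not part of this PR; in each method `d` stands for the incoming value of the destination register, which the instruction overwrites.

```csharp
using System;

static class SelectInstructionModel
{
    // bsl Vd, Vn, Vm: Vd acts as the mask; take Vn where Vd is 1, Vm where Vd is 0.
    public static ulong Bsl(ulong d, ulong n, ulong m) => (d & n) | (~d & m);

    // bit Vd, Vn, Vm: insert bits of Vn into Vd where Vm is 1.
    public static ulong Bit(ulong d, ulong n, ulong m) => (d & ~m) | (n & m);

    // bif Vd, Vn, Vm: insert bits of Vn into Vd where Vm is 0.
    public static ulong Bif(ulong d, ulong n, ulong m) => (d & m) | (n & ~m);

    // BitwiseSelect(op1, op2, op3) = (op1 & op2) | (~op1 & op3).
    // The same value results whichever operand register is overwritten.
    public static bool AllFormsAgree(ulong op1, ulong op2, ulong op3)
    {
        ulong expected = (op1 & op2) | (~op1 & op3);
        return Bsl(op1, op2, op3) == expected   // destination is op1Reg
            && Bit(op3, op2, op1) == expected   // destination is op3Reg
            && Bif(op2, op3, op1) == expected;  // destination is op2Reg
    }
}
```

The operand order passed to `Bit` and `Bif` mirrors the operand order in the disassembly listings in this description.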

Below I demonstrate how this works:

using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;

namespace GitHub_472
{
    struct Runner
    {
        Vector128<Byte> op1;
        Vector128<Byte> op2;
        Vector128<Byte> op3;
        Vector128<Byte> dst;

        [MethodImpl(MethodImplOptions.NoInlining)]
        public unsafe void Bsl()
        {
           dst = AdvSimd.BitwiseSelect(op1, op2, op3);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        public unsafe void Bif()
        {
           var op4 = AdvSimd.BitwiseSelect(op1, op2, op3);
           dst = AdvSimd.BitwiseSelect(op1, op4, op3);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        public unsafe void Bit()
        {
           var op4 = AdvSimd.BitwiseSelect(op1, op2, op3);
           dst = AdvSimd.BitwiseSelect(op1, op2, op4);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        public unsafe void MovBsl()
        {
           var op4 = AdvSimd.BitwiseSelect(op1, op2, op3);
           var op5 = AdvSimd.BitwiseSelect(op3, op1, op2);
           dst = AdvSimd.Add(op4, op5);
        }
    }

    class Program
    {
        static int Main(string[] args)
        {
            var runner = new Runner();

            runner.Bsl();
            runner.Bit();
            runner.Bif();
            runner.MovBsl();

            return 100;
        }
    }
}
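The program above only exercises code generation: the fields are left at their default values and the result is never inspected. As a complementary sketch (not part of this PR, with hypothetical class and method names), the following checks the intrinsic's result against a scalar reference. It assumes a runtime where `AdvSimd.IsSupported` is true and simply skips the check elsewhere.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;

static class BitwiseSelectCheck
{
    // Scalar reference: take bits of 'left' where 'mask' is 1 and bits of 'right' where it is 0.
    public static byte SelectBits(byte mask, byte left, byte right)
        => (byte)((mask & left) | (~mask & right));

    // Returns true when the intrinsic matches the scalar reference,
    // or when AdvSimd is unavailable and there is nothing to check.
    public static bool Verify()
    {
        if (!AdvSimd.IsSupported)
        {
            return true;
        }

        Vector128<byte> mask = Vector128.Create((byte)0xF0);
        Vector128<byte> left = Vector128.Create((byte)0xAB);
        Vector128<byte> right = Vector128.Create((byte)0xCD);

        Vector128<byte> result = AdvSimd.BitwiseSelect(mask, left, right);

        for (int i = 0; i < Vector128<byte>.Count; i++)
        {
            byte expected = SelectBits(mask.GetElement(i), left.GetElement(i), right.GetElement(i));
            if (result.GetElement(i) != expected)
            {
                return false;
            }
        }

        return true;
    }
}
```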

In the Bsl() case, op1Reg is dead after the intrinsic call and can be reused as the destination register.

00007ffe`4e9b4f20 a9bf7bfd stp         fp,lr,[sp,#-0x10]!
00007ffe`4e9b4f24 910003fd mov         fp,sp
00007ffe`4e9b4f28 3dc00010 ldr         q16,[x0]
00007ffe`4e9b4f2c 3dc00411 ldr         q17,[x0,#0x10]
00007ffe`4e9b4f30 3dc00812 ldr         q18,[x0,#0x20]
00007ffe`4e9b4f34 6e721e30 bsl         v16.16b,v17.16b,v18.16b
00007ffe`4e9b4f38 3d800c10 str         q16,[x0,#0x30]
00007ffe`4e9b4f3c a8c17bfd ldp         fp,lr,[sp],#0x10
00007ffe`4e9b4f40 d65f03c0 ret

In the Bit() case, op1Reg and op2Reg are still live after the first call, so LSRA allocates op3Reg as the destination register.

00007ffe`4e9c4f60 a9bf7bfd stp         fp,lr,[sp,#-0x10]!
00007ffe`4e9c4f64 910003fd mov         fp,sp
00007ffe`4e9c4f68 3dc00010 ldr         q16,[x0]
00007ffe`4e9c4f6c 3dc00411 ldr         q17,[x0,#0x10]
00007ffe`4e9c4f70 3dc00812 ldr         q18,[x0,#0x20]
00007ffe`4e9c4f74 6eb01e32 bit         v18.16b,v17.16b,v16.16b
00007ffe`4e9c4f78 6e721e30 bsl         v16.16b,v17.16b,v18.16b
00007ffe`4e9c4f7c 3d800c10 str         q16,[x0,#0x30]
00007ffe`4e9c4f80 a8c17bfd ldp         fp,lr,[sp],#0x10
00007ffe`4e9c4f84 d65f03c0 ret

In the Bif() case, op1Reg and op3Reg are still live after the first call, so LSRA allocates op2Reg as the destination register.

00007ffe`4e9c4fa0 a9bf7bfd stp         fp,lr,[sp,#-0x10]!
00007ffe`4e9c4fa4 910003fd mov         fp,sp
00007ffe`4e9c4fa8 3dc00010 ldr         q16,[x0]
00007ffe`4e9c4fac 3dc00411 ldr         q17,[x0,#0x10]
00007ffe`4e9c4fb0 3dc00812 ldr         q18,[x0,#0x20]
00007ffe`4e9c4fb4 6ef01e51 bif         v17.16b,v18.16b,v16.16b
00007ffe`4e9c4fb8 6e721e30 bsl         v16.16b,v17.16b,v18.16b
00007ffe`4e9c4fbc 3d800c10 str         q16,[x0,#0x30]
00007ffe`4e9c4fc0 a8c17bfd ldp         fp,lr,[sp],#0x10
00007ffe`4e9c4fc4 d65f03c0 ret

In the MovBsl() case, all three of op1Reg, op2Reg, and op3Reg are live after the first call, so v19 is allocated as the destination register and op1Reg is copied into it with mov before the bsl.

00007ffe`4e9d4fe0 a9bf7bfd stp         fp,lr,[sp,#-0x10]!
00007ffe`4e9d4fe4 910003fd mov         fp,sp
00007ffe`4e9d4fe8 3dc00010 ldr         q16,[x0]
00007ffe`4e9d4fec 3dc00411 ldr         q17,[x0,#0x10]
00007ffe`4e9d4ff0 3dc00812 ldr         q18,[x0,#0x20]
00007ffe`4e9d4ff4 4eb01e13 mov         v19.16b,v16.16b
00007ffe`4e9d4ff8 6e721e33 bsl         v19.16b,v17.16b,v18.16b
00007ffe`4e9d4ffc 6ef21e30 bif         v16.16b,v17.16b,v18.16b
00007ffe`4e9d5000 4e308670 add         v16.16b,v19.16b,v16.16b
00007ffe`4e9d5004 3d800c10 str         q16,[x0,#0x30]
00007ffe`4e9d5008 a8c17bfd ldp         fp,lr,[sp],#0x10
00007ffe`4e9d500c d65f03c0 ret
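The four listings above can be summarized by a small model of the selection rule. This is only an illustrative sketch in C#: the actual JIT code generator is not shown in this description, registers are represented here as plain integers, and the class and method names are hypothetical.

```csharp
using System;

static class BitwiseSelectEmitModel
{
    // Formats a vector register the way the listings above print it.
    static string V(int reg) => $"v{reg}.16b";

    // Returns the instruction sequence for dst = BitwiseSelect(op1, op2, op3),
    // given the register numbers assigned to the destination and the operands.
    public static string[] Emit(int dst, int op1, int op2, int op3)
    {
        if (dst == op1)
        {
            // The mask register is overwritten.
            return new[] { $"bsl {V(dst)},{V(op2)},{V(op3)}" };
        }
        if (dst == op2)
        {
            // The "left" register is overwritten: insert op3 where the mask is 0.
            return new[] { $"bif {V(dst)},{V(op3)},{V(op1)}" };
        }
        if (dst == op3)
        {
            // The "right" register is overwritten: insert op2 where the mask is 1.
            return new[] { $"bit {V(dst)},{V(op2)},{V(op1)}" };
        }

        // No operand register can be overwritten: copy the mask first, then bsl.
        return new[]
        {
            $"mov {V(dst)},{V(op1)}",
            $"bsl {V(dst)},{V(op2)},{V(op3)}"
        };
    }
}
```

For example, `Emit(19, 16, 17, 18)` yields the mov/bsl pair from the MovBsl() listing, and `Emit(16, 18, 16, 17)` yields the `bif v16.16b,v17.16b,v18.16b` that follows it.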

Fixes https://github.com/dotnet/corefx/issues/26181

@echesakov echesakov added the arch-arm64 and area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) labels Dec 3, 2019
@echesakov echesakov force-pushed the AArch64/HWIntrinsics/BitwiseSelect branch from b28a6af to 498bd2d Compare December 3, 2019 21:56
@echesakov echesakov marked this pull request as ready for review December 5, 2019 00:46
@echesakov echesakov commented

@CarolEidt @TamarChristinaArm @tannergooding PTAL
cc @dotnet/jit-contrib

@CarolEidt CarolEidt left a comment

LGTM - just one minor question

@echesakov echesakov merged commit f2390f0 into dotnet:master Dec 5, 2019
@echesakov echesakov deleted the AArch64/HWIntrinsics/BitwiseSelect branch December 5, 2019 20:10
@ghost ghost locked as resolved and limited conversation to collaborators Dec 11, 2020