Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compile composite with avx2 on x64 #55057

Merged
merged 2 commits into from
Jul 6, 2021
Merged

compile composite with avx2 on x64 #55057

merged 2 commits into from
Jul 6, 2021

Conversation

mangod9
Copy link
Member

@mangod9 mangod9 commented Jul 2, 2021

Compile Fx composite images using avx2.

@@ -32,6 +32,9 @@

<ItemGroup>
<PublishReadyToRunCrossgen2ExtraArgsList Include="--compositekeyfile:$(AssemblyOriginatorKeyFile)"/>
<!-- Compile with avx2 on x64 -->
<PublishReadyToRunCrossgen2ExtraArgsList Condition="'$(TargetArchitecture)' == 'x64'" Include="--inputbubble"/>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently inputbubble is required with instruction-set. Shouldnt composite just imply that? @AntonLapounov @trylek ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it implies; I think the check in Crossgen2 should be relaxed accordingly. @davidwrighton has filed dotnet/sdk#17760, but I am not sure why it is filed on the SDK side.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this needs to be fixed in cg2. perhaps we should just move the bug over.

Copy link
Member

@AntonLapounov AntonLapounov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@mangod9
Copy link
Member Author

mangod9 commented Jul 2, 2021

wonder whether we should also compile with -Ot based on #52708 (comment) ?

@AntonLapounov
Copy link
Member

That comment was about "workloads that use R2R with tiered JIT compilation disabled", which is not the normal configuration.

@mangod9
Copy link
Member Author

mangod9 commented Jul 2, 2021

yeah true, that's unlikely for containers.

@@ -32,6 +32,9 @@

<ItemGroup>
<PublishReadyToRunCrossgen2ExtraArgsList Include="--compositekeyfile:$(AssemblyOriginatorKeyFile)"/>
<!-- Compile with avx2 on x64 -->
Copy link
Member

@tannergooding tannergooding Jul 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reasoning for "just AVX2"?

That is, specifying avx2 will imply anything in the direct heirarchy, so it will include:

  • X86Base
  • SSE
  • SSE2
  • SSE3
  • SSSE3
  • SSE4.1
  • SSE4.2
  • AVX

However, there are certain branch instruction sets that aren't part of the direct hierarchy. These are generally considered to exist due to age or vendor (AMD vs Intel) differences in how things are exposed and so should be reasonable to also include:

  • AES (2010+)
  • PCLMULQDQ (2010+)
  • POPCNT (2008+)

Likewise, there are instruction sets that were introduced alongside AVX2 (which was introduced in Haswell, 2013+) and are realistically provided SxS in machines that provide AVX2 (only missing in machines like newer Atom processors that are also missing AVX2):

  • BMI1
  • BMI2
  • FMA
  • LZCNT

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

avx2 is what we have been running our benchmarks with, which showed decent startup improvements. We could subsequently add others as you suggest, if there is measurable perf improvement when enabled.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

POPCNT and LZCNT are both used in various hot paths (e.g. in Span or UTF8<->UTF16 conversion).
Intel technically considers POPCNT part of SSE4.2 and LZCNT part of BMI1, they are split out as not all vendors (such as AMD) consider them this way (but they are functionally present in the same scenarios on the other vendors).

FMA and BMI2 aren't used by any of our own startup paths but may be called from user code if they are doing anything around Math.BigMul or Math/MathF.FusedMultiplyAdd

AES and PCLMULQDQ are likely fine to not enable for R2R as they are edge case scenarios, but they should also be harmless to enable on any machine that supports AVX2.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok thanks for the info.

POPCNT and LZCNT are both used in various hot paths (e.g. in Span or UTF8<->UTF16 conversion).

Will add these to current benchmarks to measure the perf improvement and then enable for composite, assuming most modern CPUs support these.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assuming most modern CPUs support these.

They should. POPCNT will be in basically any CPU that supports SSE4.2 (2008+) and LZCNT/BMI1 in basically any CPU that supports AVX2 (2013+).
Since we're enabling AVX2 here, both should effectively be available in the same scenarios.

  • There are minor nuances here for pre-AVX2 CPUs between AMD and Intel, but that isn't particularly relevant if we are saying AVX2 is "safe" to be the baseline/default

@mangod9 mangod9 merged commit aeb467e into dotnet:main Jul 6, 2021
@mangod9 mangod9 deleted the useavx2 branch July 6, 2021 18:32
@ghost ghost locked as resolved and limited conversation to collaborators Aug 5, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants