Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use LTO for linux as well #103058

Open
wants to merge 15 commits into
base: main
Choose a base branch
from
Open

Conversation

kunalspathak
Copy link
Member

No description provided.

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Jun 5, 2024
@kunalspathak
Copy link
Member Author

kunalspathak commented Jun 5, 2024

Seeing the expected TP diffs :)

image

@@ -942,6 +938,10 @@ if (MSVC)
set(CMAKE_ASM_MASM_FLAGS "${CMAKE_ASM_MASM_FLAGS} /nologo")
endif (MSVC)

set(CMAKE_INTERPROCEDURAL_OPTIMIZATION ON)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this will use -flto=thin.

-flto might give better performance at the cost of slower compiles of coreclr. However, there is no cmake flag for this, so it would have to be done using the flags directly.

I'd recommend trying this out in a future PR.

Copy link
Contributor

@a74nh a74nh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@KennethHoff
Copy link

Seeing the expected TP diffs :)

image

What does this mean? I'm guessing it's not a 10-21% performance lift on Linux platforms.

@tannergooding
Copy link
Member

What does this mean? I'm guessing it's not a 10-21% performance lift on Linux platforms.

This isn't for code that RyuJIT produces, but rather the code generated by clang/gcc for RyuJIT itself. So RyuJIT executes about 15-20% less instructions in order to compile IL to the target platform

@EgorBo
Copy link
Member

EgorBo commented Jun 5, 2024

We might expect some improvements for performance as well since it also enables LTO for VM and GC

@build-analysis build-analysis bot mentioned this pull request Jun 8, 2024
@kunalspathak kunalspathak marked this pull request as ready for review June 11, 2024 23:05
Copy link
Member

@jkoritzinsky jkoritzinsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Let's add a few comments though so we know why certain decisions were made/are required when we look at this in the future.

@@ -152,6 +153,15 @@ elseif (CLR_CMAKE_HOST_UNIX)
endif()
endif(MSVC)

check_ipo_supported(RESULT result OUTPUT output)
if(result AND NOT CLR_CMAKE_TARGET_APPLE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment about why IPO is disabled on apple platforms?

@@ -61,6 +61,7 @@ else()
target_include_directories(cdac_data_descriptor BEFORE PRIVATE ${VM_DIR})
target_include_directories(cdac_data_descriptor BEFORE PRIVATE ${VM_DIR}/${ARCH_SOURCES_DIR})
target_include_directories(cdac_data_descriptor PRIVATE ${CLR_DIR}/interop/inc)
set_target_properties(cdac_data_descriptor PROPERTIES INTERPROCEDURAL_OPTIMIZATION OFF)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment about why we're disabling IPO here?

@MichalStrehovsky
Copy link
Member

If this is only adding -flto=thin like #103058 (comment) says, does this interact with the -flto we set here?

target_compile_options(${TargetName} PRIVATE -flto -fprofile-instr-use=${ProfilePath} -Wno-profile-instr-out-of-date -Wno-profile-instr-unprofiled)

Seeing the expected TP diffs :)

SPMI jobs seem to explicitly pass /p:NoPgoOptimize=true so the RyuJIT binary they use is not the RyuJIT binary we ship. Passing NoPgoOptimize would explicitly skip the -flto line in pgosupport.cmake since we currently condition LTO on PGO.

runtime (Build linux-x64 Release NativeAOT) Failing after 39m

We should not use LTO on native AOT runtime binaries. LTO would place LLVM bitcode (instead of machine code) into the object files that we ship. Such object files can only be linked with a matching LLVM linker plugin. We don't have LLVM or LLVM version requirements for PublishAot.

@am11 am11 added area-Infrastructure-coreclr and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Jun 13, 2024
Copy link
Contributor

Tagging subscribers to this area: @hoyosjs
See info in area-owners.md if you want to be subscribed.

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this interact with the -flto we set here?

@kunalspathak Could you please shed some light on this?

If we are specifying both -flto and -flto=thin on the command line when building the shipping binaries, there is a chance that -flto=thin is specified second, overrides the full -flto and we will see large scale perf regression.

It would be best to avoid specifying both -flto and -flto=thin on the command line when building the shipping binaries to avoid the problems with command line options overriding each other.

@kunalspathak
Copy link
Member Author

@kunalspathak Could you please shed some light on this?

So my impression was that LTO was not passed at all even for shipped binaries, but the fact that it is passed in pgosupport.cmake confirms that we do use LTO for shipped binaries.

If we are specifying both -flto and -flto=thin on the command line when building the shipping binaries, there is a chance that -flto=thin is specified second, overrides the full -flto and we will see large scale perf regression.

It would be best to avoid specifying both -flto and -flto=thin on the command line when building the shipping binaries to avoid the problems with command line options overriding each other.

Agree that we should not pessimize the flags to get regression on shipped binaries.

SPMI jobs seem to explicitly pass /p:NoPgoOptimize=true so the RyuJIT binary they use is not the RyuJIT binary we ship

So I am trying to understand how this is working currently. If I compare the TP difference with and without the NoPgoOptimize flag using superpmi, I don't see much improvements (see data below).

[21:57:52] Total instructions executed by base: 34565002901
[21:57:52] Total instructions executed by diff: 34564841018
[21:57:52] Total instructions executed delta: -161883 (-0.00% of base)

However, with this PR, even when compiled with NoPgoOptimize, I see the improvements I shared earlier. So what should be the right model here? In long term, I just want to use -flto (and probably I should update this PR to do that) like the way we do in pgosupport.cmake. Thoughts?

@jkotas
Copy link
Member

jkotas commented Jun 14, 2024

If I compare the TP difference with and without the NoPgoOptimize flag using superpmi, I don't see much improvements (see data below).

Do you see packages with PGO optimization data being downloaded and the data being successfully applied by the compiler?

@jkotas
Copy link
Member

jkotas commented Jun 14, 2024

I just want to use -flto (and probably I should update this PR to do that) like the way we do in pgosupport.cmake. Thoughts?

Yes, that would make sense. If nothing else, it should reduce the gap between the shipping config and what you are measuring.

@jkoritzinsky
Copy link
Member

If you set the CMAKE_CXX_LINK_OPTIONS_IPO and CMAKE_C_LINK_OPTIONS_IPO variables to "-flto" when targeting Linux, we'll get behavior consistent with today and still be able to use the CMake IPO support (which is useful for being able to turn it on/off for select targets, not enabling it on platforms where it's not supported, and getting target-specific required options added for us).

@jkoritzinsky
Copy link
Member

https://gitlab.kitware.com/cmake/cmake/-/issues/23136 tracks being able to specify the flavor of LTO in CMake.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants