Question: Why was LTO disabled in version 1.22.0? #124

Open
tapir2342 opened this issue Apr 18, 2024 · 13 comments

@tapir2342

Hi @skeeto , thanks for making this. Can you explain (or point me to resources) why LTO got disabled in version 1.22.0?
Thank you.

@skeeto
Owner

skeeto commented Apr 19, 2024 via email

@Peter0x44
Contributor

Peter0x44 commented Apr 19, 2024

@skeeto LTO's biggest benefit seems to be not so much speed as code size. It makes a bigger difference than you'd expect in that regard; there are some benchmarks here:
https://youtu.be/GufwdypTfrE

Unfortunately, it's true that LTO exposes you to more compiler bugs, and even more so on *-w64-mingw32. I wanted to build cppcheck with LTO as an example and encountered this:
https://gcc.gnu.org/PR106103

It's supposedly fixed, but I'd still like to revisit when gcc 14.1 is released.

I still think enabling it in the toolchain is overall a good idea. And potentially it's a good avenue to reduce some of w64devkit's size.

@skeeto
Owner

skeeto commented Apr 19, 2024

Thanks, @Peter0x44, that was an interesting talk. My main takeaways from the talk:

  • Performance-wise, LTO doesn't do much without PGO.
  • LTO significantly reduces binary size even without PGO. 23% in his benchmarks.
  • LLVM is the opposite: LTO makes binaries bigger.
  • Semantic interposition is costlier than I realized (though that's mostly irrelevant for PE). Obvious in hindsight.

(Unfortunately, as I'm sure you already realize, PGO isn't going to be practical when building w64devkit itself due to cross-compilation.)

If I re-enable LTO per the sed command above, then build Cppcheck with LTO using that toolchain, I get a 10% size reduction (~300K). I don't find this particularly impressive, especially for how much it costs (125% build time increase, an extra 9M of toolchain distributed, 22M installed). I tried again with my GCC 14 snapshot branch, same results. Also, that LTO bug is not fixed as of the April 5th GCC 14 snapshot, so I still needed the declone option.

If you'd like to reproduce this yourself, I used w64devkit's cppcheck.mak with these changes, on Cppcheck 2.10:

--- a/cppcheck.mak
+++ b/cppcheck.mak
@@ -3,5 +3,5 @@
 obj      := $(src:.cpp=.o)
-CXXFLAGS := -w -Os -Ilib $(addprefix -I,$(ext))
+CXXFLAGS := -w -Os -Ilib -flto -fno-declone-ctor-dtor $(addprefix -I,$(ext))
 cppcheck.exe: $(obj)
-	$(CXX) -s -o $@ $(obj) -lshlwapi
+	$(CXX) -s -o $@ -Os -flto=auto $(obj) -lshlwapi
 cppcheck: $(obj)
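
For anyone repeating the size comparison above, the percentage can be computed from the two resulting binaries. The following is only a sketch: the helper name `size_reduction_pct` and the file names in the usage comment are made up for illustration and do not come from the thread.

```shell
# Hypothetical helper: print the percentage by which the second file is
# smaller than the first, using integer arithmetic (truncates toward zero).
size_reduction_pct() {
    base=$(wc -c < "$1")    # size in bytes of the non-LTO build
    lto=$(wc -c < "$2")     # size in bytes of the LTO build
    echo $(( (base - lto) * 100 / base ))
}

# Example, assuming the two builds were saved under these made-up names:
# size_reduction_pct cppcheck-nolto.exe cppcheck-lto.exe
```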

Disabling LTO is a bit of an experiment. It sat on the master branch for over two months without objections, so I felt comfortable trying it in a release. I could be persuaded it's worth reverting back to the default, especially as LTO-related bugs are fixed, but I'm not there yet.

@Peter0x44
Contributor

Perhaps bigger gains can be achieved when building gcc or potentially busybox-w32 with it also. It's on my to-do list to investigate the potential benefits there.

@rmyorston

For some time now (six years or so?) my release builds of busybox-w32 have used LTO. It makes the binaries smaller and doesn't seem to have resulted in any issues.

The more recent clang/aarch64 build doesn't use LTO as it made the binary slightly larger. By 0.2%. Oh no!

@skeeto
Owner

skeeto commented May 8, 2024 via email

@Peter0x44
Contributor

Peter0x44 commented May 8, 2024

@rmyorston aarch64-w64-mingw32 support was recently merged for gcc 15. Perhaps it's worth investigating whether it would reduce the executable size.

Maybe it's worth considering for w64devkit as well, but few WoA devices are available for purchase and gcc won't officially support it until next year, so I wouldn't make it a high priority. Just something to be aware of.

@MrMadguy64

MrMadguy64 commented May 12, 2024

Please bring LTO back as soon as possible. I'm developing a project for DOS, and code size is very important to me. I've googled this problem, and it seems there is some work in progress to fix it. For now, the recommendation is to mark _pei386_runtime_relocator as used.

@Peter0x44
Contributor

Another thought on LTO, though I don't know its practical implications yet. The system compiler on Arch has:
Supported LTO compression algorithms: zlib zstd
w64devkit only has zlib. Perhaps it's worth compiling zstd and letting gcc use it when LTO is deemed worth reintroducing, but that has its own binary size concerns, and there's also the question of whether it's even useful.
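
The "Supported LTO compression algorithms" line quoted above is part of what `gcc -v` prints, so it can be checked on any toolchain. A minimal sketch follows; the helper name `lto_compression_algos` is hypothetical, and it assumes the line has the same wording as the one quoted here.

```shell
# Hypothetical helper: read `gcc -v` output on stdin and print the list that
# follows "Supported LTO compression algorithms:". Prints nothing if the
# line is absent.
lto_compression_algos() {
    sed -n 's/^Supported LTO compression algorithms: *//p'
}

# Usage: gcc -v writes to stderr, so redirect it into the pipe.
# gcc -v 2>&1 | lto_compression_algos
```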

@N-R-K

N-R-K commented Jun 20, 2024

> Supported LTO compression algorithms

That's just for the intermediate object files, isn't it? I don't think it should have any effect on the final binary (where the LTO information gets stripped).

@R-Goc

R-Goc commented Sep 17, 2024

Could this be re-enabled? It breaks existing build configurations of performance-optimized libraries.

@ShawSumma

This broke my build of MiniVM. It was a simple enough fix, but it slowed down the VM's GC significantly.

@Peter0x44
Contributor

Peter0x44 commented Nov 15, 2024

I wanted to experiment with this again, so I tried building an LTO-enabled w64devkit for my own use, but ran into some minor problems.
Simply doing:

$ sed -i /--disable-lto/d Dockerfile

resulted in ar being broken for LTO use:

$ cat square.c
int square(int x) { return x*x; }
$ cat test.c
#include <stdio.h>

int square(int);

int main(void)
{
        printf("%d squared is %d", 4, square(4));
}
$ gcc -flto -c square.c
$ ar rcs  libsquare.a square.o
ar: square.o: plugin needed to handle lto object

libsquare.a was still created, but linking against it did not work:

$ gcc test.c libsquare.a
C:/programming/toolchains/w64devkit/bin/ld.exe: C:\Users\peter\AppData\Local\Temp\ccUerGLN.o:test.c:(.text+0x67): undefined reference to `square'
collect2.exe: error: ld returned 1 exit status

gcc-ar, however, did work correctly.
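
Since gcc-ar is GCC's wrapper that runs ar with the LTO plugin loaded, a build script can prefer it whenever it is available. The sketch below is only an illustration; the helper name `pick_ar` is hypothetical and not part of w64devkit.

```shell
# Hypothetical helper: print "gcc-ar" when it is on PATH, otherwise "ar".
pick_ar() {
    if command -v gcc-ar >/dev/null 2>&1; then
        echo gcc-ar
    else
        echo ar
    fi
}

# Usage with the example above:
# "$(pick_ar)" rcs libsquare.a square.o
```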

What fixed it was copying w64devkit\libexec\gcc\x86_64-w64-mingw32\14.2.0\liblto_plugin.dll to w64devkit\lib\bfd-plugins\liblto_plugin.dll.
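
The copy described above can be scripted. This is a sketch under assumptions: the helper name `install_lto_plugin` and the toolchain root in the usage comment are hypothetical, while the relative paths are the ones given in the comment. It relies on binutils tools such as ar loading plugins found in lib/bfd-plugins automatically.

```shell
# Hypothetical helper: copy GCC's LTO plugin into the bfd-plugins directory.
# $1 = toolchain root, $2 = target triplet, $3 = GCC version
install_lto_plugin() {
    src="$1/libexec/gcc/$2/$3/liblto_plugin.dll"
    dst="$1/lib/bfd-plugins"
    mkdir -p "$dst" && cp "$src" "$dst/"
}

# Example matching the paths above (the toolchain root is an assumption):
# install_lto_plugin /path/to/w64devkit x86_64-w64-mingw32 14.2.0
```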

I suspect this has to do with the strange way gcc is configured by w64devkit:

w64devkit/Dockerfile

Lines 256 to 257 in 8ea0fee

--prefix=$PREFIX \
--with-sysroot=$PREFIX/$ARCH \

with the sysroot having $ARCH at the end of it.

I did not have time to test any theories relating to this, but perhaps @skeeto might have some ideas. I haven't found this to be necessary in my own experiments building mingw-w64 toolchains outside of w64devkit.
