-
-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable Link Time Optimisation #3824
base: main
Are you sure you want to change the base?
Conversation
ref: https://gcc.gnu.org/wiki/LinkTimeOptimization Link Time Optimisation defers final compilation until the link stage, which allows more aggressive inlining and optimisations to occur, as more information is known at link time vs when each object is compiled. For the esp32dev platform, I have the following size reductions: Without LTO: 1313024 .pio/build/esp32dev/firmware.bin With LTO: 1214256 .pio/build/esp32dev/firmware.bin Signed-off-by: Alastair D'Silva <[email protected]>
These require investigation: /home/runner/.platformio/packages/toolchain-xtensa/bin/../lib/gcc/xtensa-lx106-elf/10.3.0/../../../../xtensa-lx106-elf/bin/ld: libc_replacements.cpp.o (symbol from plugin): in function `_open_r': (.text+0x0): multiple definition of `puts'; /home/runner/.platformio/packages/toolchain-xtensa/bin/../lib/gcc/xtensa-lx106-elf/10.3.0/../../../../xtensa-lx106-elf/lib/libc.a(lib_a-puts.o):/workdir/repo/newlib/newlib/libc/stdio/puts.c:129: first defined here /home/runner/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_wiring_digital.cpp:129:3: warning: type 'struct ArgStructure' violates the C++ One Definition Rule [-Wodr] 129 | } ArgStructure; | ^ /home/runner/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/FunctionalInterrupt.h:26:8: note: a different type is defined in another translation unit 26 | struct ArgStructure { | ^ /home/runner/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_wiring_digital.cpp:128:8: note: the first difference of corresponding definitions is field 'functionInfo' 128 | void* functionInfo; | ^ /home/runner/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/FunctionalInterrupt.h:28:16: note: a field of same name but different type is defined in another translation unit 28 | FunctionInfo* functionInfo = nullptr; | ^ /home/runner/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/core_esp8266_wiring_digital.cpp:129:3: note: type 'void' should match type 'struct FunctionInfo' 129 | } ArgStructure; | ^ /home/runner/.platformio/packages/framework-arduinoespressif8266/cores/esp8266/FunctionalInterrupt.h:21:8: note: the incompatible type is defined here 21 | struct FunctionInfo { | ^ /home/runner/.platformio/packages/toolchain-xtensa/bin/../lib/gcc/xtensa-lx106-elf/10.3.0/../../../../xtensa-lx106-elf/bin/ld: .pio/build/nodemcuv2/firmware.elf section `.text1' will not fit in region `iram1_0_seg' /home/runner/.platformio/packages/toolchain-xtensa/bin/../lib/gcc/xtensa-lx106-elf/10.3.0/../../../../xtensa-lx106-elf/bin/ld: /tmp/firmware.elf.ojg0kn.ltrans0.ltrans.o:(.text+0x4): undefined reference to `stack_thunk_save' /home/runner/.platformio/packages/toolchain-xtensa/bin/../lib/gcc/xtensa-lx106-elf/10.3.0/../../../../xtensa-lx106-elf/bin/ld: /tmp/firmware.elf.ojg0kn.ltrans0.ltrans.o:(.text+0x8): undefined reference to `SigningVerifier_verify' /home/runner/.platformio/packages/toolchain-xtensa/bin/../lib/gcc/xtensa-lx106-elf/10.3.0/../../../../xtensa-lx106-elf/bin/ld: /tmp/firmware.elf.ojg0kn.ltrans0.ltrans.o: in function `thunk_SigningVerifier_verify': <artificial>:(.text+0x1f): undefined reference to `SigningVerifier_verify' /home/runner/.platformio/packages/toolchain-xtensa/bin/../lib/gcc/xtensa-lx106-elf/10.3.0/../../../../xtensa-lx106-elf/bin/ld: <artificial>:(.text+0x32): undefined reference to `stack_thunk_fatal_smashing' /home/runner/.platformio/packages/toolchain-xtensa/bin/../lib/gcc/xtensa-lx106-elf/10.3.0/../../../../xtensa-lx106-elf/bin/ld: <artificial>:(.text+0x72): undefined reference to `stack_thunk_fatal_smashing' /home/runner/.platformio/packages/toolchain-xtensa/bin/../lib/gcc/xtensa-lx106-elf/10.3.0/../../../../xtensa-lx106-elf/bin/ld: /tmp/firmware.elf.ojg0kn.ltrans0.ltrans.o: in function `thunk_br_ssl_engine_recvapp_ack': <artificial>:(.text+0xb2): undefined reference to `stack_thunk_fatal_smashing' /home/runner/.platformio/packages/toolchain-xtensa/bin/../lib/gcc/xtensa-lx106-elf/10.3.0/../../../../xtensa-lx106-elf/bin/ld: /tmp/firmware.elf.ojg0kn.ltrans0.ltrans.o: in function `thunk_br_ssl_engine_recvapp_buf': <artificial>:(.text+0xf2): undefined reference to `stack_thunk_fatal_smashing' /home/runner/.platformio/packages/toolchain-xtensa/bin/../lib/gcc/xtensa-lx106-elf/10.3.0/../../../../xtensa-lx106-elf/bin/ld: /tmp/firmware.elf.ojg0kn.ltrans0.ltrans.o: in function `thunk_br_ssl_engine_recvrec_ack': <artificial>:(.text+0x132): undefined reference to `stack_thunk_fatal_smashing' /home/runner/.platformio/packages/toolchain-xtensa/bin/../lib/gcc/xtensa-lx106-elf/10.3.0/../../../../xtensa-lx106-elf/bin/ld: /tmp/firmware.elf.ojg0kn.ltrans0.ltrans.o:<artificial>:(.text+0x172): more undefined references to `stack_thunk_fatal_smashing' follow /home/runner/.platformio/packages/toolchain-xtensa/bin/../lib/gcc/xtensa-lx106-elf/10.3.0/../../../../xtensa-lx106-elf/bin/ld: /tmp/firmware.elf.ojg0kn.ltrans0.ltrans.o: in function `__wrap_system_restart_local': <artificial>:(.text.__wrap_system_restart_local+0x2): undefined reference to `postmortem_report' /home/runner/.platformio/packages/toolchain-xtensa/bin/../lib/gcc/xtensa-lx106-elf/10.3.0/../../../../xtensa-lx106-elf/bin/ld: /tmp/firmware.elf.ojg0kn.ltrans3.ltrans.o: in function `_ZN8Espalexa18handleAlexaApiCallEP21AsyncWebServerRequest': <artificial>:(.text+0x3c68): undefined reference to `_ZN23NeoEsp8266I2sMethodCore15c_StateDataSizeE' collect2: error: ld returned 1 exit status *** [.pio/build/nodemcuv2/firmware.elf] Error 1 Signed-off-by: Alastair D'Silva <[email protected]>
Hi, I've played with LTO some time ago, and actually ran into some issues:
As there seem to be some drawbacks, it would be good if you make LTO optional. |
This might be interesting, too platformio/platform-espressif32#702 |
That's weird, that's the same change that I put a PR in for this morning. I wonder if it's been backed out or clobbered at some point. [later] Ah, it was never submitted, then the OP pulled the patchset and presumably it was forgotten about. |
Another thing that I remember: the arduino-esp32 build scripts have an explicit "-fno-lto" so you might also need |
Interesting, that looks like it snuck in this commit, with no explanation: I just added it to build_unflags and saved a little more: Do you have any references for the a loss in performance? (that should be repro'd with a current compiler & reported upstream) AFAIK, LTO eliminates call overhead and dead branches, which should only ever improve performance. |
Nothing that's good for a test case, sorry. It was just something I recognized sporadically, when playing with lto and other optimization flags. Actually this guy makes a similar statement: The other problem is debugging and stack traces - these become very unreadable due to lto. However its possible to still get something usefull if compiling with |
I just rolled a build and threw it onto a C3 I had spare, and there was no sign of the boot loop mentioned in the other thread. I'd put it down to a> ancient compiler and b> relatively new architecture. It looks like the compiler we are currently using generates reasonable code for the C3. I was able to boot it any join it to my Wifi network. (there's no headers on it though so I can't test with a strip) Do you have any suggestions on how to make things optional? I can't see a nice way to conditionally fiddle the flags. |
Signed-off-by: Alastair D'Silva <[email protected]>
We are using LTO since over 1 year for Tasmota (except for the C3). Flash size usage reduction is nice. And yes there are articles floating around, how bad it is to use LTO. Looking closely all issues arriving with enabling LTO are bad code design. |
@Jason2866 Is your recommendation to disable LTO for C3 for now? It looks like the build selects Espressif32 @ 5.3.0, which selects toolchain-riscv32-esp 8.4.0+2021r2-patch5. |
@deece Yes, disable LTO for the C3 when toolchain-riscv32-esp 8.4.0+2021r2-patch5 is used. |
I think the script that changes You could achieve it with a new section like this [Size_Flags]
build_flags =
-flto ;; enable LTO
build_unflags =
-fno-lto ;; remove no-LTO and then in custom buildenvs, we could simply add |
@deece ... so instead of modifying the |
For Tasmota we enable LTO global for alle ESP32x and disable it individually in the C3 envs. |
Thank you @Jason2866 for selflessly contributing your knowledge. ❤️ @softhack007 would it make sense to include LTO for every ESP32 except C3? We can make default @deece could you update BTW this is beyond my skill level so I cannot comment or review anything. |
@@ -0,0 +1,24 @@ | |||
Import('env') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a short comment here why this script exists and what it does on a high level (I assume it replaces some parts of the linker with another that supports LTO?)
Hi. I think it would be good to pursue this PR further as we are really at the limits of binary size for currently selected partitions. Unfortunately my knowledge is limited and I cannot be of any help here. @deece @softhack007 as you are more experienced could you work together (and if @Jason2866 can help) to make it happen? IMO as @Jason2866 recommended we should enable LTO across all ESP32 except C3 (which only has 1 build environment) and add |
I'm halfway through a revision, but have been sidetracked with more urgent duties. I'll get back to it soon
On 28 April 2024 5:49:24 pm AEST, "Blaž Kristan" ***@***.***> wrote:
Hi. I think it would be good to pursue this PR further as we are really at the limits of binary size for currently selected partitions.
Unfortunately my knowledge is limited and I cannot be of any help here.
@deece @softhack007 as you are more experienced could you work together (and if @Jason2866 can help) to make it happen?
IMO as @Jason2866 recommended we should enable LTO across all ESP32 except C3 (which only has 1 build environment) and add `esp32` build flags to optionally disable it as @softhack007 suggested.
--
Reply to this email directly or view it on GitHub:
#3824 (comment)
You are receiving this because you were mentioned.
Message ID: ***@***.***>
--
Alastair D'Silva
mob: 0423 762 819
|
build_flags_esp8266 = ${common.build_flags} ${esp8266.build_flags} | ||
build_flags_esp32 = ${common.build_flags} ${esp32.build_flags} | ||
build_flags_esp32_V4= ${common.build_flags} ${esp32_idf_V4.build_flags} | ||
build_flags_esp32 = ${common.build_flags} ${esp32.build_flags} ${common.lto_flags} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are no longer present in current platformio.ini
so LTO flags should be moved into appropriate [esp32xx]
sections.
I would also create environment (or better just flags) for debug builds. Enabling all debug flags (as specified in [common]
section) will no longer link successfully when using anything but plain binary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the path I've been going down. I'll rebase against whatever the current development branch is when I get back to it.
Basically, every platform will have a normal (undecorated) and debug environment, with (LTO|None) or DEBUG flags applied as appropriate for the platform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider this when making base environments:
[env:wemos_d1_mini32]
board = esp32dev
platform = ${esp32.platform}
;board_build.f_flash = 80000000L # does not work on every ESP
board_build.flash_mode = qio
board_build.partitions = ${esp32.default_partitions}
board_build.filesystem = littlefs
monitor_filters = esp32_exception_decoder
build_unflags = ${common.build_unflags}
build_flags = ${common.build_flags} ${esp32.build_flags} -D WLED_RELEASE_NAME=WEMOS32
; -D WLED_DISABLE_HUESYNC
; -D WLED_DISABLE_LOXONE
; -D WLED_DISABLE_ALEXA
-D WLED_DISABLE_BROWNOUT_DET
-D WLED_NTP_ENABLED=true
-D WLED_TIMEZONE=2
-D WLED_LAT=45.8
-D WLED_LON=15.17
-D LEDPIN=16
-D RLYPIN=19
-D BTNPIN=17
-D IRPIN=18
-D USERMOD_DALLASTEMPERATURE
-D TEMPERATURE_PIN=23
-D USERMOD_FOUR_LINE_DISPLAY
-D USERMOD_PIRSWITCH
-D PIR_SENSOR_PIN=26
-D PIR_SENSOR_OFF_SEC=60
-D PIR_SENSOR_MAX_SENSORS=2
-D USERMOD_MULTI_RELAY
-D MULTI_RELAY_MAX_RELAYS=3
-D USERMOD_AUDIOREACTIVE
-D DMTYPE=1 # 0-analog/disabled, 1-I2S generic, 2-ES7243, 3-SPH0645, 4-I2S+mclk, 5-I2S PDM
-UWLED_USE_MY_CONFIG
lib_deps = ${esp32.lib_deps}
olikraus/U8g2 # @~2.33.15
paulstoffregen/OneWire@~2.3.8
Wire
${esp32.AR_lib_deps}
[env:wemos_d1_mini32_debug]
extends = env:wemos_d1_mini32
extra_scripts = ${env.extra_scripts}
replace_fs.py
build_flags = ${common.build_flags} ${esp32.build_flags}
; -DBOARD_HAS_PSRAM -mfix-esp32-psram-cache-issue ;; this adds approximately 99k to the binary size
; DEBUG reserves GPIO1
-D WLED_DEBUG
; -D WLED_DEBUG_HOST='"192.168.x.x"'
; -D WLED_DEBUG_MATH
; -D WLED_USE_UNREAL_MATH
; -D WLED_USE_ETHERNET
; -D WLED_ETH_DEFAULT=5
; -D WLED_DISABLE_MQTT
; -D WLED_DISABLE_INFRARED
-D WLED_DISABLE_ALEXA
-D WLED_DISABLE_HUESYNC
-D WLED_DISABLE_LOXONE
; -D WLED_DISABLE_MODE_BLEND
; -D WLED_ENABLE_DMX
; -D WLED_ENABLE_PIXART
; -D WLED_DISABLE_PXMAGIC
; -D WLED_DISABLE_WEBSOCKETS
; -D WLED_DISABLE_2D
-D WLED_AP_SSID_UNIQUE
-D WLED_NTP_ENABLED=true
-D WLED_TIMEZONE=2
-D WLED_LAT=45.8
-D WLED_LON=15.17
; -D LEDPIN=16
-D DATA_PINS=16,3
-D PIXEL_COUNTS=30,30
-D RLYPIN=19
-D BTNPIN=17
-D IRPIN=18
; -D WLED_MAX_BUTTONS=12
; -D USERMOD_SHT
-D USERMOD_DALLASTEMPERATURE
-D TEMPERATURE_PIN=23
-D USERMOD_FOUR_LINE_DISPLAY
-D USERMOD_ROTARY_ENCODER_UI # 33,34,35
-D ENCODER_DT_PIN=33
-D ENCODER_CLK_PIN=34
-D ENCODER_SW_PIN=35
; -D USERMOD_AUTO_SAVE
; -D AUTOSAVE_AFTER_SEC=90
; -D USERMOD_ANIMATED_STAIRCASE
-D USERMOD_PIRSWITCH
-D PIR_SENSOR_PIN=36
-D PIR_SENSOR_OFF_SEC=60
-D PIR_SENSOR_MAX_SENSORS=2
-D USERMOD_MULTI_RELAY # 14, 15, 32
-D MULTI_RELAY_MAX_RELAYS=2
; -D MULTI_RELAY_PINS=14,15
; -D USERMOD_PWM_FAN
; -D USERMOD_BOBLIGHT
; -D USERMOD_POWER_AP
-D USERMOD_AUDIOREACTIVE
-D AUDIOPIN=-1 # analog pin
-D DMTYPE=1 # 0-analog/disabled, 1-I2S generic, 2-ES7243, 3-SPH0645, 4-I2S+mclk, 5-I2S PDM
-D I2S_SDPIN=4
-D I2S_WSPIN=32
-D I2S_CKPIN=12
; -D STATUSLED=2
-D WLED_WATCHDOG_TIMEOUT=10
-D WLED_DISABLE_BROWNOUT_DET
; -D ABL_MILLIAMPS_DEFAULT=0
-UWLED_USE_MY_CONFIG
These are my daily build environments for ESP32 that I often change wile testing WLED. I have similar environments for 8266, S2, S3 and C3, Ethernet, WROVER models and many exotic boards that I received for testing.
Hey! This pull request has been open for quite some time without any new comments now. It will be closed automatically in a week if no further activity occurs. |
ref: https://gcc.gnu.org/wiki/LinkTimeOptimization
Link Time Optimisation defers final compilation until the link stage, which allows more aggressive inlining and optimisations to occur, as more information is known at link time vs when each object is compiled.
For the esp32dev platform, I have the following size reductions:
Without LTO:
1313024 .pio/build/esp32dev/firmware.bin
With LTO:
1214256 .pio/build/esp32dev/firmware.bin