Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable function-sections for ARM Thumb targets #37580

Closed
wants to merge 1 commit into from
Closed

Enable function-sections for ARM Thumb targets #37580

wants to merge 1 commit into from

Conversation

JinShil
Copy link

@JinShil JinShil commented Nov 4, 2016

For the ARM thumb targets, I think users would want function-sections enabled more often then not, given the importance of code size for these targets.

Due to #26338 there doesn't appear to be a way to pass -function-sections and -data-sections on the command line for rustc.

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @arielb1 (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@alexcrichton
Copy link
Member

@japaric do you remember if this was intentionally omitted, or perhaps just an accident?

@alexcrichton alexcrichton assigned alexcrichton and unassigned arielb1 Nov 4, 2016
@alexcrichton
Copy link
Member

Ah to clarify this looks good to me, thanks! Just want to understand if we intentionally omitted this or not.

@japaric
Copy link
Member

japaric commented Nov 4, 2016

do you remember if this was intentionally omitted, or perhaps just an accident?

It was never brought up during the RFC.

GCC says:

-ffunction-sections instructs gcc to place each function
(including static ones) in its own section named .text.function_name
instead of placing all functions in one big .text section.

But, AFAICT, this seems to already be the case today. As in each functions gets its own text.$name section. @JinShil Do you have any measurements of how this affect binary size?

@japaric
Copy link
Member

japaric commented Nov 4, 2016

TargetOptions says function_sections defaults to true:

    /// Emit each function in its own section. Defaults to true.
    pub function_sections: bool,

@alexcrichton
Copy link
Member

Ah, that would explain my confusion for why functions would today be in their own sections. In that sense, @JinShil I'm curious what prompted this PR? Were the function sections not working for you?

@JinShil
Copy link
Author

JinShil commented Nov 4, 2016

I'm seeing a lot of dead code in my binary after including formatted IO from core. Analyzing the binary with readelf seemed to indicate that function and data sections were not enabled. When looking through rust's source code, I didn't see that this option was enabled, so I was operating on that assumption. I'll do more analysis. Closing for now.

@JinShil JinShil closed this Nov 4, 2016
@japaric
Copy link
Member

japaric commented Nov 4, 2016

@JinShil How can you tell it's dead code? I'd like to know.

And I'd like to know why print! appears to optimize so badly.

Using formatting (e.g. print!) in my thumbv7em programs always adds at least 2KB+ of code (in release mode + LTO) and that code contains panic!s even though my write_str implementation always returns Ok...

I feel that the formatting machinery hasn't been optimized for code size. I'm going to blame the use trait objects and unwrap in its implementation as the reason why LLVM can't do a good job at optimizing it.

For the record, this is what println!("The answer is {}", 42i32) compiles to: gist. Whereas an empty main looks like this.

Some numbers:

  • empty main: .text is 94 bytes
  • write_str("The answer is"): .text is 142 bytes
  • println!("The answer is {}", 42): .text is 2695 bytes

NOTE: Everything (including core) was compiled with opt-level=3 + LTO. opt-level=s and opt-level=z make only a small difference: just ~200 bytes smaller at most.

Fun stuff:

# `print!` version
$ arm-none-eabi-strings target/thumbv7em-none-eabihf/release/examples/minimal
(..)
/home/japaric/.multirust/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/option.rs
(..)
_ZN38_$LT$core..option..Option$LT$T$GT$$GT$6unwrap14_MSG_FILE_LINE17hf959eadea8d19a29E
_ZN44_$LT$char$u20$as$u20$core..char..CharExt$GT$11encode_utf817h99675f40beaf5d14E.18
_ZN4core3fmt10ArgumentV110show_usize17hbc0554443e76be10E
_ZN4core3fmt10ArgumentV18as_usize17hc7abf8ebac35dab1E
_ZN4core3fmt3num52_$LT$impl$u20$core..fmt..Display$u20$for$u20$i32$GT$3fmt17h5e30a65f510dbc16E
_ZN4core3fmt3num54_$LT$impl$u20$core..fmt..Display$u20$for$u20$usize$GT$3fmt17h492841b4ae8a3ddbE
_ZN4core3fmt5Write9write_fmt17he77f7f8d8bac13b7E
_ZN4core3fmt9Formatter12pad_integral17h1d0f06fe3f526724E
_ZN4core3fmt9Formatter12pad_integral28_$u7b$$u7b$closure$u7d$$u7d$17hb969ffb178d0f093E
_ZN4core3fmt9Formatter8getcount17h3551a55b75a543faE
_ZN4core9panicking18panic_bounds_check17hf94f8fdf11bbac34E
_ZN4core9panicking5panic17hcce66e47f857f1efE
_ZN4core9panicking9panic_fmt17h4ce23f3f807dad84E
_ZN4drop17h744fdee9eefc80dbE
_ZN50_$LT$f3..itm..Port$u20$as$u20$core..fmt..Write$GT$9write_str17hb706f301a9056e3aE
_ZN96_$LT$core..fmt..Write..write_fmt..Adapter$LT$$u27$a$C$$u20$T$GT$$u20$as$u20$core..fmt..Write$GT$10write_char17h424f8d3efdcadcdcE
_ZN96_$LT$core..fmt..Write..write_fmt..Adapter$LT$$u27$a$C$$u20$T$GT$$u20$as$u20$core..fmt..Write$GT$9write_fmt17h0792f693349ffb8fE
_ZN96_$LT$core..fmt..Write..write_fmt..Adapter$LT$$u27$a$C$$u20$T$GT$$u20$as$u20$core..fmt..Write$GT$9write_str17hf842a23f27888b72E

I have no idea why those strings are in there.

(Sorry for the rant)

@JinShil
Copy link
Author

JinShil commented Nov 4, 2016

@japaric Yes that's exactly what I'm seeing, which led me to blame the lack of an explicit 'function_sections =true' in the target definition. I would expect those strings to be put in their own section and removed with '--gc-sections'.

@alexcrichton
Copy link
Member

@japaric FWIW println has quite a bit of machinery:

  • A globally cached stdio output stream
  • line-buffered writes to this output buffer
  • error handling, notably println panicking if errors happen
  • panics bring in panic hooks, backtraces, ...
  • string formatting for the actual writing
  • implementation of all the options for formatting (alignment, ...)
  • unicode tables for some formatting pulled in

The core::fmt module was optimized mostly for consumer code size rather than code size itself. That is, trait objects were used liberally to ensure that usage of format_args! doesn't explode code size. That being said I'm sure it's ripe for being optimized. I'd be pretty certain there are a bunch of code paths that think they panic when they don't actually or could be optimized away statically or something like that.

That's at least what's happening!

@JinShil
Copy link
Author

JinShil commented Nov 4, 2016

@japaric But I guess the point you're making is that the code isn't actually dead. That may be true. I'm on the road for the next 48 hours, but when I get back I'll do a more thorough analysis. I obviously should have looked a little deeper before submitting thus PR.

@japaric
Copy link
Member

japaric commented Nov 5, 2016

@JinShil

I would expect those strings to be put in their own section and removed with '--gc-sections'.

Could you say more about this? Are these strings not supposed to be in the .text or something like that? (A reference to how the linker is supposed to behave in this case would be nice but that would be asking too much I think 😄)

I'm wondering if I'm doing something silly in my linker script that's preventing this stuff from being GCed away. I'll double check.

I'm on the road for the next 48 hours

Have a safe trip!

I get back I'll do a more thorough analysis.

👍


@alexcrichton I should have been more explicit. This is bare metal / no_std code. I'm not using std's println because I can't instead I'm using my implementation which looks pretty much the same minus the thread local / locking stuff. Here's my implementation of the macro for reference and my write_str function.

A globally cached stdio output stream
line-buffered writes to this output buffer

These are not present in my example.

error handling, notably println panicking if errors happen

I'm not unwrapping in my macro but simply ignoring the error because my write_str always returns Ok.

panics bring in panic hooks, backtraces, ...

This is part I find odd. I'm compiling everything (including core) with panic = abort and all those function names look like they are there for printing back a backtrace but there's no backtrace to print in this case. Are those function names used for something else? Because otherwise it seems that they shouldn't be emitted by rustc in the first place.

That is, trait objects were used liberally to ensure that usage of format_args! doesn't explode code size.

Right, but I'm wondering how much optimizations trait objects are actually preventing; it seems they are forcing the compiler to not use inlining at all so the compiler "can't see" that the error path is never used in this particular case and drop branches, etc. There are probably tradeoffs in larger code bases that use different kinds of formatting. Some numbers on trait objects vs generics would be nice, specially now that we have opt-levels s and z.

@japaric
Copy link
Member

japaric commented Nov 5, 2016

Are those function names used for something else?

So after looking carefully, those function names that appear in the strings output seem to only be part of the debug info of the ELF file. Because when I turn the ELF into a raw binary file (objcopy -O binary) then none of those names are present:

$ arm-none-eabi-objcopy -O binary target/thumbv7em-none-eabihf/release/examples/minimal minimal.bin

ls -l minimal.bin
-rwxr-xr-x 1 japaric japaric 2340 minimal.bin

$ arm-none-eabi-strings minimal.bin
pGpGZ
FjFO
|@aF
5UE
5EE
XF=F
;ZEO
[       d
03jF
[       d
*The answer is
00010203040506070809101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899

What's left to test is (a) check if we can eliminate some panic!s/unwraps from core::fmt implementation and see if that helps and (b) rewrite core::fmt to use generics instead of traits objects and see how much that helps.

@JinShil
Copy link
Author

JinShil commented Nov 5, 2016

Could you say more about this? Are these strings not supposed to be in the .text or something like that? (A reference to how the linker is supposed to behave in this case would be nice but that would be asking too much I think)

That's actually due to the -data-sections option. See the entry for -fdata-sections here. According to this, setting function_sections also means setting data_sections. So, assuming LLVM's -data-sections option works like GCC's -fdata-sections those strings should also go into their own sections. Then, if they are dead (nothing is refering to them), they should be stripped out by --gc-sections. That's my understanding anyway.

@JinShil
Copy link
Author

JinShil commented Nov 5, 2016

I was able to get back to this today. The following code...

// compile with --emit obj and analyize with arm-none-eabi-objdump -s object_file.0.o

#![feature(no_core)]
#![feature(lang_items)]

#![no_core]
#![no_main]

//---------- Minimal Runtime ------------
#[lang = "sized"]
pub trait Sized { }

#[lang = "sync"]
pub trait Sync { }

#[lang = "copy"]
pub trait Copy { }

impl Sync for &'static str { }

//---------- Program Code ---------------
#[allow(dead_code)]
static READ_ONLY_STRING: &'static str = "This is a readonly string";

#[no_mangle]
#[allow(dead_code)]
pub fn function1() { }

#[no_mangle]
#[allow(dead_code)]
pub fn function2() { }

... produces the following object file.

Contents of section .text._ZN4drop17hcc6490d816c47e50E:
 0000 7047                                 pG              
Contents of section .ARM.exidx.text._ZN4drop17hcc6490d816c47e50E:
 0000 00000000 b0b0b080                    ........        
Contents of section .text.function1:
 0000 81b0ffe7 01b07047                    ......pG        
Contents of section .ARM.exidx.text.function1:
 0000 00000000 b0b00080                    ........        
Contents of section .text.function2:
 0000 81b0ffe7 01b07047                    ......pG        
Contents of section .ARM.exidx.text.function2:
 0000 00000000 b0b00080                    ........        
Contents of section .rodata._ZN13sections_test16READ_ONLY_STRING17h486a42199fa52f40E:
 0000 00000000 19000000                    ........        
Contents of section .rodata.str.0:
 0000 54686973 20697320 61207265 61646f6e  This is a readon
 0010 6c792073 7472696e 67                 ly string 

... some additional debug stuff...

So, it does look like the compiler is doing the right thing. You can see function1, function2, and READ_ONLY_STRING each in their own section. Looking at the object files generated with --emit obj in my real project (the one that prompted this PR), I see the same characteristics: functions and data in their own sections.

However, after linking, the final binary contains only a .text and a .rodata section.

Contents of section .text:
 8000000 00000110 53010008 80b58eb0 40f62060  ....S.......@. `
 8000010 0baac0f6 00000b90 2b200c90 40f6a460  ........+ ..@..`
 8000020 c0f60000 d0e90010 019240f2 6502c0f6  [email protected]...
 ... lots more ...

Contents of section .rodata:
 8000e20 63616c6c 65642060 52657375 6c743a3a  called `Result::
 8000e30 756e7772 61702829 60206f6e 20616e20  unwrap()` on an 
 8000e40 60457272 60207661 6c756500 63000008  `Err` value.c...
 ... lots more ...

It was this result that led me to believe that -function-sections was not enabled, or not working. I guess the linker merges any remaining sections when it generates the final binary. I don't recall seeing such merging behavior in my C/C++ work from a couple of years ago (perhaps that is a new feature of arm-none-eabi-ld? I'm using GNU ld (GNU Binutils) 2.27 from Arch Linux)

TL;DR
It appears, by analyzing the intermediate files generated with --emit obj, that -function-sections and -data-sections are enabled and working properly in the thumb target definitions. I suspect the code bloat introduced with formatted io from core isn't actually dead code from the linker's perspective; rather, as alluded to by @japaric , it's just code that isn't written in a way that can be optimized for code size with our existing tools.

@alexcrichton
Copy link
Member

@japaric

These are not present in my example.

Aha! Disregard me then!

but there's no backtrace to print in this case

Oh we print backtraces for both panic=abort and panic=unwind, compiling with panic=abort just removes the unwinding infrastructure for panics, not the backtrace portion.

I'm wondering how much optimizations trait objects are actually preventing

It's true, yeah, and I'd love to dive more into the core::fmt module. In theory literally nothing there ever panics, as that'd be a bug in the module itself. Coercing LLVM to prove that sometimes, however, may be difficult. For example slicing which we know to never panic isn't always too convincing to LLVM. "fixing" core::fmt may require some minor usage of unsafe as well perhaps.

What's left to test is (a) check if we can eliminate some panic!s/unwraps from core::fmt implementation and see if that helps and (b) rewrite core::fmt to use generics instead of traits objects and see how much that helps.

I'd recommend (a) first as I suspect that will get us 99% of the way there. Option (b) may not be backwards compatible, unfortunately.

@japaric
Copy link
Member

japaric commented Nov 8, 2016

I've open rust-embedded/wg#18 to work on this out of tree. Possibly experimenting with an implementation (different fmt traits?) that don't involve trait objects just to get an idea of how much it helps (if at all).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants