-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minimize the cost of write_fmt without arguments #75358
Conversation
Can I have a perf run? cc @Mark-Simulacrum |
@bors try @rust-timer queue |
Awaiting bors try build completion |
⌛ Trying commit 23fdb55 with merge 01463991259f5ad8ff520b94d9c9c2e72cdf1a98... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I didn't know this can be fixed without changing the compiler built-in format_args
. @lzutao if I want to allow format_args
to accept u8
or char
rather than &str
do I need to change the compiler code?
☀️ Try build successful - checks-actions, checks-azure |
Queued 01463991259f5ad8ff520b94d9c9c2e72cdf1a98 with parent 13290e8, future comparison URL. |
|
Finished benchmarking try commit (01463991259f5ad8ff520b94d9c9c2e72cdf1a98): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
I don't know how to interpret this perf result. It seems that most are regressed. |
I cannot reproduce your assembly with the try build's artifacts; I suspect that the difference you're seeing is likely an effect of local compilation options for std allowing more aggressive inlining or so than what we can do with dist-produced artifacts. use std::fmt::{self, Write as FmtWrite};
#[no_mangle]
pub fn bar(buf: &mut String) -> fmt::Result {
write!(buf, " ")
} gets me: before this PR: ; rustc +13290e83a6e20f3b408d177a9d64d8cf98fe4615 -Copt-level=3 --crate-type=cdylib fmt.rs
; objdump -Mintel --disassemble=bar ./libfmt.so | rustfilt
51e0: 48 83 ec 38 sub rsp,0x38
51e4: 48 89 3c 24 mov QWORD PTR [rsp],rdi
51e8: 48 8d 05 69 a8 03 00 lea rax,[rip+0x3a869] # 3fa58 <__do_global_dtors_aux_fini_array_entry+0x8>
51ef: 48 89 44 24 08 mov QWORD PTR [rsp+0x8],rax
51f4: 48 c7 44 24 10 01 00 mov QWORD PTR [rsp+0x10],0x1
51fb: 00 00
51fd: 48 c7 44 24 18 00 00 mov QWORD PTR [rsp+0x18],0x0
5204: 00 00
5206: 48 8d 05 fb dd 02 00 lea rax,[rip+0x2ddfb] # 33008 <_fini+0x454>
520d: 48 89 44 24 28 mov QWORD PTR [rsp+0x28],rax
5212: 48 c7 44 24 30 00 00 mov QWORD PTR [rsp+0x30],0x0
5219: 00 00
521b: 48 8d 35 9e a8 03 00 lea rsi,[rip+0x3a89e] # 3fac0 <anon.4574bf75806fea7c4a1d662dd29266f6.0.llvm.133549432485651853>
5222: 48 89 e7 mov rdi,rsp
5225: 48 8d 54 24 08 lea rdx,[rsp+0x8]
522a: ff 15 b0 c8 03 00 call QWORD PTR [rip+0x3c8b0] # 41ae0 <_GLOBAL_OFFSET_TABLE_+0x70>
5230: 48 83 c4 38 add rsp,0x38
5234: c3 ret
with this PR: ; rustc +01463991259f5ad8ff520b94d9c9c2e72cdf1a98 -Copt-level=3 --crate-type=cdylib fmt.rs
; objdump -Mintel --disassemble=bar ./libfmt.so | rustfilt
51e0: 41 56 push r14
51e2: 53 push rbx
51e3: 50 push rax
51e4: 49 89 fe mov r14,rdi
51e7: 48 8b 77 10 mov rsi,QWORD PTR [rdi+0x10]
51eb: ba 01 00 00 00 mov edx,0x1
51f0: e8 3b ff ff ff call 5130 <alloc::raw_vec::RawVec<T,A>::reserve>
51f5: 49 8b 5e 10 mov rbx,QWORD PTR [r14+0x10]
51f9: 49 8b 3e mov rdi,QWORD PTR [r14]
51fc: 48 01 df add rdi,rbx
51ff: 48 8d 15 fa dd 02 00 lea rdx,[rip+0x2ddfa] # 33000 <_fini+0x42c>
5206: be 01 00 00 00 mov esi,0x1
520b: b9 01 00 00 00 mov ecx,0x1
5210: e8 1b 00 00 00 call 5230 <core::slice::<impl [T]>::copy_from_slice>
5215: 48 83 c3 01 add rbx,0x1
5219: 49 89 5e 10 mov QWORD PTR [r14+0x10],rbx
521d: 31 c0 xor eax,eax
521f: 48 83 c4 08 add rsp,0x8
5223: 5b pop rbx
5224: 41 5e pop r14
5226: c3 ret |
@Mark-Simulacrum that seems to be even more aggressively inlined, i.e, |
Hm, I tried "try" build and master artifact by using rustup-toolchain-install-master: rustc -C debuginfo=1 -o ./foo-stage1.s --emit asm -Cllvm-args=--x86-asm-syntax=intel --crate-type rlib --color=always -C opt-level=3 -Z symbol-mangling-version=v0 ./foo.rs The source files are the same as in godbolt link. No no_mangle. |
write(&mut self, args) | ||
match args.as_str() { | ||
Some(s) => self.write_str(s), | ||
None => write(&mut self, args), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This special casing should probably be done inside write
to cover more cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it may already be done there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that moving this logic to write
doesn't help inline calls.
Hi, I created a zulip topic to discuss about this change: https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/Discuss.20changes.20in.20.20.2375358 |
Probably try again when inline const implemented. |
This change could be very beneficial for embedded/no_std projects. Not necessarily for performance, but for code size. The discussion above was only about performance, where it seems to make little difference. However, code size, especially for no_std projects, is also very important. #78122 (comment) contains a simple benchmark program that could be used to check the difference this PR makes. |
Another data point: There are quite a few macros in the wild that use a special-case to avoid calling This change would make the special case there identical to the generic case, simplifying things. |
Thanks to
Arguments::as_str
, we should knowwhen we don't have any arguments at all.
Before: https://rust.godbolt.org/z/zKGEMP
After:
Closes #75301
Zulip discussion: https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/Discuss.20changes.20in.20.20.2375358