Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minimize the cost of write_fmt without arguments #75358

Closed
wants to merge 1 commit into from

Conversation

tesuji
Copy link
Contributor

@tesuji tesuji commented Aug 10, 2020

Thanks to Arguments::as_str, we should know
when we don't have any arguments at all.

Before: https://rust.godbolt.org/z/zKGEMP
After:

check_stage1::foo:
 push    rbx
 mov     rax, rsi
 mov     rbx, rdi
 lea     rsi, [rip, +, .L__unnamed_1]
 mov     edx, 1
 mov     rdi, rax
 call    alloc::vec::Vec<T>::extend_from_slice
 mov     byte, ptr, [rbx], 3
 mov     rax, rbx
 pop     rbx
 ret
check_stage1::bar:
 push    rax
 lea     rsi, [rip, +, .L__unnamed_1]
 mov     edx, 1
 call    alloc::vec::Vec<T>::extend_from_slice
 xor     eax, eax
 pop     rcx
 ret

Closes #75301

Zulip discussion: https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/Discuss.20changes.20in.20.20.2375358

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 10, 2020
@tesuji
Copy link
Contributor Author

tesuji commented Aug 10, 2020

Can I have a perf run? cc @Mark-Simulacrum

@lcnr
Copy link
Contributor

lcnr commented Aug 10, 2020

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented Aug 10, 2020

⌛ Trying commit 23fdb55 with merge 01463991259f5ad8ff520b94d9c9c2e72cdf1a98...

Copy link
Contributor

@pickfire pickfire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I didn't know this can be fixed without changing the compiler built-in format_args. @lzutao if I want to allow format_args to accept u8 or char rather than &str do I need to change the compiler code?

@bors
Copy link
Contributor

bors commented Aug 10, 2020

☀️ Try build successful - checks-actions, checks-azure
Build commit: 01463991259f5ad8ff520b94d9c9c2e72cdf1a98 (01463991259f5ad8ff520b94d9c9c2e72cdf1a98)

@rust-timer
Copy link
Collaborator

Queued 01463991259f5ad8ff520b94d9c9c2e72cdf1a98 with parent 13290e8, future comparison URL.

@tesuji
Copy link
Contributor Author

tesuji commented Aug 10, 2020

format_args always need a static format string, I don't know if we want it to accept u8 or char.
I don't know much about this. You could ask more in Zulip T-compiler.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (01463991259f5ad8ff520b94d9c9c2e72cdf1a98): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never

@tesuji
Copy link
Contributor Author

tesuji commented Aug 10, 2020

I don't know how to interpret this perf result. It seems that most are regressed.
Leaving it to reviewer.

@Mark-Simulacrum
Copy link
Member

I cannot reproduce your assembly with the try build's artifacts; I suspect that the difference you're seeing is likely an effect of local compilation options for std allowing more aggressive inlining or so than what we can do with dist-produced artifacts.

use std::fmt::{self, Write as FmtWrite};

#[no_mangle]
pub fn bar(buf: &mut String) -> fmt::Result {
    write!(buf, " ")
}

gets me:

before this PR:

; rustc +13290e83a6e20f3b408d177a9d64d8cf98fe4615 -Copt-level=3 --crate-type=cdylib fmt.rs
; objdump -Mintel --disassemble=bar ./libfmt.so | rustfilt
    51e0:       48 83 ec 38             sub    rsp,0x38
    51e4:       48 89 3c 24             mov    QWORD PTR [rsp],rdi
    51e8:       48 8d 05 69 a8 03 00    lea    rax,[rip+0x3a869]        # 3fa58 <__do_global_dtors_aux_fini_array_entry+0x8>
    51ef:       48 89 44 24 08          mov    QWORD PTR [rsp+0x8],rax
    51f4:       48 c7 44 24 10 01 00    mov    QWORD PTR [rsp+0x10],0x1
    51fb:       00 00
    51fd:       48 c7 44 24 18 00 00    mov    QWORD PTR [rsp+0x18],0x0
    5204:       00 00
    5206:       48 8d 05 fb dd 02 00    lea    rax,[rip+0x2ddfb]        # 33008 <_fini+0x454>
    520d:       48 89 44 24 28          mov    QWORD PTR [rsp+0x28],rax
    5212:       48 c7 44 24 30 00 00    mov    QWORD PTR [rsp+0x30],0x0
    5219:       00 00
    521b:       48 8d 35 9e a8 03 00    lea    rsi,[rip+0x3a89e]        # 3fac0 <anon.4574bf75806fea7c4a1d662dd29266f6.0.llvm.133549432485651853>
    5222:       48 89 e7                mov    rdi,rsp
    5225:       48 8d 54 24 08          lea    rdx,[rsp+0x8]
    522a:       ff 15 b0 c8 03 00       call   QWORD PTR [rip+0x3c8b0]        # 41ae0 <_GLOBAL_OFFSET_TABLE_+0x70>
    5230:       48 83 c4 38             add    rsp,0x38
    5234:       c3                      ret

with this PR:

; rustc +01463991259f5ad8ff520b94d9c9c2e72cdf1a98 -Copt-level=3 --crate-type=cdylib fmt.rs
; objdump -Mintel --disassemble=bar ./libfmt.so | rustfilt
    51e0:       41 56                   push   r14
    51e2:       53                      push   rbx
    51e3:       50                      push   rax
    51e4:       49 89 fe                mov    r14,rdi
    51e7:       48 8b 77 10             mov    rsi,QWORD PTR [rdi+0x10]
    51eb:       ba 01 00 00 00          mov    edx,0x1
    51f0:       e8 3b ff ff ff          call   5130 <alloc::raw_vec::RawVec<T,A>::reserve>
    51f5:       49 8b 5e 10             mov    rbx,QWORD PTR [r14+0x10]
    51f9:       49 8b 3e                mov    rdi,QWORD PTR [r14]
    51fc:       48 01 df                add    rdi,rbx
    51ff:       48 8d 15 fa dd 02 00    lea    rdx,[rip+0x2ddfa]        # 33000 <_fini+0x42c>
    5206:       be 01 00 00 00          mov    esi,0x1
    520b:       b9 01 00 00 00          mov    ecx,0x1
    5210:       e8 1b 00 00 00          call   5230 <core::slice::<impl [T]>::copy_from_slice>
    5215:       48 83 c3 01             add    rbx,0x1
    5219:       49 89 5e 10             mov    QWORD PTR [r14+0x10],rbx
    521d:       31 c0                   xor    eax,eax
    521f:       48 83 c4 08             add    rsp,0x8
    5223:       5b                      pop    rbx
    5224:       41 5e                   pop    r14
    5226:       c3                      ret

@tmiasko
Copy link
Contributor

tmiasko commented Aug 10, 2020

@Mark-Simulacrum that seems to be even more aggressively inlined, i.e, alloc::vec::Vec<T>::extend_from_slice has been inlined to reserve and copy_from_slice.

@tesuji
Copy link
Contributor Author

tesuji commented Aug 10, 2020

Hm, I tried "try" build and master artifact by using rustup-toolchain-install-master:
https://gist.github.com/lzutao/7f3cbfb331ec42ea6b1c83ebe7bd363d
Rustc is invoked like this:

rustc -C debuginfo=1 -o ./foo-stage1.s --emit asm -Cllvm-args=--x86-asm-syntax=intel --crate-type rlib --color=always -C opt-level=3 -Z symbol-mangling-version=v0 ./foo.rs

The source files are the same as in godbolt link. No no_mangle.

write(&mut self, args)
match args.as_str() {
Some(s) => self.write_str(s),
None => write(&mut self, args),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This special casing should probably be done inside write to cover more cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it may already be done there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that moving this logic to write doesn't help inline calls.

@tesuji
Copy link
Contributor Author

tesuji commented Aug 11, 2020

Hi, I created a zulip topic to discuss about this change: https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/Discuss.20changes.20in.20.20.2375358

@tesuji tesuji closed this Aug 24, 2020
@tesuji tesuji deleted the reduce-io-write branch August 24, 2020 12:26
@tesuji
Copy link
Contributor Author

tesuji commented Aug 24, 2020

Probably try again when inline const implemented.

@m-ou-se
Copy link
Member

m-ou-se commented Oct 25, 2020

This change could be very beneficial for embedded/no_std projects. Not necessarily for performance, but for code size. write!(.., "literal") can pull in lots of formatting code through fmt::write. With this change, that's all gone.

The discussion above was only about performance, where it seems to make little difference. However, code size, especially for no_std projects, is also very important.

#78122 (comment) contains a simple benchmark program that could be used to check the difference this PR makes.

@m-ou-se
Copy link
Member

m-ou-se commented Oct 25, 2020

Another data point: There are quite a few macros in the wild that use a special-case to avoid calling write_fmt when write_str would do: E.g. https://github.com/bobbin-rs/bobbin-sdk/blob/bcaf01f3724d9927c3d437a484aa4ef00072eb2e/lib/bobbin-sys/src/macros.rs#L3-L10

This change would make the special case there identical to the generic case, simplifying things.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve codegen for single argument format_args!