Performance regression on nightly (when using Cursor::read_exact and Byteorder) #47321

Closed
pgkos opened this issue Jan 10, 2018 · 9 comments
Labels
C-enhancement (Category: An issue proposing an enhancement or a PR with one.)
I-slow (Issue: Problems and improvements with respect to performance of generated code.)
P-low (Low priority)
regression-from-stable-to-stable (Performance or correctness regression from one stable version to another.)
T-libs (Relevant to the library team, which will review and decide on the PR/issue.)

Comments

@pgkos

pgkos commented Jan 10, 2018

On nightly (nightly-x86_64-unknown-linux-gnu, rustc 1.25.0-nightly (61452e506 2018-01-09)), the compiler emits suboptimal code for the following example when built with cargo build --release: it does not inline Cursor::read_exact.

extern crate byteorder;

use std::io::Cursor;
use byteorder::ReadBytesExt; // only read_u8 is used below

fn main() {
    let buf: [u8; 4] = [1, 2, 3, 4];
    let mut cur = Cursor::new(&buf);

    println!("{}", cur.read_u8().unwrap());
    println!("{}", cur.read_u8().unwrap());
    println!("{}", cur.read_u8().unwrap());
    println!("{}", cur.read_u8().unwrap());
}

It outputs the following x86_64 code:

0x00007370      4157           push r15
0x00007372      4156           push r14
0x00007374      4155           push r13
0x00007376      4154           push r12
0x00007378      53             push rbx
0x00007379      4883ec70       sub rsp, 0x70
0x0000737d      c744245c0102.  mov dword [local_5ch], 0x4030201
0x00007385      488d44245c     lea rax, qword [local_5ch]
0x0000738a      4889442460     mov qword [local_60h], rax
0x0000738f      48c744246800.  mov qword [local_68h], 0
0x00007398      c644240f00     mov byte [local_fh], 0
0x0000739d      4c8d742410     lea r14, qword [local_10h]
0x000073a2      488d742460     lea rsi, qword [local_60h]
0x000073a7      488d5c240f     lea rbx, qword [local_fh]
0x000073ac      b901000000     mov ecx, 1
0x000073b1      4c89f7         mov rdi, r14
0x000073b4      4889da         mov rdx, rbx
0x000073b7      e834040000     call sym._std::io::cursor::Cursor_T__as_std::io::Read_::read_exact::h1287815a4700b069
0x000073bc      807c241003     cmp byte [local_10h], 3
0x000073c1      0f85c4010000   jne 0x758b
0x000073c7      8a44240f       mov al, byte [local_fh]
0x000073cb      8844240f       mov byte [local_fh], al
0x000073cf      48895c2410     mov qword [local_10h], rbx
0x000073d4      4c8d3d15b404.  lea r15, qword sym.core::fmt::num::__impl_core::fmt::Display_for_u8_::fmt::h4ec6712cb0a2082e
0x000073db      4c897c2418     mov qword [local_18h], r15
0x000073e0      4c8d25597026.  lea r12, qword 0x0026e440
0x000073e7      4c89642428     mov qword [local_28h], r12
0x000073ec      48c744243002.  mov qword [local_30h], 2
0x000073f5      4c8d2d740105.  lea r13, qword 0x00057570
0x000073fc      4c896c2438     mov qword [local_38h], r13
0x00007401      48c744244001.  mov qword [local_40h], 1
0x0000740a      4c89742448     mov qword [local_48h], r14
0x0000740f      48c744245001.  mov qword [local_50h], 1
0x00007418      488d7c2428     lea rdi, qword [local_28h]
0x0000741d      e8be8a0000     call sym.std::io::stdio::_print::h7a1dc186f4ac9af9
0x00007422      c644240f00     mov byte [local_fh], 0
0x00007427      4c8d742410     lea r14, qword [local_10h]
0x0000742c      488d742460     lea rsi, qword [local_60h]
0x00007431      488d5c240f     lea rbx, qword [local_fh]
0x00007436      b901000000     mov ecx, 1
0x0000743b      4c89f7         mov rdi, r14
0x0000743e      4889da         mov rdx, rbx
0x00007441      e8aa030000     call sym._std::io::cursor::Cursor_T__as_std::io::Read_::read_exact::h1287815a4700b069
0x00007446      807c241003     cmp byte [local_10h], 3
0x0000744b      0f853a010000   jne 0x758b
0x00007451      8a44240f       mov al, byte [local_fh]
0x00007455      8844240f       mov byte [local_fh], al
0x00007459      48895c2410     mov qword [local_10h], rbx
0x0000745e      4c897c2418     mov qword [local_18h], r15
0x00007463      4c89642428     mov qword [local_28h], r12
0x00007468      48c744243002.  mov qword [local_30h], 2
0x00007471      4c896c2438     mov qword [local_38h], r13
0x00007476      48c744244001.  mov qword [local_40h], 1
0x0000747f      4c89742448     mov qword [local_48h], r14
0x00007484      48c744245001.  mov qword [local_50h], 1
0x0000748d      488d7c2428     lea rdi, qword [local_28h]
0x00007492      e8498a0000     call sym.std::io::stdio::_print::h7a1dc186f4ac9af9
0x00007497      c644240f00     mov byte [local_fh], 0
0x0000749c      4c8d742410     lea r14, qword [local_10h]
0x000074a1      488d742460     lea rsi, qword [local_60h]
0x000074a6      488d5c240f     lea rbx, qword [local_fh]
0x000074ab      b901000000     mov ecx, 1
0x000074b0      4c89f7         mov rdi, r14
0x000074b3      4889da         mov rdx, rbx
0x000074b6      e835030000     call sym._std::io::cursor::Cursor_T__as_std::io::Read_::read_exact::h1287815a4700b069
0x000074bb      807c241003     cmp byte [local_10h], 3
0x000074c0      0f85c5000000   jne 0x758b
0x000074c6      8a44240f       mov al, byte [local_fh]
0x000074ca      8844240f       mov byte [local_fh], al
0x000074ce      48895c2410     mov qword [local_10h], rbx
0x000074d3      4c897c2418     mov qword [local_18h], r15
0x000074d8      4c89642428     mov qword [local_28h], r12
0x000074dd      48c744243002.  mov qword [local_30h], 2
0x000074e6      4c896c2438     mov qword [local_38h], r13
0x000074eb      48c744244001.  mov qword [local_40h], 1
0x000074f4      4c89742448     mov qword [local_48h], r14
0x000074f9      48c744245001.  mov qword [local_50h], 1
0x00007502      488d7c2428     lea rdi, qword [local_28h]
0x00007507      e8d4890000     call sym.std::io::stdio::_print::h7a1dc186f4ac9af9
0x0000750c      c644240f00     mov byte [local_fh], 0
0x00007511      4c8d742410     lea r14, qword [local_10h]
0x00007516      488d742460     lea rsi, qword [local_60h]
0x0000751b      488d5c240f     lea rbx, qword [local_fh]
0x00007520      b901000000     mov ecx, 1
0x00007525      4c89f7         mov rdi, r14
0x00007528      4889da         mov rdx, rbx
0x0000752b      e8c0020000     call sym._std::io::cursor::Cursor_T__as_std::io::Read_::read_exact::h1287815a4700b069
0x00007530      807c241003     cmp byte [local_10h], 3
0x00007535      7554           jne 0x758b
0x00007537      8a44240f       mov al, byte [local_fh]
0x0000753b      8844240f       mov byte [local_fh], al
0x0000753f      48895c2410     mov qword [local_10h], rbx
0x00007544      4c897c2418     mov qword [local_18h], r15
0x00007549      4c89642428     mov qword [local_28h], r12
0x0000754e      48c744243002.  mov qword [local_30h], 2
0x00007557      4c896c2438     mov qword [local_38h], r13
0x0000755c      48c744244001.  mov qword [local_40h], 1
0x00007565      4c89742448     mov qword [local_48h], r14
0x0000756a      48c744245001.  mov qword [local_50h], 1
0x00007573      488d7c2428     lea rdi, qword [local_28h]
0x00007578      e863890000     call sym.std::io::stdio::_print::h7a1dc186f4ac9af9
0x0000757d      4883c470       add rsp, 0x70
0x00007581      5b             pop rbx
0x00007582      415c           pop r12
0x00007584      415d           pop r13
0x00007586      415e           pop r14
0x00007588      415f           pop r15
0x0000758a      c3             ret

On stable (stable-x86_64-unknown-linux-gnu rustc 1.23.0 (766bd11c8 2018-01-01)), it outputs:

0x00007130      55             push rbp
0x00007131      4889e5         mov rbp, rsp
0x00007134      4157           push r15
0x00007136      4156           push r14
0x00007138      4155           push r13
0x0000713a      4154           push r12
0x0000713c      53             push rbx
0x0000713d      4883ec48       sub rsp, 0x48
0x00007141      c745d3010203.  mov dword [local_2dh], 0x4030201
0x00007148      c645d701       mov byte [local_29h], 1
0x0000714c      4c8d75d7       lea r14, qword [local_29h]
0x00007150      4c897590       mov qword [local_70h], r14
0x00007154      4c8d3df5d804.  lea r15, qword sym.core::fmt::num::__impl_core::fmt::Display_for_u8_::fmt::h34a102a58af0af3e
0x0000715b      4c897d98       mov qword [local_68h], r15
0x0000715f      4c8d250a4426.  lea r12, qword obj.ref.j
0x00007166      4c8965a0       mov qword [local_60h], r12
0x0000716a      48c745a80200.  mov qword [local_58h], 2
0x00007172      4c8d2d8fef04.  lea r13, qword obj.ref.k
0x00007179      4c896db0       mov qword [local_50h], r13
0x0000717d      48c745b80100.  mov qword [local_48h], 1
0x00007185      488d5d90       lea rbx, qword [local_70h]
0x00007189      48895dc0       mov qword [local_40h], rbx
0x0000718d      48c745c80100.  mov qword [local_38h], 1
0x00007195      488d7da0       lea rdi, qword [local_60h]
0x00007199      e8a2560000     call sym.std::io::stdio::_print::h0e1f1f38819db7ba
0x0000719e      c645d702       mov byte [local_29h], 2
0x000071a2      4c897590       mov qword [local_70h], r14
0x000071a6      4c897d98       mov qword [local_68h], r15
0x000071aa      4c8965a0       mov qword [local_60h], r12
0x000071ae      48c745a80200.  mov qword [local_58h], 2
0x000071b6      4c896db0       mov qword [local_50h], r13
0x000071ba      48c745b80100.  mov qword [local_48h], 1
0x000071c2      48895dc0       mov qword [local_40h], rbx
0x000071c6      48c745c80100.  mov qword [local_38h], 1
0x000071ce      488d7da0       lea rdi, qword [local_60h]
0x000071d2      e869560000     call sym.std::io::stdio::_print::h0e1f1f38819db7ba
0x000071d7      8a45d5         mov al, byte [local_2bh]
0x000071da      8845d7         mov byte [local_29h], al
0x000071dd      4c897590       mov qword [local_70h], r14
0x000071e1      4c897d98       mov qword [local_68h], r15
0x000071e5      4c8965a0       mov qword [local_60h], r12
0x000071e9      48c745a80200.  mov qword [local_58h], 2
0x000071f1      4c896db0       mov qword [local_50h], r13
0x000071f5      48c745b80100.  mov qword [local_48h], 1
0x000071fd      48895dc0       mov qword [local_40h], rbx
0x00007201      48c745c80100.  mov qword [local_38h], 1
0x00007209      488d7da0       lea rdi, qword [local_60h]
0x0000720d      e82e560000     call sym.std::io::stdio::_print::h0e1f1f38819db7ba
0x00007212      8a45d6         mov al, byte [local_2ah]
0x00007215      8845d7         mov byte [local_29h], al
0x00007218      4c897590       mov qword [local_70h], r14
0x0000721c      4c897d98       mov qword [local_68h], r15
0x00007220      4c8965a0       mov qword [local_60h], r12
0x00007224      48c745a80200.  mov qword [local_58h], 2
0x0000722c      4c896db0       mov qword [local_50h], r13
0x00007230      48c745b80100.  mov qword [local_48h], 1
0x00007238      48895dc0       mov qword [local_40h], rbx
0x0000723c      48c745c80100.  mov qword [local_38h], 1
0x00007244      488d7da0       lea rdi, qword [local_60h]
0x00007248      e8f3550000     call sym.std::io::stdio::_print::h0e1f1f38819db7ba
0x0000724d      4883c448       add rsp, 0x48
0x00007251      5b             pop rbx
0x00007252      415c           pop r12
0x00007254      415d           pop r13
0x00007256      415e           pop r14
0x00007258      415f           pop r15
0x0000725a      5d             pop rbp
0x0000725b      c3             ret
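
For context, the call that is no longer inlined in the nightly listing is a one-byte read_exact: a length of 1 is loaded into ecx before each call. The std-only sketch below reproduces that call shape without byteorder. The helper names read_one_byte and read_all are hypothetical, and the body is an assumption about what read_u8 roughly does, not a copy of byteorder's source.

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical helper: read a single byte through `Read::read_exact`,
/// the same shape of call that appears in the nightly disassembly.
pub fn read_one_byte<R: Read>(reader: &mut R) -> io::Result<u8> {
    let mut buf = [0u8; 1];
    // Returns an `UnexpectedEof` error if no byte is left.
    reader.read_exact(&mut buf)?;
    Ok(buf[0])
}

/// Read every byte of `data` through a `Cursor`, one call per byte.
pub fn read_all(data: &[u8]) -> Vec<u8> {
    let mut cur = Cursor::new(data);
    let mut out = Vec::with_capacity(data.len());
    while let Ok(b) = read_one_byte(&mut cur) {
        out.push(b);
    }
    out
}
```

When the call is inlined and the buffer is a known constant, as on stable above, the optimizer can reduce each read to a plain byte load or an immediate.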
@kennytm added the I-slow and T-libs-api labels on Jan 10, 2018
@alexcrichton
Member

I believe this is a result of #46910, where LLVM makes a different inlining decision in the ThinLTO phases than it did when everything was in one codegen unit. That PR was merged knowing that it would not yield a 100% across-the-board improvement in all cases. That being said, @pgkos, did this slow down a larger application? If so, do you have a more macro-level benchmark to poke around with?

@alexcrichton added the regression-from-stable-to-beta label on Jan 11, 2018
@pgkos
Author

pgkos commented Jan 11, 2018

@alexcrichton yes, larger apps are also affected.

A simple benchmark:

extern crate byteorder;

use std::io::Cursor;
use byteorder::ReadBytesExt; // only read_u8 is used below

fn main() {
    const LEN: usize = 100000000;
    // Zero-filled buffer; calling set_len on uninitialized memory and then reading it is undefined behavior.
    let buf: Vec<u8> = vec![0u8; LEN];

    let mut cur = Cursor::new(&buf);

    let mut b = 0u8;
    loop {
        if let Ok(n) = cur.read_u8() {
            b = b.wrapping_add(n); // wrap explicitly so debug builds do not panic on overflow
        } else {
            break;
        }
    }
    println!("{}", b);
}

On stable:

real    0m0,359s
user    0m0,359s
sys     0m0,000s

On nightly:

real    0m1,286s
user    0m1,283s
sys     0m0,003s
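
To separate the read path from process startup and buffer allocation, the same comparison can be timed in-process. The sketch below uses only std and reads through Cursor::read_exact directly instead of byteorder. The function names sum_via_cursor, sum_via_slice and timed are hypothetical, and the numbers it produces will not match the wall-clock figures above.

```rust
use std::io::{Cursor, Read};
use std::time::{Duration, Instant};

/// Sum all bytes by reading them one at a time through `Cursor::read_exact`.
pub fn sum_via_cursor(data: &[u8]) -> u8 {
    let mut cur = Cursor::new(data);
    let mut byte = [0u8; 1];
    let mut sum = 0u8;
    // `read_exact` returns an error once the cursor is exhausted.
    while cur.read_exact(&mut byte).is_ok() {
        sum = sum.wrapping_add(byte[0]);
    }
    sum
}

/// Baseline: sum the same bytes directly from the slice.
pub fn sum_via_slice(data: &[u8]) -> u8 {
    data.iter().fold(0u8, |acc, &b| acc.wrapping_add(b))
}

/// Run `f` once and return its result together with the elapsed wall time.
pub fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}
```

Calling timed(|| sum_via_cursor(&buf)) and timed(|| sum_via_slice(&buf)) on the same buffer under each toolchain gives a rough idea of how much the non-inlined call costs relative to a plain loop.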

@alexcrichton
Member

@pgkos er yeah, for any decision where the compiler says "I won't inline this", a benchmark can be crafted to show why that was a bad decision for performance, but I'm curious whether you saw this in a real-world application at some point. That is, did something slow down enough to trigger this report?

A solution could be to just tag the relevant method with #[inline] to give LLVM more of a nudge to inline it, but I'd be curious to learn more about the history here first.
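
The #[inline] suggestion can be illustrated with a small stand-in type. SliceReader and its read_exact method are hypothetical and are not the standard library's Cursor; the attribute is only a hint, so the optimizer may still decide either way.

```rust
/// Hypothetical stand-in for a cursor over a byte slice.
pub struct SliceReader<'a> {
    data: &'a [u8],
    pos: usize,
}

impl<'a> SliceReader<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        SliceReader { data, pos: 0 }
    }

    /// Fill `buf` completely, or return `None` without consuming anything.
    ///
    /// `#[inline]` asks the compiler to consider inlining this method at
    /// its call sites; it is a hint, not a guarantee.
    #[inline]
    pub fn read_exact(&mut self, buf: &mut [u8]) -> Option<()> {
        let end = self.pos.checked_add(buf.len())?;
        // `get` yields `None` when fewer than `buf.len()` bytes remain.
        let src = self.data.get(self.pos..end)?;
        buf.copy_from_slice(src);
        self.pos = end;
        Some(())
    }
}
```

Whether the hint changes the generated code has to be confirmed by inspecting the release-mode assembly for a call instruction at the use site.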

@alexcrichton
Member

This was discussed during libs triage and the conclusion was that we're more than willing to have a PR for an inline annotation if necessary here!

@alexcrichton added the regression-from-stable-to-stable label and removed the regression-from-stable-to-beta label on Mar 15, 2018
@XAMPPRocky added the C-enhancement label on Apr 10, 2018
@dtolnay removed this from the 1.24 milestone on Jun 16, 2020
@wecing
Contributor

wecing commented Dec 22, 2020

Is this issue already fixed? Here is the asm output I got using cargo rustc --release -- --emit asm:

	.section	.text._ZN6cursor4main17h6a4e86eb354fd049E,"ax",@progbits
	.p2align	4, 0x90
	.type	_ZN6cursor4main17h6a4e86eb354fd049E,@function
_ZN6cursor4main17h6a4e86eb354fd049E:
	.cfi_startproc
	pushq	%r15
	.cfi_def_cfa_offset 16
	pushq	%r14
	.cfi_def_cfa_offset 24
	pushq	%r13
	.cfi_def_cfa_offset 32
	pushq	%r12
	.cfi_def_cfa_offset 40
	pushq	%rbx
	.cfi_def_cfa_offset 48
	subq	$80, %rsp
	.cfi_def_cfa_offset 128
	.cfi_offset %rbx, -48
	.cfi_offset %r12, -40
	.cfi_offset %r13, -32
	.cfi_offset %r14, -24
	.cfi_offset %r15, -16
	movb	$1, 15(%rsp)
	leaq	15(%rsp), %r14
	movq	%r14, 16(%rsp)
	movq	_ZN4core3fmt3num3imp51_$LT$impl$u20$core..fmt..Display$u20$for$u20$u8$GT$3fmt17hd2673ece5df91901E@GOTPCREL(%rip), %r15
	movq	%r15, 24(%rsp)
	leaq	.L__unnamed_2(%rip), %r13
	movq	%r13, 32(%rsp)
	movq	$2, 40(%rsp)
	movq	$0, 48(%rsp)
	leaq	16(%rsp), %rbx
	movq	%rbx, 64(%rsp)
	movq	$1, 72(%rsp)
	movq	_ZN3std2io5stdio6_print17h0d31d4b9faa6e1ecE@GOTPCREL(%rip), %r12
	leaq	32(%rsp), %rdi
	callq	*%r12
	movb	$2, 15(%rsp)
	movq	%r14, 16(%rsp)
	movq	%r15, 24(%rsp)
	movq	%r13, 32(%rsp)
	movq	$2, 40(%rsp)
	movq	$0, 48(%rsp)
	movq	%rbx, 64(%rsp)
	movq	$1, 72(%rsp)
	leaq	32(%rsp), %rdi
	callq	*%r12
	movb	$3, 15(%rsp)
	movq	%r14, 16(%rsp)
	movq	%r15, 24(%rsp)
	movq	%r13, 32(%rsp)
	movq	$2, 40(%rsp)
	movq	$0, 48(%rsp)
	movq	%rbx, 64(%rsp)
	movq	$1, 72(%rsp)
	leaq	32(%rsp), %rdi
	callq	*%r12
	movb	$4, 15(%rsp)
	movq	%r14, 16(%rsp)
	movq	%r15, 24(%rsp)
	movq	%r13, 32(%rsp)
	movq	$2, 40(%rsp)
	movq	$0, 48(%rsp)
	movq	%rbx, 64(%rsp)
	movq	$1, 72(%rsp)
	leaq	32(%rsp), %rdi
	callq	*%r12
	addq	$80, %rsp
	.cfi_def_cfa_offset 48
	popq	%rbx
	.cfi_def_cfa_offset 40
	popq	%r12
	.cfi_def_cfa_offset 32
	popq	%r13
	.cfi_def_cfa_offset 24
	popq	%r14
	.cfi_def_cfa_offset 16
	popq	%r15
	.cfi_def_cfa_offset 8
	retq
.Lfunc_end5:
	.size	_ZN6cursor4main17h6a4e86eb354fd049E, .Lfunc_end5-_ZN6cursor4main17h6a4e86eb354fd049E
	.cfi_endproc

	.section	.text.main,"ax",@progbits
	.globl	main
	.p2align	4, 0x90
	.type	main,@function
main:
	.cfi_startproc
	pushq	%rax
	.cfi_def_cfa_offset 16
	movq	%rsi, %rcx
	movslq	%edi, %rdx
	leaq	_ZN6cursor4main17h6a4e86eb354fd049E(%rip), %rax
	movq	%rax, (%rsp)
	leaq	.L__unnamed_1(%rip), %rsi
	movq	%rsp, %rdi
	callq	*_ZN3std2rt19lang_start_internal17h142b9cc66267fea1E@GOTPCREL(%rip)
	popq	%rcx
	.cfi_def_cfa_offset 8
	retq
.Lfunc_end6:
	.size	main, .Lfunc_end6-main
	.cfi_endproc

Without --release, though, the debug build's asm output does not inline read_exact:

	.section	.text._ZN6cursor4main17h7a95d388df9093a0E,"ax",@progbits
	.p2align	4, 0x90
	.type	_ZN6cursor4main17h7a95d388df9093a0E,@function
_ZN6cursor4main17h7a95d388df9093a0E:
.Lfunc_begin179:
	.file	41 "/home/w/Code/tests/rust/cursor/src/main.rs"
	.loc	41 11 0 is_stmt 1
	.cfi_startproc
	subq	$568, %rsp
	.cfi_def_cfa_offset 576
.Ltmp713:
	.loc	41 12 24 prologue_end
	movb	$1, 100(%rsp)
	movb	$2, 101(%rsp)
	movb	$3, 102(%rsp)
	movb	$4, 103(%rsp)
	leaq	100(%rsp), %rdi
.Ltmp714:
	.loc	41 13 19
	callq	_ZN3std2io6cursor15Cursor$LT$T$GT$3new17hedd17a1827a7a97aE
	movq	%rdx, 112(%rsp)
	movq	%rax, 104(%rsp)
.Ltmp715:
	.loc	41 16 20
	leaq	200(%rsp), %rdi
	leaq	104(%rsp), %rsi
	callq	_ZN9byteorder2io12ReadBytesExt7read_u817h564428688e309ffcE
	.loc	41 0 20 is_stmt 0
	leaq	.L__unnamed_17(%rip), %rax
	.loc	41 16 20
	leaq	200(%rsp), %rdi
	movq	%rax, %rsi
	callq	_ZN4core6result19Result$LT$T$C$E$GT$6unwrap17h34cf537a266a32f5E
	movb	%al, 199(%rsp)
	.loc	41 0 20
	movq	_ZN4core3fmt3num3imp51_$LT$impl$u20$core..fmt..Display$u20$for$u20$u8$GT$3fmt17hd2673ece5df91901E@GOTPCREL(%rip), %rsi
	.loc	41 16 5
	leaq	199(%rsp), %rax
	movq	%rax, 184(%rsp)
	movq	184(%rsp), %rax
	movq	%rax, 536(%rsp)
.Ltmp716:
	.loc	41 16 5
	movq	%rax, %rdi
	callq	_ZN4core3fmt10ArgumentV13new17hd754bd8a69d22ee8E
	movq	%rax, 88(%rsp)
	movq	%rdx, 80(%rsp)
	.loc	41 0 5
	leaq	.L__unnamed_18(%rip), %rax
	movq	88(%rsp), %rcx
	.loc	41 16 5
	movq	%rcx, 168(%rsp)
	movq	80(%rsp), %rdx
	movq	%rdx, 176(%rsp)
.Ltmp717:
	.loc	41 16 5
	leaq	168(%rsp), %rsi
	leaq	120(%rsp), %rdi
	movq	%rsi, 72(%rsp)
	movq	%rax, %rsi
	movl	$2, %edx
	movq	72(%rsp), %rcx
	movl	$1, %r8d
	callq	_ZN4core3fmt9Arguments6new_v117hd958e3b7230f7202E
	leaq	120(%rsp), %rdi
	callq	*_ZN3std2io5stdio6_print17h0d31d4b9faa6e1ecE@GOTPCREL(%rip)
	.loc	41 17 20 is_stmt 1
	leaq	304(%rsp), %rdi
	leaq	104(%rsp), %rsi
	callq	_ZN9byteorder2io12ReadBytesExt7read_u817h564428688e309ffcE
	.loc	41 0 20 is_stmt 0
	leaq	.L__unnamed_19(%rip), %rax
	.loc	41 17 20
	leaq	304(%rsp), %rdi
	movq	%rax, %rsi
	callq	_ZN4core6result19Result$LT$T$C$E$GT$6unwrap17h34cf537a266a32f5E
	movb	%al, 303(%rsp)
	.loc	41 0 20
	movq	_ZN4core3fmt3num3imp51_$LT$impl$u20$core..fmt..Display$u20$for$u20$u8$GT$3fmt17hd2673ece5df91901E@GOTPCREL(%rip), %rsi
	.loc	41 17 5
	leaq	303(%rsp), %rax
	movq	%rax, 288(%rsp)
	movq	288(%rsp), %rax
	movq	%rax, 544(%rsp)
.Ltmp718:
	.loc	41 17 5
	movq	%rax, %rdi
	callq	_ZN4core3fmt10ArgumentV13new17hd754bd8a69d22ee8E
	movq	%rax, 64(%rsp)
	movq	%rdx, 56(%rsp)
	.loc	41 0 5
	leaq	.L__unnamed_18(%rip), %rax
	movq	64(%rsp), %rcx
	.loc	41 17 5
	movq	%rcx, 272(%rsp)
	movq	56(%rsp), %rdx
	movq	%rdx, 280(%rsp)
.Ltmp719:
	.loc	41 17 5
	leaq	272(%rsp), %rsi
	leaq	224(%rsp), %rdi
	movq	%rsi, 48(%rsp)
	movq	%rax, %rsi
	movl	$2, %edx
	movq	48(%rsp), %rcx
	movl	$1, %r8d
	callq	_ZN4core3fmt9Arguments6new_v117hd958e3b7230f7202E
	leaq	224(%rsp), %rdi
	callq	*_ZN3std2io5stdio6_print17h0d31d4b9faa6e1ecE@GOTPCREL(%rip)
	.loc	41 18 20 is_stmt 1
	leaq	408(%rsp), %rdi
	leaq	104(%rsp), %rsi
	callq	_ZN9byteorder2io12ReadBytesExt7read_u817h564428688e309ffcE
	.loc	41 0 20 is_stmt 0
	leaq	.L__unnamed_20(%rip), %rax
	.loc	41 18 20
	leaq	408(%rsp), %rdi
	movq	%rax, %rsi
	callq	_ZN4core6result19Result$LT$T$C$E$GT$6unwrap17h34cf537a266a32f5E
	movb	%al, 407(%rsp)
	.loc	41 0 20
	movq	_ZN4core3fmt3num3imp51_$LT$impl$u20$core..fmt..Display$u20$for$u20$u8$GT$3fmt17hd2673ece5df91901E@GOTPCREL(%rip), %rsi
	.loc	41 18 5
	leaq	407(%rsp), %rax
	movq	%rax, 392(%rsp)
	movq	392(%rsp), %rax
	movq	%rax, 552(%rsp)
.Ltmp720:
	.loc	41 18 5
	movq	%rax, %rdi
	callq	_ZN4core3fmt10ArgumentV13new17hd754bd8a69d22ee8E
	movq	%rax, 40(%rsp)
	movq	%rdx, 32(%rsp)
	.loc	41 0 5
	leaq	.L__unnamed_18(%rip), %rax
	movq	40(%rsp), %rcx
	.loc	41 18 5
	movq	%rcx, 376(%rsp)
	movq	32(%rsp), %rdx
	movq	%rdx, 384(%rsp)
.Ltmp721:
	.loc	41 18 5
	leaq	376(%rsp), %rsi
	leaq	328(%rsp), %rdi
	movq	%rsi, 24(%rsp)
	movq	%rax, %rsi
	movl	$2, %edx
	movq	24(%rsp), %rcx
	movl	$1, %r8d
	callq	_ZN4core3fmt9Arguments6new_v117hd958e3b7230f7202E
	leaq	328(%rsp), %rdi
	callq	*_ZN3std2io5stdio6_print17h0d31d4b9faa6e1ecE@GOTPCREL(%rip)
	.loc	41 19 20 is_stmt 1
	leaq	512(%rsp), %rdi
	leaq	104(%rsp), %rsi
	callq	_ZN9byteorder2io12ReadBytesExt7read_u817h564428688e309ffcE
	.loc	41 0 20 is_stmt 0
	leaq	.L__unnamed_21(%rip), %rax
	.loc	41 19 20
	leaq	512(%rsp), %rdi
	movq	%rax, %rsi
	callq	_ZN4core6result19Result$LT$T$C$E$GT$6unwrap17h34cf537a266a32f5E
	movb	%al, 511(%rsp)
	.loc	41 0 20
	movq	_ZN4core3fmt3num3imp51_$LT$impl$u20$core..fmt..Display$u20$for$u20$u8$GT$3fmt17hd2673ece5df91901E@GOTPCREL(%rip), %rsi
	.loc	41 19 5
	leaq	511(%rsp), %rax
	movq	%rax, 496(%rsp)
	movq	496(%rsp), %rax
	movq	%rax, 560(%rsp)
.Ltmp722:
	.loc	41 19 5
	movq	%rax, %rdi
	callq	_ZN4core3fmt10ArgumentV13new17hd754bd8a69d22ee8E
	movq	%rax, 16(%rsp)
	movq	%rdx, 8(%rsp)
	.loc	41 0 5
	leaq	.L__unnamed_18(%rip), %rax
	movq	16(%rsp), %rcx
	.loc	41 19 5
	movq	%rcx, 480(%rsp)
	movq	8(%rsp), %rdx
	movq	%rdx, 488(%rsp)
.Ltmp723:
	.loc	41 19 5
	leaq	480(%rsp), %rsi
	leaq	432(%rsp), %rdi
	movq	%rsi, (%rsp)
	movq	%rax, %rsi
	movl	$2, %edx
	movq	(%rsp), %rcx
	movl	$1, %r8d
	callq	_ZN4core3fmt9Arguments6new_v117hd958e3b7230f7202E
	leaq	432(%rsp), %rdi
	callq	*_ZN3std2io5stdio6_print17h0d31d4b9faa6e1ecE@GOTPCREL(%rip)
.Ltmp724:
	.loc	41 20 2 is_stmt 1
	addq	$568, %rsp
	.cfi_def_cfa_offset 8
	retq
.Ltmp725:
.Lfunc_end179:
	.size	_ZN6cursor4main17h7a95d388df9093a0E, .Lfunc_end179-_ZN6cursor4main17h7a95d388df9093a0E
	.cfi_endproc

	.section	.text.main,"ax",@progbits
	.globl	main
	.p2align	4, 0x90
	.type	main,@function
main:
.Lfunc_begin180:
	.cfi_startproc
	subq	$24, %rsp
	.cfi_def_cfa_offset 32
	movb	__rustc_debug_gdb_scripts_section__(%rip), %al
	movslq	%edi, %rcx
	leaq	_ZN6cursor4main17h7a95d388df9093a0E(%rip), %rdi
	movq	%rsi, 16(%rsp)
	movq	%rcx, %rsi
	movq	16(%rsp), %rdx
	movb	%al, 15(%rsp)
	callq	_ZN3std2rt10lang_start17hb31d7644e3c573daE
	addq	$24, %rsp
	.cfi_def_cfa_offset 8
	retq
.Lfunc_end180:
	.size	main, .Lfunc_end180-main
	.cfi_endproc

I am using the current stable rustc, 1.48.0:

$ rustc -vV
rustc 1.48.0 (7eac88abb 2020-11-16)
binary: rustc
commit-hash: 7eac88abb2e57e752f3302f02be5f3ce3d7adfb4
commit-date: 2020-11-16
host: x86_64-unknown-linux-gnu
release: 1.48.0
LLVM version: 11.0

@adrian17

adrian17 commented Dec 22, 2020

I can also see this in https://github.com/ruffle-rs/ruffle, where Cursor and byteorder are used extensively for parsing (https://github.com/ruffle-rs/ruffle/blob/master/swf/src/avm1/read.rs#L92). The wasm code built with --release still contains read_exact calls, which show up in profiles.

@m-ou-se added the T-libs label and removed the T-libs-api label on Jun 23, 2021
@yaahc added the P-low label on Jun 23, 2021
@the8472
Member

the8472 commented Jul 4, 2021

@wecing Indeed, the examples given in #47321 (comment) and #47321 (comment) do inline read_exact on current nightlies and only call out to println machinery.

@adrian17 Can you check whether the issue still occurs for you and, if so, provide a reduced example? Since the given link does not point to a specific commit, it currently refers to an arm in a huge match block.

@adrian17

adrian17 commented Jul 4, 2021

I think it's good now, thank you :)

@the8472
Member

the8472 commented Jul 4, 2021

Closing since all reported cases now inline as expected.

@the8472 closed this as completed on Jul 4, 2021