
Speed up the build (i.e. Build faster) #489

Closed
briansmith opened this issue Mar 19, 2017 · 12 comments

Comments

@briansmith
Owner

In another bug, @luser suggested that we add a way to just build the digest API, without the rest, in the name of making the build faster: “Mostly faster compiling, yeah. Rust compilation is slow enough, pulling in another large dependency just to use one small bit of it makes the problem even worse.”

Since that time, the build system has been completely rewritten, mostly through @weiznich's awesome work. One thing we did was pregenerate the assembly language code from the PerlAsm scripts, so that all the Perl steps are skipped when building from crates.io, which may help.

But, we should still try to make the build faster. This requires somebody to profile the build to find out what the bottlenecks are.

Wild guess ideas:

  • While we did spend significant effort ensuring the build is parallelized, we didn't make everything perfect in that respect. There is probably some low-hanging fruit regarding parallelism there.

  • When we're not building from Git (when ".git" doesn't exist), maybe we should skip dirty checking and go straight to compiling everything unconditionally. Presumably, Cargo does its own dirty checking to ensure each library is only built once, so our own dirty checking is superfluous in that case, since our build script would only be run when all files are dirty. (Dirty checking would still be essential for Git builds, of course.)

  • Now there is a ring-test library that contains some test code, which is built in addition to the ring-core library, which contains the C & asm code for the library proper. Ideally, the ring-test library shouldn't be linked into the ring library at all. If we changed constant_time_test and bn_test to be "integration" tests instead of "unit" tests, then we could use #[link] inside the integration test files to link libring-test.a only to the integration tests and not to everything else. If this were done, then dependent crates that build ring from crates.io wouldn't need to link libring-test.a. Presumably, we would use the presence or absence of ".git" in build.rs to determine whether or not to build libring-test.a. However, I'm not sure this is a good idea, because users can run "cargo test -p ring" from within a dependent crate to run the ring test suite, and this would break. Probably we need to wait for rust-lang/cargo#1581 ("Add support for test build scripts") instead.
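
A minimal sketch of the Git-vs-crates.io split for dirty checking, assuming build.rs detects a Git checkout by looking for `.git` in the directory Cargo passes as `CARGO_MANIFEST_DIR`. The function names are hypothetical and not taken from ring's actual build.rs; the same `is_git` flag could also gate whether libring-test.a gets built at all.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: treat the build as a Git build when `.git`
/// exists next to Cargo.toml (pass the value of CARGO_MANIFEST_DIR).
fn is_git_checkout(manifest_dir: &Path) -> bool {
    manifest_dir.join(".git").exists()
}

/// Hypothetical helper: decide whether `target` must be rebuilt.
/// Non-Git builds (e.g. from crates.io) skip our own mtime check and
/// always rebuild, on the assumption that Cargo only reruns the build
/// script when the package is dirty anyway.
fn needs_rebuild(target: &Path, sources: &[PathBuf], is_git: bool) -> io::Result<bool> {
    if !is_git {
        return Ok(true);
    }
    // A missing target always needs building; other I/O errors propagate.
    let target_mtime = match fs::metadata(target) {
        Ok(meta) => meta.modified()?,
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(true),
        Err(e) => return Err(e),
    };
    // Rebuild if any source is newer than the target.
    for src in sources {
        if fs::metadata(src)?.modified()? > target_mtime {
            return Ok(true);
        }
    }
    Ok(false)
}
```

In a non-Git build, `needs_rebuild` returns `true` without touching the filesystem, so the per-file `stat` calls are only paid for Git builds, where incremental rebuilds actually matter.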

@briansmith
Owner Author

More ideas:

  • Deleting even more code, and making the code simpler yet. This is actually the primary way in which I intend to contribute to resolving this issue.

  • Switch from C++ to Rust for the tests in bn_test.cc. bn_test.cc is the last C++ test suite. Presumably the C++ compiler is much slower than the C compiler, so getting rid of the C++ code should make the build faster. Note that bn_test.cc forces the existence of most (all?) of the C/C++ code in crypto/test, which is several files. However, the added Rust code may itself slow the build down.

  • Many of the "unit" tests in ring are probably better recast as "integration" tests. This involves moving them out from inline submodules in src/ to their own separate files in tests/. Presumably, when ring is built for the purpose of being used by a dependent crate, only the sources in src/ are compiled, not the tests; the tests should only be built if/when the user ever runs "cargo test -p ring". (If that is not the case, that's probably a performance bug in cargo that should be fixed.) This should non-trivially reduce the amount of Rust code that the Rust compiler sees during a typical compilation. Also, this is something we want to do anyway because it helps ensure that the public API is properly exported.
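
A Rust port of bn_test.cc would likely be driven by test-vector files. The sketch below is hypothetical: the `Key = Value` format, the `parse_test_vectors` and `check_bn_add` names, and the use of `u128` in place of a real bignum type are all assumptions, not ring's actual test harness. It only illustrates the shape such a port might take.

```rust
use std::collections::HashMap;

/// Hypothetical parser: "Key = Value" lines, blank lines separate cases,
/// lines starting with '#' are comments.
fn parse_test_vectors(input: &str) -> Vec<HashMap<String, String>> {
    let mut cases = Vec::new();
    let mut current = HashMap::new();
    for line in input.lines() {
        let line = line.trim();
        if line.starts_with('#') {
            continue;
        }
        if line.is_empty() {
            // A blank line ends the current case (if any).
            if !current.is_empty() {
                cases.push(std::mem::take(&mut current));
            }
            continue;
        }
        if let Some((key, value)) = line.split_once('=') {
            current.insert(key.trim().to_string(), value.trim().to_string());
        }
    }
    if !current.is_empty() {
        cases.push(current);
    }
    cases
}

/// Checks `Sum == A + B` for each case (hex values), using u128 as a
/// stand-in for a real bignum type. Returns the number of cases checked.
fn check_bn_add(cases: &[HashMap<String, String>]) -> Result<usize, String> {
    for (i, case) in cases.iter().enumerate() {
        let get = |k: &str| -> Result<u128, String> {
            let s = case.get(k).ok_or_else(|| format!("case {}: missing {}", i, k))?;
            u128::from_str_radix(s, 16).map_err(|e| format!("case {}: bad {}: {}", i, k, e))
        };
        let (a, b, sum) = (get("A")?, get("B")?, get("Sum")?);
        if a.checked_add(b) != Some(sum) {
            return Err(format!("case {}: {:x} + {:x} != {:x}", i, a, b, sum));
        }
    }
    Ok(cases.len())
}
```

For example, the input `"A = ff\nB = 1\nSum = 100\n"` parses into one case that passes the check.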

@emberian

emberian commented Apr 6, 2017

Here are some profiles from -Z time-passes -Z time-llvm-passes.

Fresh debug build takes 15.8s:

$  cargo rustc -- -Z time-passes -Z time-llvm-passes 
   Compiling libc v0.2.21
   Compiling gcc v0.3.45
   Compiling lazy_static v0.2.6
   Compiling untrusted v0.3.2
   Compiling rand v0.3.15
   Compiling num_cpus v1.3.0
   Compiling deque v0.3.1
   Compiling rayon v0.6.0
   Compiling ring v0.7.3 (file:///home/cmr/proj/ring)
time: 0.046; rss: 56MB	parsing
time: 0.000; rss: 56MB	recursion limit
time: 0.000; rss: 56MB	crate injection
time: 0.000; rss: 56MB	plugin loading
time: 0.000; rss: 56MB	plugin registration
time: 0.078; rss: 95MB	expansion
time: 0.000; rss: 95MB	maybe building test harness
time: 0.000; rss: 95MB	maybe creating a macro crate
time: 0.000; rss: 95MB	checking for inline asm in case the target doesn't support it
time: 0.003; rss: 95MB	early lint checks
time: 0.001; rss: 95MB	AST validation
time: 0.012; rss: 98MB	name resolution
time: 0.008; rss: 98MB	complete gated feature checking
time: 0.010; rss: 104MB	lowering ast -> hir
time: 0.003; rss: 104MB	indexing hir
time: 0.001; rss: 104MB	attribute checking
time: 0.004; rss: 107MB	language item collection
time: 0.002; rss: 107MB	lifetime resolution
time: 0.000; rss: 107MB	looking for entry point
time: 0.000; rss: 107MB	looking for plugin registrar
time: 0.003; rss: 107MB	region resolution
time: 0.001; rss: 107MB	loop checking
time: 0.000; rss: 107MB	static item recursion checking
time: 0.031; rss: 108MB	compute_incremental_hashes_map
time: 0.000; rss: 108MB	load_dep_graph
time: 0.001; rss: 108MB	stability index
time: 0.003; rss: 108MB	stability checking
time: 0.461; rss: 124MB	type collecting
time: 0.000; rss: 124MB	variance inference
time: 0.000; rss: 124MB	impl wf inference
time: 0.014; rss: 127MB	coherence checking
time: 0.021; rss: 127MB	wf checking
time: 0.045; rss: 127MB	item-types checking
time: 0.324; rss: 134MB	item-bodies checking
time: 0.022; rss: 136MB	const checking
time: 0.005; rss: 136MB	privacy checking
time: 0.002; rss: 136MB	intrinsic checking
time: 0.001; rss: 136MB	effect checking
time: 0.005; rss: 136MB	match checking
time: 0.003; rss: 136MB	liveness checking
time: 0.016; rss: 136MB	rvalue checking
time: 0.037; rss: 151MB	MIR dump
  time: 0.005; rss: 151MB	SimplifyCfg
  time: 0.008; rss: 151MB	QualifyAndPromoteConstants
  time: 0.012; rss: 151MB	TypeckMir
  time: 0.000; rss: 151MB	SimplifyBranches
  time: 0.002; rss: 151MB	SimplifyCfg
time: 0.028; rss: 151MB	MIR cleanup and validation
time: 0.044; rss: 151MB	borrow checking
time: 0.000; rss: 151MB	reachability checking
time: 0.003; rss: 151MB	death checking
time: 0.000; rss: 151MB	unused lib feature checking
time: 0.041; rss: 151MB	lint checking
time: 0.000; rss: 151MB	resolving dependency formats
  time: 0.000; rss: 151MB	NoLandingPads
  time: 0.002; rss: 151MB	SimplifyCfg
  time: 0.004; rss: 151MB	EraseRegions
  time: 0.001; rss: 151MB	AddCallGuards
  time: 0.015; rss: 154MB	ElaborateDrops
  time: 0.000; rss: 154MB	NoLandingPads
  time: 0.003; rss: 154MB	SimplifyCfg
  time: 0.000; rss: 154MB	Inline
  time: 0.003; rss: 154MB	InstCombine
  time: 0.001; rss: 154MB	Deaggregator
  time: 0.000; rss: 154MB	CopyPropagation
  time: 0.003; rss: 154MB	SimplifyLocals
  time: 0.001; rss: 154MB	AddCallGuards
  time: 0.000; rss: 154MB	PreTrans
time: 0.033; rss: 154MB	MIR optimisations
  time: 0.009; rss: 154MB	write metadata
  time: 0.066; rss: 156MB	translation item collection
  time: 0.013; rss: 156MB	codegen unit partitioning
  time: 0.007; rss: 176MB	internalize symbols
time: 0.532; rss: 176MB	translation
time: 0.000; rss: 176MB	assert dep graph
time: 0.000; rss: 176MB	serialize dep graph
  time: 0.046; rss: 142MB	llvm function passes [0]
  time: 0.035; rss: 144MB	llvm module passes [0]
  time: 0.941; rss: 150MB	codegen passes [0]
  time: 0.000; rss: 149MB	codegen passes [0]
===-------------------------------------------------------------------------===
                      Instruction Selection and Scheduling
===-------------------------------------------------------------------------===
  Total Execution Time: 0.1400 seconds (0.1241 wall clock)

---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.0333 ( 37.0%) 0.0167 ( 33.3%) 0.0500 ( 35.7%) 0.0349 ( 28.2%) Instruction Selection
0.0067 ( 7.4%) 0.0100 ( 20.0%) 0.0167 ( 11.9%) 0.0214 ( 17.2%) Instruction Scheduling
0.0133 ( 14.8%) 0.0033 ( 6.7%) 0.0167 ( 11.9%) 0.0185 ( 14.9%) DAG Combining 1
0.0200 ( 22.2%) 0.0067 ( 13.3%) 0.0267 ( 19.0%) 0.0122 ( 9.8%) DAG Combining 2
0.0100 ( 11.1%) 0.0067 ( 13.3%) 0.0167 ( 11.9%) 0.0117 ( 9.4%) Instruction Creation
0.0033 ( 3.7%) 0.0000 ( 0.0%) 0.0033 ( 2.4%) 0.0100 ( 8.0%) DAG Legalization
0.0033 ( 3.7%) 0.0067 ( 13.3%) 0.0100 ( 7.1%) 0.0081 ( 6.5%) Type Legalization
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0032 ( 2.6%) DAG Combining after legalize types
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0021 ( 1.7%) Instruction Scheduling Cleanup
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0020 ( 1.6%) Vector Legalization
0.0900 (100.0%) 0.0500 (100.0%) 0.1400 (100.0%) 0.1241 (100.0%) Total

===-------------------------------------------------------------------------===
DWARF Emission
===-------------------------------------------------------------------------===
Total Execution Time: 0.1133 seconds (0.1251 wall clock)

---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.0733 ( 73.3%) 0.0067 ( 50.0%) 0.0800 ( 70.6%) 0.0846 ( 67.7%) Debug Info Emission
0.0233 ( 23.3%) 0.0067 ( 50.0%) 0.0300 ( 26.5%) 0.0395 ( 31.6%) DWARF Exception Writer
0.0033 ( 3.3%) 0.0000 ( 0.0%) 0.0033 ( 2.9%) 0.0009 ( 0.8%) DWARF Debug Writer
0.1000 (100.0%) 0.0133 (100.0%) 0.1133 (100.0%) 0.1251 (100.0%) Total

===-------------------------------------------------------------------------===
... Pass execution timing report ...
===-------------------------------------------------------------------------===
Total Execution Time: 0.8700 seconds (0.8690 wall clock)

---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.2100 ( 29.7%) 0.0833 ( 51.0%) 0.2933 ( 33.7%) 0.2814 ( 32.4%) X86 DAG->DAG Instruction Selection
0.2267 ( 32.1%) 0.0233 ( 14.3%) 0.2500 ( 28.7%) 0.2652 ( 30.5%) X86 Assembly / Object Emitter
0.0533 ( 7.5%) 0.0067 ( 4.1%) 0.0600 ( 6.9%) 0.0519 ( 6.0%) Module Verifier
0.0367 ( 5.2%) 0.0033 ( 2.0%) 0.0400 ( 4.6%) 0.0425 ( 4.9%) Module Verifier
0.0367 ( 5.2%) 0.0067 ( 4.1%) 0.0433 ( 5.0%) 0.0420 ( 4.8%) Module Verifier
0.0233 ( 3.3%) 0.0033 ( 2.0%) 0.0267 ( 3.1%) 0.0281 ( 3.2%) Inliner for always_inline functions
0.0167 ( 2.4%) 0.0067 ( 4.1%) 0.0233 ( 2.7%) 0.0259 ( 3.0%) Prologue/Epilogue Insertion & Frame Finalization
0.0133 ( 1.9%) 0.0100 ( 6.1%) 0.0233 ( 2.7%) 0.0223 ( 2.6%) Fast Register Allocator
0.0100 ( 1.4%) 0.0033 ( 2.0%) 0.0133 ( 1.5%) 0.0125 ( 1.4%) Live DEBUG_VALUE analysis
0.0100 ( 1.4%) 0.0033 ( 2.0%) 0.0133 ( 1.5%) 0.0121 ( 1.4%) Machine Function Analysis
0.0067 ( 0.9%) 0.0000 ( 0.0%) 0.0067 ( 0.8%) 0.0104 ( 1.2%) Insert stack protectors
0.0067 ( 0.9%) 0.0000 ( 0.0%) 0.0067 ( 0.8%) 0.0084 ( 1.0%) Two-Address instruction pass
0.0067 ( 0.9%) 0.0000 ( 0.0%) 0.0067 ( 0.8%) 0.0067 ( 0.8%) Dominator Tree Construction
0.0033 ( 0.5%) 0.0000 ( 0.0%) 0.0033 ( 0.4%) 0.0054 ( 0.6%) CallGraph Construction
0.0067 ( 0.9%) 0.0000 ( 0.0%) 0.0067 ( 0.8%) 0.0051 ( 0.6%) Dominator Tree Construction
0.0100 ( 1.4%) 0.0000 ( 0.0%) 0.0100 ( 1.1%) 0.0047 ( 0.5%) Natural Loop Information
0.0033 ( 0.5%) 0.0000 ( 0.0%) 0.0033 ( 0.4%) 0.0045 ( 0.5%) Dominator Tree Construction
0.0067 ( 0.9%) 0.0000 ( 0.0%) 0.0067 ( 0.8%) 0.0040 ( 0.5%) Scalar Evolution Analysis
0.0067 ( 0.9%) 0.0033 ( 2.0%) 0.0100 ( 1.1%) 0.0039 ( 0.5%) Dominator Tree Construction
0.0033 ( 0.5%) 0.0033 ( 2.0%) 0.0067 ( 0.8%) 0.0039 ( 0.4%) Function Alias Analysis Results
0.0033 ( 0.5%) 0.0000 ( 0.0%) 0.0033 ( 0.4%) 0.0038 ( 0.4%) Expand Atomic instructions
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0027 ( 0.3%) Exception handling preparation
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0024 ( 0.3%) Post-RA pseudo instruction expansion pass
0.0033 ( 0.5%) 0.0033 ( 2.0%) 0.0067 ( 0.8%) 0.0019 ( 0.2%) X86 pseudo instruction expansion pass
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0017 ( 0.2%) Remove unreachable blocks from the CFG
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0016 ( 0.2%) Bundle Machine CFG Edges
0.0000 ( 0.0%) 0.0033 ( 2.0%) 0.0033 ( 0.4%) 0.0014 ( 0.2%) Basic Alias Analysis (stateless AA impl)
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0014 ( 0.2%) Eliminate PHI nodes for register allocation
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0011 ( 0.1%) Expand ISel Pseudo-instructions
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0010 ( 0.1%) Insert XRay ops
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0008 ( 0.1%) Implement the 'patchable-function' attribute
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0008 ( 0.1%) StackMap Liveness Analysis
0.0033 ( 0.5%) 0.0000 ( 0.0%) 0.0033 ( 0.4%) 0.0008 ( 0.1%) Local Stack Slot Allocation
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0007 ( 0.1%) X86 FP Stackifier
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0007 ( 0.1%) Contiguously Lay Out Funclets
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0007 ( 0.1%) X86 PIC Global Base Reg Initialization
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0007 ( 0.1%) X86 WinAlloca Expander
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0007 ( 0.1%) Safe Stack instrumentation pass
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0007 ( 0.1%) Analyze Machine Code For Garbage Collection
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0007 ( 0.1%) X86 vzeroupper inserter
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0006 ( 0.1%) Shadow Stack GC Lowering
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0006 ( 0.1%) Lower Garbage Collection Instructions
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) Create Garbage Collector Module Metadata
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) Assumption Cache Tracker
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Pre-ISel Intrinsic Lowering
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Rewrite Symbols
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Rewrite Symbols
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Force set function attributes
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Type-Based Alias Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Library Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Pre-ISel Intrinsic Lowering
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Type-Based Alias Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Transform Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Transform Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Library Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Library Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Scoped NoAlias Alias Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Profile summary info
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Module Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Module Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Create Garbage Collector Module Metadata
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Assumption Cache Tracker
0.7067 (100.0%) 0.1633 (100.0%) 0.8700 (100.0%) 0.8690 (100.0%) Total

time: 1.078; rss: 149MB LLVM passes
time: 0.000; rss: 149MB serialize work products
time: 0.031; rss: 131MB linking
Finished dev [unoptimized + debuginfo] target(s) in 15.8 secs

Building just libring.rlib (without the C code) takes 3.3s, and building both the rlib and the C code takes 9.3s.
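
Assuming the two debug-mode timings are roughly additive, the C code's share of ring's own build time works out to:

$$
t_{\text{C}} \approx 9.3\,\text{s} - 3.3\,\text{s} = 6.0\,\text{s}, \qquad \frac{t_{\text{C}}}{9.3\,\text{s}} \approx 0.65
$$

That is, about two thirds of the time spent on ring itself (excluding the other dependencies in the 15.8s fresh build) goes to the C code.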

It takes 0.41 seconds to build a trivial crate that takes the SHA512 of a line of stdin, after the deps (including ring) are already built:

   Compiling f v0.1.0 (file:///home/cmr/proj/ring/t/f)
time: 0.000; rss: 48MB	parsing
time: 0.000; rss: 48MB	recursion limit
time: 0.000; rss: 48MB	crate injection
time: 0.000; rss: 48MB	plugin loading
time: 0.000; rss: 48MB	plugin registration
time: 0.022; rss: 84MB	expansion
time: 0.000; rss: 84MB	maybe building test harness
time: 0.000; rss: 84MB	maybe creating a macro crate
time: 0.000; rss: 84MB	checking for inline asm in case the target doesn't support it
time: 0.000; rss: 84MB	early lint checks
time: 0.000; rss: 84MB	AST validation
time: 0.005; rss: 84MB	name resolution
time: 0.000; rss: 84MB	complete gated feature checking
time: 0.000; rss: 84MB	lowering ast -> hir
time: 0.000; rss: 84MB	indexing hir
time: 0.000; rss: 84MB	attribute checking
time: 0.000; rss: 84MB	language item collection
time: 0.000; rss: 84MB	lifetime resolution
time: 0.000; rss: 84MB	looking for entry point
time: 0.000; rss: 84MB	looking for plugin registrar
time: 0.000; rss: 84MB	region resolution
time: 0.000; rss: 84MB	loop checking
time: 0.000; rss: 84MB	static item recursion checking
time: 0.000; rss: 87MB	compute_incremental_hashes_map
time: 0.000; rss: 87MB	load_dep_graph
time: 0.000; rss: 87MB	stability index
time: 0.000; rss: 87MB	stability checking
time: 0.000; rss: 87MB	type collecting
time: 0.000; rss: 87MB	variance inference
time: 0.000; rss: 87MB	impl wf inference
time: 0.000; rss: 87MB	coherence checking
time: 0.000; rss: 87MB	wf checking
time: 0.001; rss: 87MB	item-types checking
time: 0.011; rss: 101MB	item-bodies checking
time: 0.002; rss: 101MB	const checking
time: 0.000; rss: 101MB	privacy checking
time: 0.000; rss: 101MB	intrinsic checking
time: 0.000; rss: 101MB	effect checking
time: 0.000; rss: 101MB	match checking
time: 0.000; rss: 101MB	liveness checking
time: 0.000; rss: 101MB	rvalue checking
time: 0.000; rss: 101MB	MIR dump
  time: 0.000; rss: 101MB	SimplifyCfg
  time: 0.000; rss: 101MB	QualifyAndPromoteConstants
  time: 0.000; rss: 101MB	TypeckMir
  time: 0.000; rss: 101MB	SimplifyBranches
  time: 0.000; rss: 101MB	SimplifyCfg
time: 0.001; rss: 101MB	MIR cleanup and validation
time: 0.000; rss: 101MB	borrow checking
time: 0.000; rss: 101MB	reachability checking
time: 0.000; rss: 101MB	death checking
time: 0.000; rss: 101MB	unused lib feature checking
warning: unused result which must be used
 --> src/main.rs:7:5
  |
7 |     stdin().read_line(&mut s);
  |     ^^^^^^^^^^^^^^^^^^^^^^^^^^
  |
  = note: #[warn(unused_must_use)] on by default

time: 0.000; rss: 101MB lint checking
time: 0.002; rss: 101MB resolving dependency formats
time: 0.000; rss: 101MB NoLandingPads
time: 0.000; rss: 101MB SimplifyCfg
time: 0.000; rss: 101MB EraseRegions
time: 0.000; rss: 101MB AddCallGuards
time: 0.000; rss: 101MB ElaborateDrops
time: 0.000; rss: 101MB NoLandingPads
time: 0.000; rss: 101MB SimplifyCfg
time: 0.000; rss: 101MB Inline
time: 0.000; rss: 101MB InstCombine
time: 0.000; rss: 101MB Deaggregator
time: 0.000; rss: 101MB CopyPropagation
time: 0.000; rss: 101MB SimplifyLocals
time: 0.000; rss: 101MB AddCallGuards
time: 0.000; rss: 101MB PreTrans
time: 0.000; rss: 101MB MIR optimisations
time: 0.000; rss: 101MB write metadata
time: 0.004; rss: 101MB translation item collection
time: 0.001; rss: 101MB codegen unit partitioning
time: 0.001; rss: 114MB internalize symbols
time: 0.095; rss: 114MB translation
time: 0.000; rss: 114MB assert dep graph
time: 0.000; rss: 114MB serialize dep graph
time: 0.002; rss: 114MB llvm function passes [0]
time: 0.001; rss: 114MB llvm module passes [0]
time: 0.037; rss: 118MB codegen passes [0]
time: 0.000; rss: 118MB codegen passes [0]
===-------------------------------------------------------------------------===
Instruction Selection and Scheduling
===-------------------------------------------------------------------------===
Total Execution Time: 0.0100 seconds (0.0045 wall clock)

---User Time--- --User+System-- ---Wall Time--- --- Name ---
0.0033 ( 33.3%) 0.0033 ( 33.3%) 0.0012 ( 26.8%) Instruction Selection
0.0033 ( 33.3%) 0.0033 ( 33.3%) 0.0008 ( 18.6%) Instruction Scheduling
0.0033 ( 33.3%) 0.0033 ( 33.3%) 0.0008 ( 17.4%) DAG Combining 1
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0005 ( 10.3%) DAG Combining 2
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0005 ( 10.1%) Instruction Creation
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0004 ( 8.3%) DAG Legalization
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 5.2%) Type Legalization
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 1.8%) Instruction Scheduling Cleanup
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 1.0%) Vector Legalization
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.4%) DAG Combining after legalize types
0.0100 (100.0%) 0.0100 (100.0%) 0.0045 (100.0%) Total

===-------------------------------------------------------------------------===
DWARF Emission
===-------------------------------------------------------------------------===
Total Execution Time: 0.0000 seconds (0.0047 wall clock)

---Wall Time--- --- Name ---
0.0032 ( 68.8%) Debug Info Emission
0.0012 ( 26.7%) DWARF Exception Writer
0.0002 ( 4.5%) DWARF Debug Writer
0.0047 (100.0%) Total

===-------------------------------------------------------------------------===
... Pass execution timing report ...
===-------------------------------------------------------------------------===
Total Execution Time: 0.0300 seconds (0.0305 wall clock)

---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.0100 ( 37.5%) 0.0000 ( 0.0%) 0.0100 ( 33.3%) 0.0102 ( 33.5%) X86 DAG->DAG Instruction Selection
0.0067 ( 25.0%) 0.0000 ( 0.0%) 0.0067 ( 22.2%) 0.0085 ( 27.8%) X86 Assembly / Object Emitter
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0015 ( 4.8%) Module Verifier
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0014 ( 4.6%) Module Verifier
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0011 ( 3.7%) Module Verifier
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0010 ( 3.3%) Prologue/Epilogue Insertion & Frame Finalization
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0008 ( 2.6%) Fast Register Allocator
0.0033 ( 12.5%) 0.0033 (100.0%) 0.0067 ( 22.2%) 0.0007 ( 2.3%) Machine Function Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0006 ( 2.0%) Inliner for always_inline functions
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0005 ( 1.8%) Live DEBUG_VALUE analysis
0.0033 ( 12.5%) 0.0000 ( 0.0%) 0.0033 ( 11.1%) 0.0004 ( 1.4%) Insert stack protectors
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 1.0%) Two-Address instruction pass
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 1.0%) Profile summary info
0.0033 ( 12.5%) 0.0000 ( 0.0%) 0.0033 ( 11.1%) 0.0003 ( 0.9%) Dominator Tree Construction
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.8%) Natural Loop Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.7%) Function Alias Analysis Results
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.7%) Dominator Tree Construction
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.7%) CallGraph Construction
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.7%) Scalar Evolution Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.6%) Dominator Tree Construction
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.5%) Exception handling preparation
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.5%) Dominator Tree Construction
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.4%) Remove unreachable blocks from the CFG
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.4%) Post-RA pseudo instruction expansion pass
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.3%) Bundle Machine CFG Edges
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.3%) X86 pseudo instruction expansion pass
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.3%) Basic Alias Analysis (stateless AA impl)
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.2%) Eliminate PHI nodes for register allocation
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.2%) Expand ISel Pseudo-instructions
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.2%) Insert XRay ops
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.2%) X86 FP Stackifier
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.2%) StackMap Liveness Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.2%) Shadow Stack GC Lowering
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.2%) Implement the 'patchable-function' attribute
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) Analyze Machine Code For Garbage Collection
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) Local Stack Slot Allocation
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) Safe Stack instrumentation pass
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) X86 WinAlloca Expander
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) Contiguously Lay Out Funclets
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) X86 PIC Global Base Reg Initialization
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) X86 vzeroupper inserter
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) Lower Garbage Collection Instructions
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) Create Garbage Collector Module Metadata
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Assumption Cache Tracker
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Pre-ISel Intrinsic Lowering
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Library Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Rewrite Symbols
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Assumption Cache Tracker
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Type-Based Alias Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Rewrite Symbols
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Force set function attributes
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Type-Based Alias Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Transform Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Transform Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Library Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Scoped NoAlias Alias Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Pre-ISel Intrinsic Lowering
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Module Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Module Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Create Garbage Collector Module Metadata
0.0267 (100.0%) 0.0033 (100.0%) 0.0300 (100.0%) 0.0305 (100.0%) Total

time: 0.041; rss: 118MB LLVM passes
time: 0.000; rss: 118MB serialize work products
time: 0.196; rss: 121MB running linker
time: 0.198; rss: 121MB linking
Finished dev [unoptimized + debuginfo] target(s) in 0.41 secs

Here's the size of the artifacts inside libring.rlib:

60K     add.o
8.0K    aes-x86_64-elf.o
40K     aes.o
8.0K    aesni-gcm-x86_64-elf.o
8.0K    aesni-x86_64-elf.o
68K     bn.o
76K     bn_test_convert.o
52K     bn_test_new.o
8.0K    bsaes-x86_64-elf.o
12K     chacha-x86_64-elf.o
56K     cmp.o
76K     constant_time_test.o
60K     convert.o
48K     cpu-intel.o
44K     crypto.o
160K    curve25519.o
64K     div.o
60K     e_aes.o
48K     ecp_nistz.o
4.0K    ecp_nistz256-x86_64-elf.o
216K    ecp_nistz256.o
60K     exponentiation.o
52K     gcd.o
68K     gcm.o
60K     generic.o
48K     gfp_p256.o
84K     gfp_p384.o
12K     ghash-x86_64-elf.o
64K     limbs.o
40K     mem.o
60K     montgomery.o
56K     montgomery_inv.o
52K     mul.o
12K     p256-x86_64-asm-elf.o
12K     poly1305-x86_64-elf.o
44K     random.o
1.7M    ring-15bf6c46f8e53abc.0.o
24K     sha256-x86_64-elf.o
24K     sha512-x86_64-elf.o
56K     shift.o
96K     sysrand.o
8.0K    vpaes-x86_64-elf.o
16K     x25519-asm-x86_64.o
48K     x25519-x86_64.o
8.0K    x86_64-mont-elf.o
12K     x86_64-mont5-elf.o
716K    ring-15bf6c46f8e53abc.0.bytecode.deflate
944K    rust.metadata.bin
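
A small helper for tallying a listing like the one above, assuming it is `du -h`-style output (size with a K/M/G suffix, then the member name); the command that produced it isn't shown, and the helper names are hypothetical. Filtering on `_test` picks out bn_test_convert.o, bn_test_new.o, and constant_time_test.o, which by this listing total about 204K of test-only objects inside libring.rlib. That figure is relevant to the libring-test.a discussion earlier in the thread.

```rust
/// Converts a `du -h`-style size ("60K", "8.0K", "1.7M") to approximate bytes.
fn parse_size(s: &str) -> Option<f64> {
    let (num, mult) = match s.chars().last()? {
        'K' => (&s[..s.len() - 1], 1024.0),
        'M' => (&s[..s.len() - 1], 1024.0 * 1024.0),
        'G' => (&s[..s.len() - 1], 1024.0 * 1024.0 * 1024.0),
        _ => (s, 1.0),
    };
    num.parse::<f64>().ok().map(|n| n * mult)
}

/// Sums the sizes of listing entries whose name satisfies `keep`;
/// lines that don't parse as "SIZE NAME" are skipped.
fn total_size(listing: &str, keep: impl Fn(&str) -> bool) -> f64 {
    listing
        .lines()
        .filter_map(|line| {
            let mut parts = line.split_whitespace();
            let size = parse_size(parts.next()?)?;
            let name = parts.next()?;
            if keep(name) { Some(size) } else { None }
        })
        .sum()
}
```

Passing `|n| n.contains("_test")` as the filter sums only the test objects, and `|_| true` sums everything.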

In release mode, things aren't much worse. Building ring by itself takes 15.6s, or 21.78s with build deps. Rebuilding just libring.rlib takes only 4.64s. Building the C code takes most of the time. Here's the profile:

   Compiling ring v0.7.3 (file:///home/cmr/proj/ring)
time: 0.045; rss: 56MB	parsing
time: 0.000; rss: 56MB	recursion limit
time: 0.000; rss: 56MB	crate injection
time: 0.000; rss: 56MB	plugin loading
time: 0.000; rss: 56MB	plugin registration
time: 0.077; rss: 94MB	expansion
time: 0.000; rss: 94MB	maybe building test harness
time: 0.001; rss: 94MB	maybe creating a macro crate
time: 0.000; rss: 94MB	checking for inline asm in case the target doesn't support it
time: 0.003; rss: 94MB	early lint checks
time: 0.001; rss: 94MB	AST validation
time: 0.011; rss: 99MB	name resolution
time: 0.008; rss: 99MB	complete gated feature checking
time: 0.009; rss: 101MB	lowering ast -> hir
time: 0.003; rss: 105MB	indexing hir
time: 0.001; rss: 105MB	attribute checking
time: 0.004; rss: 105MB	language item collection
time: 0.002; rss: 105MB	lifetime resolution
time: 0.000; rss: 105MB	looking for entry point
time: 0.000; rss: 105MB	looking for plugin registrar
time: 0.003; rss: 107MB	region resolution
time: 0.001; rss: 107MB	loop checking
time: 0.000; rss: 107MB	static item recursion checking
time: 0.014; rss: 107MB	compute_incremental_hashes_map
time: 0.000; rss: 107MB	load_dep_graph
time: 0.001; rss: 107MB	stability index
time: 0.003; rss: 107MB	stability checking
time: 0.464; rss: 123MB	type collecting
time: 0.000; rss: 123MB	variance inference
time: 0.000; rss: 123MB	impl wf inference
time: 0.014; rss: 126MB	coherence checking
time: 0.020; rss: 126MB	wf checking
time: 0.045; rss: 126MB	item-types checking
time: 0.322; rss: 133MB	item-bodies checking
time: 0.022; rss: 133MB	const checking
time: 0.005; rss: 133MB	privacy checking
time: 0.004; rss: 133MB	intrinsic checking
time: 0.001; rss: 133MB	effect checking
time: 0.006; rss: 133MB	match checking
time: 0.003; rss: 133MB	liveness checking
time: 0.016; rss: 133MB	rvalue checking
time: 0.036; rss: 150MB	MIR dump
  time: 0.004; rss: 150MB	SimplifyCfg
  time: 0.009; rss: 150MB	QualifyAndPromoteConstants
  time: 0.013; rss: 150MB	TypeckMir
  time: 0.000; rss: 150MB	SimplifyBranches
  time: 0.002; rss: 150MB	SimplifyCfg
time: 0.029; rss: 150MB	MIR cleanup and validation
time: 0.045; rss: 152MB	borrow checking
time: 0.000; rss: 152MB	reachability checking
time: 0.003; rss: 152MB	death checking
time: 0.000; rss: 152MB	unused lib feature checking
time: 0.041; rss: 152MB	lint checking
time: 0.000; rss: 152MB	resolving dependency formats
  time: 0.000; rss: 152MB	NoLandingPads
  time: 0.002; rss: 152MB	SimplifyCfg
  time: 0.005; rss: 152MB	EraseRegions
  time: 0.001; rss: 152MB	AddCallGuards
  time: 0.014; rss: 152MB	ElaborateDrops
  time: 0.000; rss: 152MB	NoLandingPads
  time: 0.002; rss: 152MB	SimplifyCfg
  time: 0.000; rss: 152MB	Inline
  time: 0.002; rss: 152MB	InstCombine
  time: 0.001; rss: 152MB	Deaggregator
  time: 0.000; rss: 152MB	CopyPropagation
  time: 0.003; rss: 152MB	SimplifyLocals
  time: 0.001; rss: 152MB	AddCallGuards
  time: 0.000; rss: 152MB	PreTrans
time: 0.031; rss: 152MB	MIR optimisations
  time: 0.009; rss: 154MB	write metadata
  time: 0.065; rss: 156MB	translation item collection
  time: 0.013; rss: 156MB	codegen unit partitioning
  time: 0.007; rss: 170MB	internalize symbols
time: 0.391; rss: 170MB	translation
time: 0.000; rss: 170MB	assert dep graph
time: 0.000; rss: 170MB	serialize dep graph
  time: 0.164; rss: 136MB	llvm function passes [0]
  time: 2.155; rss: 140MB	llvm module passes [0]
  time: 0.562; rss: 143MB	codegen passes [0]
  time: 0.001; rss: 143MB	codegen passes [0]
===-------------------------------------------------------------------------===
                              Register Allocation
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0167 seconds (0.0174 wall clock)

---User Time--- --User+System-- ---Wall Time--- --- Name ---
0.0100 ( 60.0%) 0.0100 ( 60.0%) 0.0118 ( 67.5%) Global Splitting
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0023 ( 13.1%) Evict
0.0033 ( 20.0%) 0.0033 ( 20.0%) 0.0017 ( 9.7%) Spiller
0.0033 ( 20.0%) 0.0033 ( 20.0%) 0.0013 ( 7.4%) Local Splitting
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0004 ( 2.3%) Seed Live Regs
0.0167 (100.0%) 0.0167 (100.0%) 0.0174 (100.0%) Total

===-------------------------------------------------------------------------===
Instruction Selection and Scheduling
===-------------------------------------------------------------------------===
Total Execution Time: 0.1567 seconds (0.1546 wall clock)

---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.0367 ( 25.0%) 0.0033 ( 33.3%) 0.0400 ( 25.5%) 0.0419 ( 27.1%) Instruction Selection
0.0167 ( 11.4%) 0.0000 ( 0.0%) 0.0167 ( 10.6%) 0.0219 ( 14.2%) Instruction Scheduling
0.0200 ( 13.6%) 0.0000 ( 0.0%) 0.0200 ( 12.8%) 0.0215 ( 13.9%) DAG Combining 1
0.0167 ( 11.4%) 0.0033 ( 33.3%) 0.0200 ( 12.8%) 0.0186 ( 12.0%) DAG Combining 2
0.0167 ( 11.4%) 0.0000 ( 0.0%) 0.0167 ( 10.6%) 0.0138 ( 8.9%) DAG Legalization
0.0200 ( 13.6%) 0.0033 ( 33.3%) 0.0233 ( 14.9%) 0.0125 ( 8.1%) Instruction Creation
0.0100 ( 6.8%) 0.0000 ( 0.0%) 0.0100 ( 6.4%) 0.0103 ( 6.7%) Type Legalization
0.0067 ( 4.5%) 0.0000 ( 0.0%) 0.0067 ( 4.3%) 0.0080 ( 5.2%) DAG Combining after legalize types
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0041 ( 2.6%) Vector Legalization
0.0033 ( 2.3%) 0.0000 ( 0.0%) 0.0033 ( 2.1%) 0.0018 ( 1.2%) Instruction Scheduling Cleanup
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.1%) DAG Combining after legalize vectors
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Type Legalization 2
0.1467 (100.0%) 0.0100 (100.0%) 0.1567 (100.0%) 0.1546 (100.0%) Total

===-------------------------------------------------------------------------===
DWARF Emission
===-------------------------------------------------------------------------===
Total Execution Time: 0.0033 seconds (0.0011 wall clock)

--System Time-- --User+System-- ---Wall Time--- --- Name ---
0.0033 (100.0%) 0.0033 (100.0%) 0.0007 ( 63.9%) DWARF Exception Writer
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0004 ( 35.4%) Debug Info Emission
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.6%) DWARF Debug Writer
0.0033 (100.0%) 0.0033 (100.0%) 0.0011 (100.0%) Total

===-------------------------------------------------------------------------===
... Pass execution timing report ...
===-------------------------------------------------------------------------===
Total Execution Time: 2.5233 seconds (2.5454 wall clock)

---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.2267 ( 9.6%) 0.0167 ( 10.9%) 0.2433 ( 9.6%) 0.2300 ( 9.0%) Dominator Tree Construction
0.2000 ( 8.4%) 0.0200 ( 13.0%) 0.2200 ( 8.7%) 0.2087 ( 8.2%) Function Integration/Inlining
0.1400 ( 5.9%) 0.0100 ( 6.5%) 0.1500 ( 5.9%) 0.1373 ( 5.4%) Combine redundant instructions
0.1267 ( 5.3%) 0.0067 ( 4.3%) 0.1333 ( 5.3%) 0.1243 ( 4.9%) Global Value Numbering
0.0933 ( 3.9%) 0.0067 ( 4.3%) 0.1000 ( 4.0%) 0.0995 ( 3.9%) Induction Variable Simplification
0.0833 ( 3.5%) 0.0067 ( 4.3%) 0.0900 ( 3.6%) 0.0955 ( 3.8%) Combine redundant instructions
0.1067 ( 4.5%) 0.0000 ( 0.0%) 0.1067 ( 4.2%) 0.0951 ( 3.7%) Combine redundant instructions
0.1067 ( 4.5%) 0.0033 ( 2.2%) 0.1100 ( 4.4%) 0.0945 ( 3.7%) Global Value Numbering
0.0867 ( 3.7%) 0.0000 ( 0.0%) 0.0867 ( 3.4%) 0.0875 ( 3.4%) Combine redundant instructions
0.0767 ( 3.2%) 0.0000 ( 0.0%) 0.0767 ( 3.0%) 0.0860 ( 3.4%) Combine redundant instructions
0.0600 ( 2.5%) 0.0000 ( 0.0%) 0.0600 ( 2.4%) 0.0730 ( 2.9%) SROA
0.0367 ( 1.5%) 0.0033 ( 2.2%) 0.0400 ( 1.6%) 0.0613 ( 2.4%) SROA
0.0500 ( 2.1%) 0.0033 ( 2.2%) 0.0533 ( 2.1%) 0.0475 ( 1.9%) Dead Store Elimination
0.0500 ( 2.1%) 0.0000 ( 0.0%) 0.0500 ( 2.0%) 0.0459 ( 1.8%) Value Propagation
0.0367 ( 1.5%) 0.0000 ( 0.0%) 0.0367 ( 1.5%) 0.0457 ( 1.8%) Value Propagation
0.0500 ( 2.1%) 0.0033 ( 2.2%) 0.0533 ( 2.1%) 0.0439 ( 1.7%) Module Verifier
0.0500 ( 2.1%) 0.0000 ( 0.0%) 0.0500 ( 2.0%) 0.0432 ( 1.7%) Early CSE
0.0267 ( 1.1%) 0.0033 ( 2.2%) 0.0300 ( 1.2%) 0.0381 ( 1.5%) MemCpy Optimization
0.0300 ( 1.3%) 0.0000 ( 0.0%) 0.0300 ( 1.2%) 0.0354 ( 1.4%) Jump Threading
0.0200 ( 0.8%) 0.0067 ( 4.3%) 0.0267 ( 1.1%) 0.0340 ( 1.3%) Combine redundant instructions
0.0333 ( 1.4%) 0.0100 ( 6.5%) 0.0433 ( 1.7%) 0.0323 ( 1.3%) Combine redundant instructions
0.0200 ( 0.8%) 0.0000 ( 0.0%) 0.0200 ( 0.8%) 0.0318 ( 1.3%) Combine redundant instructions
0.0267 ( 1.1%) 0.0000 ( 0.0%) 0.0267 ( 1.1%) 0.0292 ( 1.1%) Greedy Register Allocator
0.0367 ( 1.5%) 0.0000 ( 0.0%) 0.0367 ( 1.5%) 0.0291 ( 1.1%) Jump Threading
0.0200 ( 0.8%) 0.0000 ( 0.0%) 0.0200 ( 0.8%) 0.0264 ( 1.0%) Loop Strength Reduction
0.0200 ( 0.8%) 0.0067 ( 4.3%) 0.0267 ( 1.1%) 0.0259 ( 1.0%) Machine Instruction Scheduler
0.0233 ( 1.0%) 0.0000 ( 0.0%) 0.0233 ( 0.9%) 0.0247 ( 1.0%) Loop Invariant Code Motion
0.0167 ( 0.7%) 0.0000 ( 0.0%) 0.0167 ( 0.7%) 0.0240 ( 0.9%) Early CSE
0.0167 ( 0.7%) 0.0033 ( 2.2%) 0.0200 ( 0.8%) 0.0200 ( 0.8%) Reassociate expressions
0.0300 ( 1.3%) 0.0000 ( 0.0%) 0.0300 ( 1.2%) 0.0162 ( 0.6%) Simplify the CFG
0.0100 ( 0.4%) 0.0033 ( 2.2%) 0.0133 ( 0.5%) 0.0146 ( 0.6%) Promote 'by reference' arguments to scalars
0.0100 ( 0.4%) 0.0033 ( 2.2%) 0.0133 ( 0.5%) 0.0141 ( 0.6%) Deduce function attributes
0.0067 ( 0.3%) 0.0033 ( 2.2%) 0.0100 ( 0.4%) 0.0132 ( 0.5%) SLP Vectorizer
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0130 ( 0.5%) Simplify the CFG
0.0067 ( 0.3%) 0.0067 ( 4.3%) 0.0133 ( 0.5%) 0.0128 ( 0.5%) Simplify the CFG
0.0100 ( 0.4%) 0.0000 ( 0.0%) 0.0100 ( 0.4%) 0.0125 ( 0.5%) Loop Invariant Code Motion
0.0067 ( 0.3%) 0.0000 ( 0.0%) 0.0067 ( 0.3%) 0.0125 ( 0.5%) Simplify the CFG
0.0167 ( 0.7%) 0.0000 ( 0.0%) 0.0167 ( 0.7%) 0.0119 ( 0.5%) Live Variable Analysis
0.0133 ( 0.6%) 0.0000 ( 0.0%) 0.0133 ( 0.5%) 0.0118 ( 0.5%) Globals Alias Analysis
0.0067 ( 0.3%) 0.0000 ( 0.0%) 0.0067 ( 0.3%) 0.0116 ( 0.5%) CodeGen Prepare
0.0133 ( 0.6%) 0.0000 ( 0.0%) 0.0133 ( 0.5%) 0.0110 ( 0.4%) Simplify the CFG
0.0200 ( 0.8%) 0.0000 ( 0.0%) 0.0200 ( 0.8%) 0.0109 ( 0.4%) Simplify the CFG
0.0100 ( 0.4%) 0.0000 ( 0.0%) 0.0100 ( 0.4%) 0.0108 ( 0.4%) Module Verifier
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0097 ( 0.4%) Natural Loop Information
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0096 ( 0.4%) Sparse Conditional Constant Propagation
0.0067 ( 0.3%) 0.0000 ( 0.0%) 0.0067 ( 0.3%) 0.0096 ( 0.4%) Interprocedural Sparse Conditional Constant Propagation
0.0067 ( 0.3%) 0.0000 ( 0.0%) 0.0067 ( 0.3%) 0.0083 ( 0.3%) Bit-Tracking Dead Code Elimination
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0083 ( 0.3%) Scoped NoAlias Alias Analysis
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0069 ( 0.3%) Natural Loop Information
0.0100 ( 0.4%) 0.0000 ( 0.0%) 0.0100 ( 0.4%) 0.0069 ( 0.3%) Tail Call Elimination
0.0200 ( 0.8%) 0.0000 ( 0.0%) 0.0200 ( 0.8%) 0.0068 ( 0.3%) Dominator Tree Construction
0.0067 ( 0.3%) 0.0000 ( 0.0%) 0.0067 ( 0.3%) 0.0068 ( 0.3%) Unroll loops
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0066 ( 0.3%) Dominator Tree Construction
0.0133 ( 0.6%) 0.0000 ( 0.0%) 0.0133 ( 0.5%) 0.0066 ( 0.3%) Loop Invariant Code Motion
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0064 ( 0.3%) Dominator Tree Construction
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0063 ( 0.2%) Unroll loops
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0062 ( 0.2%) Machine Common Subexpression Elimination
0.0067 ( 0.3%) 0.0000 ( 0.0%) 0.0067 ( 0.3%) 0.0061 ( 0.2%) Dominator Tree Construction
0.0100 ( 0.4%) 0.0000 ( 0.0%) 0.0100 ( 0.4%) 0.0060 ( 0.2%) Unswitch loops
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0058 ( 0.2%) Remove unused exception handling info
0.0067 ( 0.3%) 0.0000 ( 0.0%) 0.0067 ( 0.3%) 0.0057 ( 0.2%) Remove redundant instructions
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0053 ( 0.2%) Dominator Tree Construction
0.0067 ( 0.3%) 0.0000 ( 0.0%) 0.0067 ( 0.3%) 0.0051 ( 0.2%) Dominator Tree Construction
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0051 ( 0.2%) Loop-Closed SSA Form Pass
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0050 ( 0.2%) Natural Loop Information
0.0100 ( 0.4%) 0.0000 ( 0.0%) 0.0100 ( 0.4%) 0.0050 ( 0.2%) Prologue/Epilogue Insertion & Frame Finalization
0.0067 ( 0.3%) 0.0033 ( 2.2%) 0.0100 ( 0.4%) 0.0049 ( 0.2%) Aggressive Dead Code Elimination
0.0033 ( 0.1%) 0.0033 ( 2.2%) 0.0067 ( 0.3%) 0.0049 ( 0.2%) Machine Function Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0048 ( 0.2%) Control Flow Optimizer
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0046 ( 0.2%) Insert stack protectors
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0046 ( 0.2%) Dead Argument Elimination
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0044 ( 0.2%) Dominator Tree Construction
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0043 ( 0.2%) Loop-Closed SSA Form Pass
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0042 ( 0.2%) Scalar Evolution Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0041 ( 0.2%) Dominator Tree Construction
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0040 ( 0.2%) Simplify the CFG
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0039 ( 0.2%) Scalar Evolution Analysis
0.0067 ( 0.3%) 0.0000 ( 0.0%) 0.0067 ( 0.3%) 0.0039 ( 0.2%) Loop Vectorization
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0039 ( 0.2%) Rotate Loops
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0038 ( 0.2%) Demanded bits analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0038 ( 0.1%) Scalar Evolution Analysis
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0037 ( 0.1%) Loop-Closed SSA Form Pass
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0034 ( 0.1%) Canonicalize natural loops
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0034 ( 0.1%) Scalar Evolution Analysis
0.0000 ( 0.0%) 0.0033 ( 2.2%) 0.0033 ( 0.1%) 0.0034 ( 0.1%) Basic Alias Analysis (stateless AA impl)
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0032 ( 0.1%) Canonicalize natural loops
0.0067 ( 0.3%) 0.0000 ( 0.0%) 0.0067 ( 0.3%) 0.0032 ( 0.1%) X86 Byte/Word Instruction Fixup
0.0067 ( 0.3%) 0.0000 ( 0.0%) 0.0067 ( 0.3%) 0.0031 ( 0.1%) Function Alias Analysis Results
0.0067 ( 0.3%) 0.0000 ( 0.0%) 0.0067 ( 0.3%) 0.0031 ( 0.1%) Demanded bits analysis
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0030 ( 0.1%) Lazy Value Information Analysis
0.0100 ( 0.4%) 0.0000 ( 0.0%) 0.0100 ( 0.4%) 0.0030 ( 0.1%) Two-Address instruction pass
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0030 ( 0.1%) Dominator Tree Construction
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0028 ( 0.1%) Canonicalize natural loops
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0027 ( 0.1%) Dominator Tree Construction
0.0067 ( 0.3%) 0.0033 ( 2.2%) 0.0100 ( 0.4%) 0.0027 ( 0.1%) Lazy Value Information Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0026 ( 0.1%) Dominator Tree Construction
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0025 ( 0.1%) Function Alias Analysis Results
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0025 ( 0.1%) Basic Alias Analysis (stateless AA impl)
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0025 ( 0.1%) Function Alias Analysis Results
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0024 ( 0.1%) Natural Loop Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0024 ( 0.1%) Virtual Register Rewriter
0.0000 ( 0.0%) 0.0033 ( 2.2%) 0.0033 ( 0.1%) 0.0024 ( 0.1%) Natural Loop Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0024 ( 0.1%) Function Alias Analysis Results
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0023 ( 0.1%) Recognize loop idioms
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0023 ( 0.1%) Function Alias Analysis Results
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0022 ( 0.1%) Machine Loop Invariant Code Motion
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0022 ( 0.1%) Function Alias Analysis Results
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0021 ( 0.1%) Function Alias Analysis Results
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0021 ( 0.1%) Function Alias Analysis Results
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0021 ( 0.1%) MergedLoadStoreMotion
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0020 ( 0.1%) PGOIndirectCallPromotion
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0020 ( 0.1%) Function Alias Analysis Results
0.0000 ( 0.0%) 0.0067 ( 4.3%) 0.0067 ( 0.3%) 0.0020 ( 0.1%) Function Alias Analysis Results
0.0067 ( 0.3%) 0.0000 ( 0.0%) 0.0067 ( 0.3%) 0.0019 ( 0.1%) Loop Load Elimination
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0019 ( 0.1%) Basic Alias Analysis (stateless AA impl)
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0018 ( 0.1%) Basic Alias Analysis (stateless AA impl)
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0018 ( 0.1%) MachineDominator Tree Construction
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0018 ( 0.1%) Execution dependency fix
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0017 ( 0.1%) Basic Alias Analysis (stateless AA impl)
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0017 ( 0.1%) Speculatively execute instructions if target has divergent branches
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0017 ( 0.1%) Global Variable Optimizer
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0017 ( 0.1%) CallGraph Construction
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0017 ( 0.1%) Basic Alias Analysis (stateless AA impl)
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0016 ( 0.1%) Branch Probability Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0016 ( 0.1%) Remove dead machine instructions
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0015 ( 0.1%) Loop-Closed SSA Form Pass
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0015 ( 0.1%) Loop-Closed SSA Form Pass
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0015 ( 0.1%) Machine Block Frequency Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0015 ( 0.1%) Machine InstCombiner
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0015 ( 0.1%) Basic Alias Analysis (stateless AA impl)
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0015 ( 0.1%) Demanded bits analysis
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0015 ( 0.1%) Basic Alias Analysis (stateless AA impl)
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0014 ( 0.1%) Loop-Closed SSA Form Pass
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0014 ( 0.1%) Alignment from assumptions
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0014 ( 0.1%) Block Frequency Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0014 ( 0.1%) Dead Global Elimination
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0013 ( 0.1%) Eliminate PHI nodes for register allocation
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0013 ( 0.1%) Machine Block Frequency Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0013 ( 0.1%) Slot index numbering
0.0067 ( 0.3%) 0.0000 ( 0.0%) 0.0067 ( 0.3%) 0.0013 ( 0.1%) MachineDominator Tree Construction
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0013 ( 0.1%) Memory Dependence Analysis
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0013 ( 0.1%) Promote Memory to Register
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0013 ( 0.1%) CallGraph Construction
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0012 ( 0.0%) Memory Dependence Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0012 ( 0.0%) Memory Dependence Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0012 ( 0.0%) MachineDominator Tree Construction
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0012 ( 0.0%) X86 LEA Optimize
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0012 ( 0.0%) Dominator Tree Construction
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0012 ( 0.0%) Machine Block Frequency Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0012 ( 0.0%) MachinePostDominator Tree Construction
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0012 ( 0.0%) Branch Probability Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0012 ( 0.0%) MachinePostDominator Tree Construction
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0011 ( 0.0%) Constant Hoisting
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0011 ( 0.0%) Target Library Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0011 ( 0.0%) Slot index numbering
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0011 ( 0.0%) MachineDominator Tree Construction
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0011 ( 0.0%) Memory Dependence Analysis
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0011 ( 0.0%) Delete dead loops
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0011 ( 0.0%) Machine Natural Loop Construction
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0011 ( 0.0%) Lower 'expect' Intrinsics
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0011 ( 0.0%) Machine Block Frequency Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0010 ( 0.0%) Loop Access Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0010 ( 0.0%) Float to int
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0009 ( 0.0%) MachineDominator Tree Construction
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0009 ( 0.0%) Expand Atomic instructions
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0008 ( 0.0%) Function Alias Analysis Results
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0008 ( 0.0%) Loop-Closed SSA Form Pass
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0008 ( 0.0%) Machine Natural Loop Construction
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0007 ( 0.0%) Post-RA pseudo instruction expansion pass
0.0067 ( 0.3%) 0.0000 ( 0.0%) 0.0067 ( 0.3%) 0.0007 ( 0.0%) Profile summary info
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0007 ( 0.0%) Canonicalize natural loops
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0007 ( 0.0%) Canonicalize natural loops
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0006 ( 0.0%) Machine Loop Invariant Code Motion
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0006 ( 0.0%) Partially inline calls to library functions
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0006 ( 0.0%) Stack Slot Coloring
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0006 ( 0.0%) Machine Natural Loop Construction
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0006 ( 0.0%) Scalar Evolution Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0006 ( 0.0%) Tail Duplication
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0005 ( 0.0%) Assumption Cache Tracker
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0005 ( 0.0%) Canonicalize natural loops
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0005 ( 0.0%) Scalar Evolution Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0005 ( 0.0%) Remove unreachable machine basic blocks
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0005 ( 0.0%) X86 Optimize Call Frame
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0004 ( 0.0%) Canonicalize natural loops
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0004 ( 0.0%) Scalar Evolution Analysis
0.0000 ( 0.0%) 0.0033 ( 2.2%) 0.0033 ( 0.1%) 0.0004 ( 0.0%) Scalar Evolution Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0004 ( 0.0%) Scalar Evolution Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0004 ( 0.0%) Canonicalize natural loops
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0004 ( 0.0%) Tail Duplication
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0004 ( 0.0%) Rotate Loops
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0004 ( 0.0%) X86 pseudo instruction expansion pass
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0004 ( 0.0%) Scalar Evolution Analysis
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0004 ( 0.0%) Debug Variable Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) Remove unreachable blocks from the CFG
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) Function Alias Analysis Results
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) Function Alias Analysis Results
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) Canonicalize natural loops
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) Machine Trace Metrics
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) Live Register Matrix
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) Spill Code Placement Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) Function Alias Analysis Results
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) Shrink Wrapping analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) Function Alias Analysis Results
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) Basic Alias Analysis (stateless AA impl)
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) Bundle Machine CFG Edges
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) Expand ISel Pseudo-instructions
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) Globals Alias Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) X86 LEA Fixup
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) Live Stack Slot Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) X86 Fixup SetCC
0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0002 ( 0.0%) Post RA top-down list latency scheduler
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) Basic Alias Analysis (stateless AA impl)
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) Loop Access Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) Loop Distribition
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) Basic Alias Analysis (stateless AA impl)
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) Basic Alias Analysis (stateless AA impl)
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) Virtual Register Map
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) Optimize machine instruction PHIs
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) Basic Alias Analysis (stateless AA impl)
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Insert XRay ops
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Local Dynamic TLS Access Clean-up
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Early If-Conversion
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Live DEBUG_VALUE analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) StackMap Liveness Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Loop Access Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Optimization Remark Emitter
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Rename Disconnected Subregister Components
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Contiguously Lay Out Funclets
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) X86 Atom pad short functions
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Lazy Block Frequency Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Analyze Machine Code For Garbage Collection
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Implement the 'patchable-function' attribute
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Lower Garbage Collection Instructions
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Local Stack Slot Allocation
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) X86 PIC Global Base Reg Initialization
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) X86 WinAlloca Expander
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) X86 FP Stackifier
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) X86 vzeroupper inserter
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Shadow Stack GC Lowering
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Deduce function attributes in RPO
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Infer set function attributes
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Create Garbage Collector Module Metadata
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Transform Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Merge Duplicate Global Constants
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Strip Unused Function Prototypes
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Assumption Cache Tracker
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Pre-ISel Intrinsic Lowering
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Eliminate Available Externally Globals
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Force set function attributes
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Rewrite Symbols
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Library Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Scoped NoAlias Alias Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Type-Based Alias Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Type-Based Alias Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Type-Based Alias Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Transform Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Transform Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Pass Configuration
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Pass Configuration
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Library Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Scoped NoAlias Alias Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Pre-ISel Intrinsic Lowering
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Module Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Module Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Branch Probability Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Branch Probability Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Create Garbage Collector Module Metadata
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Assumption Cache Tracker
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) A No-Op Barrier Pass
2.3700 (100.0%) 0.1533 (100.0%) 2.5233 (100.0%) 2.5454 (100.0%) Total

time: 2.897; rss: 143MB LLVM passes
time: 0.000; rss: 143MB serialize work products
time: 0.011; rss: 134MB linking
Finished release [optimized] target(s) in 4.64 secs
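A small parser makes it easier to rank the passes in a -Z time-passes dump like this one. The sketch below is hypothetical (it is not part of rustc or ring). It assumes the line format seen above, `time: <secs>; rss: <n>MB` followed by whitespace and the pass name, with two spaces of indentation per nesting level. Rows of LLVM's tabular reports don't match that format and are skipped.

```rust
/// One timing entry from rustc's `-Z time-passes` output.
#[derive(Debug, Clone, PartialEq)]
pub struct PassTime {
    /// Nesting level: 0 for top-level passes, 1 for indented sub-passes.
    pub depth: usize,
    pub secs: f64,
    pub rss_mb: u64,
    pub name: String,
}

/// Parses a line such as "time: 0.464; rss: 123MB\ttype collecting".
/// Returns `None` for anything else, e.g. the rows of LLVM's timing tables.
pub fn parse_line(line: &str) -> Option<PassTime> {
    let trimmed = line.trim_start();
    // Assumes two spaces of indentation per nesting level, as in the dump.
    let depth = (line.len() - trimmed.len()) / 2;
    let rest = trimmed.strip_prefix("time:")?;
    let (secs, rest) = rest.split_once(';')?;
    let rest = rest.trim_start().strip_prefix("rss:")?;
    // The pass name follows the RSS value after a tab or a space.
    let (rss, name) = rest.trim_start().split_once(char::is_whitespace)?;
    Some(PassTime {
        depth,
        secs: secs.trim().parse().ok()?,
        rss_mb: rss.strip_suffix("MB")?.parse().ok()?,
        name: name.trim().to_string(),
    })
}

/// Returns the `n` slowest top-level passes, slowest first.
pub fn slowest_top_level(log: &str, n: usize) -> Vec<PassTime> {
    let mut passes: Vec<PassTime> = log
        .lines()
        .filter_map(parse_line)
        .filter(|p| p.depth == 0)
        .collect();
    // total_cmp gives a total order even if a time parsed as NaN.
    passes.sort_by(|a, b| b.secs.total_cmp(&a.secs));
    passes.truncate(n);
    passes
}
```

The ranking keeps only top-level entries because, in this dump, each indented group is printed just before its parent and adds up to roughly the parent's time. For example, the four `llvm ... [0]` lines sum to 2.882 s of the 2.897 s `LLVM passes` entry, so counting both would double-count.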

@emberian

emberian commented Apr 6, 2017

Note that moving out #[cfg(test)] stuff will have approximately 0 impact on build times. It would make expansion and parsing slightly faster but that's already less than 130ms of the build.

@briansmith
Owner Author

> Note that moving out #[cfg(test)] stuff will have approximately 0 impact on build times. It would make expansion and parsing slightly faster but that's already less than 130ms of the build.

I believe it will make builds faster when editing the tests, because the lib won't be recompiled at all. To me this is a major benefit, since "cargo test" is the primary way I build ring.

I've done a bunch of work to remove the need for C++ entirely and to greatly reduce the amount of C code present. Unfortunately, it's kind of a big project that's 75% done, hard to commit incrementally, and kind of stalled right now. But if the C/C++ code is what's making things slow, then this will help a lot once I get time to finish it.

@emberian

> I believe it will make builds faster when editing the tests, because the lib won't be recompiled at all. To me this is a major benefit, since "cargo test" is the primary way I build ring.

That's a good benefit that I didn't think of.

> But if the C/C++ stuff is what's making things slow, then this will help a lot when I can get time to finish it.

I instrumented the build script to print how much time executing subcommands takes (not accounting for any parallelism or for any work the build script does itself). In debug mode, it takes 4.462682 seconds overall: the Perl steps take 1.891556 seconds, assembling takes 0.3488179 seconds, and compiling the C code itself takes 2.168519 seconds.

In release mode, assembling takes 0.3802813 seconds and compiling the C code takes 4.583867 seconds.
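Per-subcommand numbers like these can be collected with a timing wrapper in build.rs. The following is a minimal sketch with hypothetical helper names (`timed`, `report`), not the instrumentation actually used for the measurements above. Like that measurement, it sums wall-clock time per category and so ignores parallelism. It reports through `cargo:warning=` lines because Cargo otherwise captures a build script's stdout instead of showing it.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Runs `f` (for example, a closure that spawns the C compiler), adds its
/// wall-clock time to `totals[category]`, and returns whatever `f` returns.
/// Summing per-call times ignores parallelism, so when jobs overlap the sum
/// can exceed the real elapsed time of the build.
pub fn timed<T>(
    totals: &mut HashMap<&'static str, Duration>,
    category: &'static str,
    f: impl FnOnce() -> T,
) -> T {
    let start = Instant::now();
    let result = f();
    *totals.entry(category).or_insert(Duration::ZERO) += start.elapsed();
    result
}

/// Prints one line per category (sorted by name) plus the overall sum as
/// `cargo:warning=` lines, and returns the overall sum.
pub fn report(totals: &HashMap<&'static str, Duration>) -> Duration {
    let mut names: Vec<&&'static str> = totals.keys().collect();
    names.sort();
    for name in names {
        println!("cargo:warning={}: {:.6} s", name, totals[name].as_secs_f64());
    }
    let overall: Duration = totals.values().sum();
    println!("cargo:warning=overall: {:.6} s", overall.as_secs_f64());
    overall
}
```

A call site might then look like `timed(&mut totals, "c", || Command::new(&cc).args(&args).status())`, where `cc` and `args` stand in for whatever the build script already passes to the compiler. Then `report(&totals)` would run at the end of `main`.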

@briansmith
Owner Author

briansmith commented Apr 16, 2017 via email

@eddyb

eddyb commented Apr 23, 2017

cc @arielb1

@briansmith
Owner Author

briansmith commented Apr 25, 2017

In efdffc9 and 0aea3d2 we removed 10 C/C++ source files, including all the C++ source files. It would be good to see whether this made any significant difference in build time on the systems you're measuring.

@eddyb

eddyb commented Apr 25, 2017

Now that rust-lang/rust#41469 is merged, you should also look at time-passes.

@briansmith briansmith changed the title Speed up the build Speed up the build (i.e. Build faster) Apr 25, 2017
@briansmith
Owner Author

BTW, if you don't need RSA, then in ring 0.8.0 you'll be able to build with --no-default-features (or --no-default-features --features=dev_urandom_fallback) to avoid building some Rust code. build.rs could also be changed to skip building crypto/bn/* when the use_heap default feature isn't enabled, which would make that even faster.
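When Cargo runs a build script, it sets a `CARGO_FEATURE_<NAME>` environment variable for each enabled feature. That means the build.rs change suggested above could be sketched roughly as follows. The helper names and the example source paths are hypothetical placeholders, not ring's actual file lists.

```rust
use std::env;

/// Returns true if Cargo enabled `feature` for the package being built.
/// Cargo sets CARGO_FEATURE_<NAME> for each enabled feature when running a
/// build script, with the name uppercased and `-` replaced by `_`.
pub fn feature_enabled(feature: &str) -> bool {
    let var = format!("CARGO_FEATURE_{}", feature.to_uppercase().replace('-', "_"));
    env::var_os(var).is_some()
}

/// Filters the C sources to compile, skipping the bignum code under
/// crypto/bn/ when the `use_heap` feature is off.
pub fn select_sources<'a>(all: &[&'a str], use_heap: bool) -> Vec<&'a str> {
    all.iter()
        .copied()
        .filter(|path| use_heap || !path.starts_with("crypto/bn/"))
        .collect()
}
```

In `main`, the build script would call something like `select_sources(&ALL_C_FILES, feature_enabled("use_heap"))` before invoking the compiler, where `ALL_C_FILES` is a hypothetical list. This only works if every piece of Rust code that calls into crypto/bn/ is gated on the same feature; otherwise the final link would fail with missing symbols.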

@briansmith
Owner Author

Is there anybody unhappy with the build time now? We've made several changes that should be improvements, though we haven't attempted to measure everything again. Without new measurements this is unactionable.

@briansmith
Owner Author

OK, I'm going to close this now, on the assumption everything is A-OK.
