Skip to content

Commit

Permalink
Auto merge of #37640 - michaelwoerister:llvm-type-names, r=brson
Browse files Browse the repository at this point in the history
trans: Make type names in LLVM IR independent of crate-nums and source locations.

UPDATE:
This PR makes the type names we assign in LLVM IR independent of the type definition's location in the source code and the order in which extern crates are loaded. The new type names look like the old ones, except for closures and the `<crate-num>.` prefix being gone. Resolution of name clashes (e.g. of the same type in different crate versions) is left to LLVM (which will just append `.<counter>` to the name).

ORIGINAL TEXT:
This PR makes the type names we assign in LLVM IR independent of the type definition's location in the source code. Before, the type of closures contained the closures definition location. The new naming scheme follows the same pattern that we already use for symbol names: We have a human readable prefix followed by a hash that makes sure we don't have any collisions. Here is an example of what the new names look like:

```rust
// prog.rs - example program

mod mod1
{
    pub struct Struct<T>(pub T);
}

fn main() {
    use mod1::Struct;

    let _s = Struct(0u32);
    let _t = Struct('h');
    let _x = Struct(Struct(0i32));
}
```
Old:
```llvm
%"mod1::Struct<u32>" = type { i32 }
%"mod1::Struct<char>" = type { i32 }
%"mod1::Struct<mod1::Struct<i32>>" = type { %"mod1::Struct<i32>" }
%"mod1::Struct<i32>" = type { i32 }
```
New:
```llvm
%"prog::mod1::Struct<u32>::ejDrT" = type { i32 }
%"prog::mod1::Struct<char>::2eEAU" = type { i32 }
%"prog::mod1::Struct<prog::mod1::Struct<i32>>::ehCqR" = type { %"prog::mod1::Struct<i32>::$fAo2" }
%"prog::mod1::Struct<i32>::$fAo2" = type { i32 }
```

As you can see, the new names are slightly more verbose, but also more consistent. There is no difference now between a local type and one from another crate (before, non-local types where prefixed with `<crate-num>.` as in `2.std::mod1::Type1`).

There is a bit of design space here. For example, we could leave off the crate name for local definitions (making names shorter but less consistent):
```llvm
%"mod1::Struct<u32>::ejDrT" = type { i32 }
%"mod1::Struct<char>::2eEAU" = type { i32 }
%"mod1::Struct<mod1::Struct<i32>>::ehCqR" = type { %"mod1::Struct<i32>::$fAo2" }
%"mod1::Struct<i32>::$fAo2" = type { i32 }
```

We could also put the hash in front, which might be more readable:
```llvm
%"ejDrT.mod1::Struct<u32>" = type { i32 }
%"2eEAU.mod1::Struct<char>" = type { i32 }
%"ehCqR.mod1::Struct<mod1::Struct<i32>>" = type { %"$fAo2.mod1::Struct<i32>" }
%"$fAo2.mod1::Struct<i32>" = type { i32 }
```

We could probably also get rid of the hash if we used full DefPaths and crate-nums (though I'm not yet a 100% sure if crate-nums could mess with incremental compilation).

```llvm
%"mod1::Struct<u32>" = type { i32 }
%"mod1::Struct<char>" = type { i32 }
%"mod1::Struct<mod1::Struct<i32>>" = type { %"mod1::Struct<i32>" }
%"mod1::Struct<i32>" = type { i32 }
%"2.std::mod1::Type1" = type { ... }
```
I would prefer the solution with the hashes because it is nice and consistent conceptually, but visually it's admittedly a bit uglier. Maybe @rust-lang/compiler would like to bikeshed a little about this.

On a related note: Has anyone ever tried if the LTO-linker will merge equal types with different names?
(^ @brson, @alexcrichton ^)
If not, that would be a reason to make type names more consistent.
  • Loading branch information
bors authored Nov 14, 2016
2 parents 87b76a5 + 276f052 commit 435246b
Show file tree
Hide file tree
Showing 10 changed files with 323 additions and 272 deletions.
64 changes: 64 additions & 0 deletions src/librustc_data_structures/base_n.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,64 @@
// Copyright 2016 The Rust Project Developers. See the COPYRIGHT
// file at the top-level directory of this distribution and at
// http://rust-lang.org/COPYRIGHT.
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.

/// Convert unsigned integers into a string representation with some base.
/// Bases up to and including 36 can be used for case-insensitive things.

use std::str;

pub const MAX_BASE: u64 = 64;
const BASE_64: &'static [u8; MAX_BASE as usize] =
b"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@$";

#[inline]
pub fn push_str(mut n: u64, base: u64, output: &mut String) {
debug_assert!(base >= 2 && base <= MAX_BASE);
let mut s = [0u8; 64];
let mut index = 0;

loop {
s[index] = BASE_64[(n % base) as usize];
index += 1;
n /= base;

if n == 0 {
break;
}
}
&mut s[0..index].reverse();
output.push_str(str::from_utf8(&s[0..index]).unwrap());
}

#[inline]
pub fn encode(n: u64, base: u64) -> String {
let mut s = String::with_capacity(13);
push_str(n, base, &mut s);
s
}

#[test]
fn test_encode() {
fn test(n: u64, base: u64) {
assert_eq!(Ok(n), u64::from_str_radix(&encode(n, base)[..], base as u32));
}

for base in 2..37 {
test(0, base);
test(1, base);
test(35, base);
test(36, base);
test(37, base);
test(u64::max_value(), base);

for i in 0 .. 1_000 {
test(i * 983, base);
}
}
}
1 change: 1 addition & 0 deletions src/librustc_data_structures/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,7 @@ extern crate libc;
pub mod array_vec;
pub mod accumulate_vec;
pub mod small_vec;
pub mod base_n;
pub mod bitslice;
pub mod blake2b;
pub mod bitvec;
Expand Down
55 changes: 15 additions & 40 deletions src/librustc_incremental/persist/fs.rs
Original file line number Diff line number Diff line change
Expand Up @@ -119,7 +119,7 @@ use rustc::hir::svh::Svh;
use rustc::session::Session;
use rustc::ty::TyCtxt;
use rustc::util::fs as fs_util;
use rustc_data_structures::flock;
use rustc_data_structures::{flock, base_n};
use rustc_data_structures::fx::{FxHashSet, FxHashMap};

use std::ffi::OsString;
Expand All @@ -135,6 +135,12 @@ const DEP_GRAPH_FILENAME: &'static str = "dep-graph.bin";
const WORK_PRODUCTS_FILENAME: &'static str = "work-products.bin";
const METADATA_HASHES_FILENAME: &'static str = "metadata.bin";

// We encode integers using the following base, so they are shorter than decimal
// or hexadecimal numbers (we want short file and directory names). Since these
// numbers will be used in file names, we choose an encoding that is not
// case-sensitive (as opposed to base64, for example).
const INT_ENCODE_BASE: u64 = 36;

pub fn dep_graph_path(sess: &Session) -> PathBuf {
in_incr_comp_dir_sess(sess, DEP_GRAPH_FILENAME)
}
Expand Down Expand Up @@ -327,7 +333,7 @@ pub fn finalize_session_directory(sess: &Session, svh: Svh) {
let mut new_sub_dir_name = String::from(&old_sub_dir_name[.. dash_indices[2] + 1]);

// Append the svh
new_sub_dir_name.push_str(&encode_base_36(svh.as_u64()));
base_n::push_str(svh.as_u64(), INT_ENCODE_BASE, &mut new_sub_dir_name);

// Create the full path
let new_path = incr_comp_session_dir.parent().unwrap().join(new_sub_dir_name);
Expand Down Expand Up @@ -433,7 +439,8 @@ fn generate_session_dir_path(crate_dir: &Path) -> PathBuf {

let directory_name = format!("s-{}-{}-working",
timestamp,
encode_base_36(random_number as u64));
base_n::encode(random_number as u64,
INT_ENCODE_BASE));
debug!("generate_session_dir_path: directory_name = {}", directory_name);
let directory_path = crate_dir.join(directory_name);
debug!("generate_session_dir_path: directory_path = {}", directory_path.display());
Expand Down Expand Up @@ -562,27 +569,11 @@ fn extract_timestamp_from_session_dir(directory_name: &str)
string_to_timestamp(&directory_name[dash_indices[0]+1 .. dash_indices[1]])
}

const BASE_36: &'static [u8] = b"0123456789abcdefghijklmnopqrstuvwxyz";

fn encode_base_36(mut n: u64) -> String {
let mut s = Vec::with_capacity(13);
loop {
s.push(BASE_36[(n % 36) as usize]);
n /= 36;

if n == 0 {
break;
}
}
s.reverse();
String::from_utf8(s).unwrap()
}

fn timestamp_to_string(timestamp: SystemTime) -> String {
let duration = timestamp.duration_since(UNIX_EPOCH).unwrap();
let micros = duration.as_secs() * 1_000_000 +
(duration.subsec_nanos() as u64) / 1000;
encode_base_36(micros)
base_n::encode(micros, INT_ENCODE_BASE)
}

fn string_to_timestamp(s: &str) -> Result<SystemTime, ()> {
Expand Down Expand Up @@ -629,7 +620,7 @@ pub fn find_metadata_hashes_for(tcx: TyCtxt, cnum: CrateNum) -> Option<PathBuf>
};

let target_svh = tcx.sess.cstore.crate_hash(cnum);
let target_svh = encode_base_36(target_svh.as_u64());
let target_svh = base_n::encode(target_svh.as_u64(), INT_ENCODE_BASE);

let sub_dir = find_metadata_hashes_iter(&target_svh, dir_entries.filter_map(|e| {
e.ok().map(|e| e.file_name().to_string_lossy().into_owned())
Expand Down Expand Up @@ -677,7 +668,9 @@ fn crate_path(sess: &Session,
let mut hasher = DefaultHasher::new();
crate_disambiguator.hash(&mut hasher);

let crate_name = format!("{}-{}", crate_name, encode_base_36(hasher.finish()));
let crate_name = format!("{}-{}",
crate_name,
base_n::encode(hasher.finish(), INT_ENCODE_BASE));
incr_dir.join(crate_name)
}

Expand Down Expand Up @@ -1049,21 +1042,3 @@ fn test_find_metadata_hashes_iter()
None
);
}

#[test]
fn test_encode_base_36() {
fn test(n: u64) {
assert_eq!(Ok(n), u64::from_str_radix(&encode_base_36(n)[..], 36));
}

test(0);
test(1);
test(35);
test(36);
test(37);
test(u64::max_value());

for i in 0 .. 1_000 {
test(i * 983);
}
}
12 changes: 10 additions & 2 deletions src/librustc_trans/base.rs
Original file line number Diff line number Diff line change
Expand Up @@ -74,7 +74,7 @@ use monomorphize::{self, Instance};
use partitioning::{self, PartitioningStrategy, CodegenUnit};
use symbol_map::SymbolMap;
use symbol_names_test;
use trans_item::TransItem;
use trans_item::{TransItem, DefPathBasedNames};
use type_::Type;
use type_of;
use value::Value;
Expand Down Expand Up @@ -1004,7 +1004,15 @@ impl<'blk, 'tcx> FunctionContext<'blk, 'tcx> {
}

pub fn trans_instance<'a, 'tcx>(ccx: &CrateContext<'a, 'tcx>, instance: Instance<'tcx>) {
let _s = StatRecorder::new(ccx, ccx.tcx().item_path_str(instance.def));
let _s = if ccx.sess().trans_stats() {
let mut instance_name = String::new();
DefPathBasedNames::new(ccx.tcx(), true, true)
.push_def_path(instance.def, &mut instance_name);
Some(StatRecorder::new(ccx, instance_name))
} else {
None
};

// this is an info! to allow collecting monomorphization statistics
// and to allow finding the last function before LLVM aborts from
// release builds.
Expand Down
20 changes: 19 additions & 1 deletion src/librustc_trans/collector.rs
Original file line number Diff line number Diff line change
Expand Up @@ -213,7 +213,7 @@ use glue::{self, DropGlueKind};
use monomorphize::{self, Instance};
use util::nodemap::{FxHashSet, FxHashMap, DefIdMap};

use trans_item::{TransItem, type_to_string, def_id_to_string};
use trans_item::{TransItem, DefPathBasedNames};

#[derive(PartialEq, Eq, Hash, Clone, Copy, Debug)]
pub enum TransItemCollectionMode {
Expand Down Expand Up @@ -1234,3 +1234,21 @@ fn visit_mir_and_promoted<'tcx, V: MirVisitor<'tcx>>(mut visitor: V, mir: &mir::
visitor.visit_mir(promoted);
}
}

fn def_id_to_string<'a, 'tcx>(tcx: TyCtxt<'a, 'tcx, 'tcx>,
def_id: DefId)
-> String {
let mut output = String::new();
let printer = DefPathBasedNames::new(tcx, false, false);
printer.push_def_path(def_id, &mut output);
output
}

fn type_to_string<'a, 'tcx>(tcx: TyCtxt<'a, 'tcx, 'tcx>,
ty: ty::Ty<'tcx>)
-> String {
let mut output = String::new();
let printer = DefPathBasedNames::new(tcx, false, false);
printer.push_type_name(ty, &mut output);
output
}
23 changes: 6 additions & 17 deletions src/librustc_trans/context.rs
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@ use monomorphize::Instance;
use partitioning::CodegenUnit;
use trans_item::TransItem;
use type_::Type;
use rustc_data_structures::base_n;
use rustc::ty::subst::Substs;
use rustc::ty::{self, Ty, TyCtxt};
use session::config::NoDebugInfo;
Expand Down Expand Up @@ -700,22 +701,6 @@ impl<'b, 'tcx> CrateContext<'b, 'tcx> {
&self.local_ccxs[self.index]
}

/// Get a (possibly) different `CrateContext` from the same
/// `SharedCrateContext`.
pub fn rotate(&'b self) -> CrateContext<'b, 'tcx> {
let (_, index) =
self.local_ccxs
.iter()
.zip(0..self.local_ccxs.len())
.min_by_key(|&(local_ccx, _idx)| local_ccx.n_llvm_insns.get())
.unwrap();
CrateContext {
shared: self.shared,
index: index,
local_ccxs: &self.local_ccxs[..],
}
}

/// Either iterate over only `self`, or iterate over all `CrateContext`s in
/// the `SharedCrateContext`. The iterator produces `(ccx, is_origin)`
/// pairs, where `is_origin` is `true` if `ccx` is `self` and `false`
Expand Down Expand Up @@ -975,7 +960,11 @@ impl<'b, 'tcx> CrateContext<'b, 'tcx> {
self.local().local_gen_sym_counter.set(idx + 1);
// Include a '.' character, so there can be no accidental conflicts with
// user defined names
format!("{}.{}", prefix, idx)
let mut name = String::with_capacity(prefix.len() + 6);
name.push_str(prefix);
name.push_str(".");
base_n::push_str(idx as u64, base_n::MAX_BASE, &mut name);
name
}
}

Expand Down
Loading

0 comments on commit 435246b

Please sign in to comment.