Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regression in deterministic codegen from 1.32 to 1.33 when using --remap-path-prefix to compensate for change in source dir #59542

Closed
jhfrontz opened this issue Mar 30, 2019 · 55 comments
Assignees
Labels
A-codegen Area: Code generation A-reproducibility Area: Reproducible / Deterministic builds T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@jhfrontz
Copy link

jhfrontz commented Mar 30, 2019

I use a gitian build environment to compile rust source in a well-defined/stabile environment. When using rust 1.32 (rust-1.32.0-1.el7.x86_64.rpm on Centos 7), I can deterministically build object/binary (with the help of RUSTFLAGS="--remap-path-prefix=%{_builddir}=BUILDDIR -C link-arg=-Wl,--build-id=0x%{githash},-S" while running inside an rpmbuild) -- like, the build process when using 1.32 is so deterministic between runs that I can use diff (or cmp or sha256sum) to verify that two products/executables produced on different runs are identical.

However, as of 1.33 (rust-1.33.0-2.el7.x86_64.rpm on Centos 7), I get significant variation from one run to another:

$ size *float-[12]/build/usr/bin/program
   text	   data	    bss	    dec	    hex	filename
6154818	 157808	    688	6313314	 605562	20190329-float-1/build/usr/bin/program
6148249	 157328	    688	6306265	 6039d9	20190329-float-2/build/usr/bin/program

Among other things, the layout of the address space seems to vary (sample):

$ sdiff <(objdump -d *float-1/build/usr/bin/program) <(objdump -d *float-2/build/usr/bin/program) | head -20

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx904e-20190329-float-1/bui |	xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx904e-20190329-float-2/bui


Disassembly of section .init:					Disassembly of section .init:

00000000001438b0 <_init>:				      |	00000000001430b0 <_init>:
  1438b0:	48 83 ec 08          	sub    $0x8,%rsp      |	  1430b0:	48 83 ec 08          	sub    $0x8,%rsp
  1438b4:	48 8b 05 c5 19 6c 00 	mov    0x6c19c5(%rip) |	  1430b4:	48 8b 05 ed 00 6c 00 	mov    0x6c00ed(%rip)
  1438bb:	48 85 c0             	test   %rax,%rax      |	  1430bb:	48 85 c0             	test   %rax,%rax
  1438be:	74 05                	je     1438c5 <_init+ |	  1430be:	74 05                	je     1430c5 <_init+
  1438c0:	e8 53 01 00 00       	callq  143a18 <.plt.g |	  1430c0:	e8 53 01 00 00       	callq  143218 <.plt.g
  1438c5:	48 83 c4 08          	add    $0x8,%rsp      |	  1430c5:	48 83 c4 08          	add    $0x8,%rsp
  1438c9:	c3                   	retq   		      |	  1430c9:	c3                   	retq   

Disassembly of section .plt:					Disassembly of section .plt:

00000000001438d0 <.plt>:				      |	00000000001430d0 <.plt>:
  1438d0:	ff 35 ba 16 6b 00    	pushq  0x6b16ba(%rip) |	  1430d0:	ff 35 42 ff 6a 00    	pushq  0x6aff42(%rip)
  1438d6:	ff 25 bc 16 6b 00    	jmpq   *0x6b16bc(%rip |	  1430d6:	ff 25 44 ff 6a 00    	jmpq   *0x6aff44(%rip

Was something intentionally changed in 1.33 that might cause this behavior?

EDIT incorporating subsequent info:

I found a smallish open-source project that demonstrates the issue. Run
build.sh.txt -- under 1.32, I get all good; under 1.33, I get bad stuff.

@Centril Centril added A-codegen Area: Code generation T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 30, 2019
@jonas-schievink
Copy link
Contributor

AFAIK we have very few tests for reproducible builds, so this probably slipped in by accident.

cc @rust-lang/compiler - do we care enough about reproducible builds to consider this a stable-to-stable regression at this point? I know that quite a lot of people want reproducibility in their builds.

@eddyb
Copy link
Member

eddyb commented Mar 30, 2019

This looks like a linker thing? Did we do anything wrt using lld or something like that?
cc @alexcrichton

@eddyb
Copy link
Member

eddyb commented Mar 30, 2019

@jhfrontz Can you post a diff of symbol names and sizes? The assembly will ofc differ a lot, but I expect everything to be the same size, just shuffled around.

@jhfrontz
Copy link
Author

@eddyb asks:

Can you post a diff of symbol names and sizes?

Are you wanting the output of nm on the two same-but-different binaries? Or maybe (since you said size) you're wanting the output of objdump -t ? But since you said shuffled around, maybe you're wanting objdump -h ?

I'll include the last (objdump -h): diff-h.txt

@jhfrontz
Copy link
Author

jhfrontz commented Apr 1, 2019

@eddyb I got approval to include the full objdump -t output:
objdump-t-1.txt.gz
objdump-t-2.txt.gz

.

@alexcrichton
Copy link
Member

This is most frequently an accidental regression in LLVM, although we've had bugs slip in with rustc as well! No major change to linker behavior in 1.32 -> 1.33.

@jhfrontz the best way to get this fixed is if a test case can be reduced to isolate the issue. Would it be possible to minimize this to extract a small piece of code which exhibits the reproducibility issue?

@jhfrontz
Copy link
Author

@alexcrichton asks:

Would it be possible to minimize this to extract a small piece of code which exhibits the reproducibility issue?

I haven't been able to -- my simple "hello, world" toy programs always result in the same binary. It's only when I build production (proprietary) code that I'm seeing this. Full disclosure: I barely know enough rust to be able to run cargo. If there is a model "major project" in rust that I could/should run through my deterministic build environment, I'd be glad to try it and report back.

@eddyb
Copy link
Member

eddyb commented Apr 13, 2019

Sorry, I didn't see the notification.
I was hoping for a diff that would be easy to spot the differences in, but I guess I'll download the two files and run kdiff3 on them, when I get to the office

@eddyb
Copy link
Member

eddyb commented Apr 13, 2019

I just did this:

gunzip ~/Downloads/objdump-t-{1,2}.txt.gz
cat ~/Downloads/objdump-t-1.txt | grep _ZN | sed -E 's/^.* +([^ ]+)$/\1/' | sort > sym1
cat ~/Downloads/objdump-t-2.txt | grep _ZN | sed -E 's/^.* +([^ ]+)$/\1/' | sort > sym2

And... the hashes don't match. These are using the same rustc binary, right?

Can you run cargo with -v and grab the -C metadata arguments? If they differ between the two runs, then your problem is not even in rustc/LLVM.
If they're the same, try getting the sorted symbol list from a small rlib.

@jhfrontz
Copy link
Author

jhfrontz commented Apr 16, 2019

I'm using the same binary (everything is the same, including the time -- via libfaketime).

---> Package rust.x86_64 0:1.33.0-2.el7 will be installed
--> Processing Dependency: rust-std-static(x86-64) = 1.33.0-2.el7 for package: rust-1.33.0-2.el7.x86_64
--> Processing Dependency: libLLVM-7.so(LLVM_7)(64bit) for package: rust-1.33.0-2.el7.x86_64
--> Processing Dependency: libLLVM-7.so()(64bit) for package: rust-1.33.0-2.el7.x86_64

I'm building with cargo --frozen so, I'm pretty sure by definition the actual contents metadata is the same (though cargo -v -C metadata yields error: Found argument '-C' which wasn't expected, or isn't valid in this context).

However, the ordering of the crates listed in the metadata from two builds is different. Also, the src_path entries are different (but I would expect for that to be irrelevant in the presence of the --remap-path-prefix mentioned above ). At least, the src_path is irrelevant when using 1.32.

Attached are the dumps from nm on two different runs' libuuid rlibs. They're both quite different.

3-libuuid-nm.txt
4-libuuid-nm.txt

Just to recall -- this still works as expected (producing deterministic binary) if I revert to rustc version 1.32. I ran it twice with 1.32 and

  • the cargo -v metadata output was similarly differently ordered between the runs (but seemingly the same contents, with the exception of src_path).
  • the libuuid rlib files were identical (filenames, contents, symbol tables).
  • the resulting binaries were identical.

@eddyb
Copy link
Member

eddyb commented Apr 17, 2019

Ah, yeah, --remap-path-prefix isn't perfect, so this could be a regression related to that.

Also, by cargo -v and -C metadata I meant running whatever cargo command you're running with -v (i.e. verbose) and extracting the -C metadata from the rustc invocations it prints to stderr.

I'm building with cargo --frozen so, I'm pretty sure by definition the actual contents metadata is the same

--frozen is for your dependencies, but if you're building in a different directory, that might still affect certain details.

However, the ordering of the crates listed in the metadata from two builds is different.

Not sure what you mean by "metadata" here (I was referring to a hash Cargo passes to each rustc invocation).

@eddyb
Copy link
Member

eddyb commented Apr 17, 2019

Nominating for discussion at the next @rust-lang/compiler meeting, as this seems more and more like it's a regression in how we handle source paths.

@michaelwoerister
Copy link
Member

The ordering of -C metadata arguments should not make a difference.

--remap-path-prefix can be used to map various paths that show up in panic messages and debuginfo to something deterministic/environment-independent. It doesn't do anything beyond that.

@eddyb
Copy link
Member

eddyb commented Apr 18, 2019

@michaelwoerister I was referring to the contents, not ordering.
As in, I was wondering whether Cargo was computing different values for each of the two builds.

But from what you're saying, --remap-path-prefix shouldn't have ever been relied on for deterministic outputs? Does that mean that between 1.32 and 1.33, we started hashing the input paths, whereas before we weren't at all?

Or did we start hashing the remap-path-prefix value itself?
What would even cause all of this to trickle down into mangled symbol hashes?

@pnkfelix pnkfelix changed the title regression in deterministic generation of binary from 1.32 to 1.33 regression in deterministic codegen from 1.32 to 1.33 when using --remap-path-prefix to compensate for change in source dir Apr 18, 2019
@michaelwoerister
Copy link
Member

I'd like to have an actual reproducer for this. Otherwise it's hard to dig any deeper.

@pnkfelix
Copy link
Member

We discussed in T-compiler team meeting today. removing nominated tag.

@pnkfelix pnkfelix added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed I-nominated labels Apr 18, 2019
@pnkfelix
Copy link
Member

(and I'm going to hijack the "S-waiting-on-author" tag to mean "waiting on more info from issue filer.")

@jhfrontz
Copy link
Author

I found a smallish open-source project that demonstrates the issue. Run
build.sh.txt -- under 1.32, I get all good; under 1.33, I get bad stuff.

@eddyb
Copy link
Member

eddyb commented Apr 19, 2019

@michaelwoerister Oh, if --remap-path-prefix is supported for this usecase, maybe we could take a look at some incremental/query log, specifically when moving (or aliasing with ln -s) the one directory to two, removing everything in target except incremental caches and re-running cargo build in two.
I suspect that the same issue affecting symbol hashes would also show up in there.

@pnkfelix pnkfelix removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Apr 19, 2019
@jhfrontz
Copy link
Author

@eddyb wrote

maybe we could take a look at some incremental/query log, specifically when moving (or aliasing with ln -s) the one directory to two, removing everything in target except incremental caches and re-running cargo build in two.

I just realized that you may have been asking me to do some further data gathering -- if so, I'm not sure I understand what you're requesting. Should I unroll the loop in build.sh.txt and have the two pass do something different?

@michaelwoerister
Copy link
Member

I don't think incremental compilation should be used at all when trying to create reproducible builds. @jhfrontz, could you try to set CARGO_INCREMENTAL=0 in your test setup?

@jhfrontz
Copy link
Author

jhfrontz commented Apr 26, 2019

@michaelwoerister suggests:

I don't think incremental compilation should be used at all when trying to create reproducible builds. @jhfrontz, could you try to set CARGO_INCREMENTAL=0 in your test setup?

No effect; problem in 1.33 persists:

$ rustc --version
rustc 1.33.0
$ CARGO_INCREMENTAL=0 ./build.sh
...
bad stuff

For comparison, 1.32:

$ rustc --version
rustc 1.32.0
$ CARGO_INCREMENTAL=0 ./build.sh
...
all good

@eddyb
Copy link
Member

eddyb commented May 1, 2019

The "magic" is that Cargo sees the same RUSTFLAGS, so it doesn't think anything is different, but then rustc normalizes the . to the right absolute directory (which you used to pass yourself).
Frankly, I'm surprised (and a bit suspicious) that it works!

Can you check in some way that no paths end up in the result, only BUILDDIR?

@apoelstra
Copy link
Contributor

Confirmed that only BUILDDIR appears in the binaries output from Jeff's script.

I'm also suspicious and surprised because I can't see where in the rustc remapping code . would get expanded.

@eddyb
Copy link
Member

eddyb commented May 1, 2019

I just checked the relevant source, and, yeah, I don't know why this would work.
@jhfrontz Does it pass if you remove --remap-path-prefix=.=BUILDDIR completely?

@apoelstra
Copy link
Contributor

It fails for me without the remapping. And checking the outputted binary it has a lot of stuff like

/home/apoelstra/Downloads/one/ff/vendor/clap/src/osstringext.rs

i.e. the full path is there, not the literal . or anything else.

@jhfrontz
Copy link
Author

jhfrontz commented May 1, 2019

Also -- --remap-path-prefix=.=BUILDDIR does not help for my production build (i.e., even with the ., I get different binary on each subsequent build).

@eddyb
Copy link
Member

eddyb commented May 1, 2019

I believe this would explain why at least some cases are handled:

// This path of this SourceFile has been modified by
// path-remapping, so we use it verbatim (and avoid
// cloning the whole map in the process).
_ if source_file.name_was_remapped => source_file.clone(),
// Otherwise expand all paths to absolute paths because
// any relative paths are potentially relative to a
// wrong directory.
FileName::Real(ref name) => {
let mut adapted = (**source_file).clone();
adapted.name = Path::new(&working_dir).join(name).into();

@jhfrontz I'm not sure I can figure that out without a binary diff between the two binaries.

I doubt .. can work, since it's never canonicalized by rustc, so it would only affect a literal .. in the input.
Maybe src instead of . might work? Or both (.=BUILDDIR and src=BUILDIR/src).

The problem is I don't know what kind of paths you have, and the first component in each path matters.
If the paths are absolute, then there's no way around having the absolute path in --remap-path-prefix.

But maybe the right approach would be rust-lang/cargo#6503 being changed to not affect -C metadata, only Cargo's caching. cc @alexcrichton @ehuss Is that even possible right now?

@alexcrichton
Copy link
Member

Unfortunately the purpose of that Cargo PR is to affect -C metadata, but we could back that PR out if it causes too many problems (or add some flag/config to Cargo to not affect metadata but still do it by default or something like that)

@apoelstra
Copy link
Contributor

It would be nice if there were a flag. I wouldn't want to back out the PR. I'm surprised we didn't see more problems before the Cargo PR; I seem to recall a lot of trouble a year or so ago with Miri caused by not enough things going into metadata and builds using stale/incompatible dependencies.

Having said that, based on @jhfrontz's original post, and our inability to use the .=BUILDDIR trick in our production code, I'm not sure that the metadata issue is the only one at play here. I believe he is working on extracting a smaller test case that we can share.

@eddyb
Copy link
Member

eddyb commented May 3, 2019

@alexcrichton Ah, I misunderstood an important aspect: all the artifacts are kept in the same directory, and only -C metadata makes rustc ignore those from incompatible builds?

@alexcrichton
Copy link
Member

@eddyb that plus it's the granularity Cargo caches at. The intention of the PR was that if you oscillate back and forth on RUSTFLAGS values you'll have incremental builds each time, whereas previously it would recompile the whole crate graph.

@jhfrontz

This comment has been minimized.

@eddyb

This comment has been minimized.

@eddyb

This comment has been minimized.

@jhfrontz

This comment has been minimized.

@jhfrontz

This comment has been minimized.

@jhfrontz

This comment has been minimized.

@eddyb

This comment has been minimized.

@apoelstra

This comment has been minimized.

@eddyb

This comment has been minimized.

@michaelwoerister
Copy link
Member

Hey, to re-iterate: There is no way to have reproducible builds via Cargo from different source directories if Cargo feeds the RUSTFLAGS value into the -Cmetadata argument to rustc. In my opinion this is a bug in Cargo. Either Cargo needs to provide a way to pass the --remap-path-prefix argument to rustc without RUSTFLAGS or the new behavior should be made optional.

@eddyb

This comment has been minimized.

@michaelwoerister

This comment has been minimized.

@jhfrontz
Copy link
Author

jhfrontz commented May 7, 2019

OK, it turns out that it is cargo that is causing the issue (@eddyb found that I wasn't actually using the 1.32 version of cargo in my earlier experiments). Now, with cargo-1.32 and rust-1.34, I get deterministic output:

$ find . -name create_test_daemon_conf | grep build/usr/bin | xargs sha256sum
b0d3d894800eb6c80286e5830a6f9c28394f9bb0ed6d1a52827bd813e8848e6a  ./artifacts-2/build/usr/bin/create_test_daemon_conf
b0d3d894800eb6c80286e5830a6f9c28394f9bb0ed6d1a52827bd813e8848e6a  ./artifacts/build/usr/bin/create_test_daemon_conf

@eddyb
Copy link
Member

eddyb commented May 7, 2019

I've just hidden a bunch of comments that were really confusing because of unrigorous debugging.

Also, as we've determined this was entirely caused by Cargo, I've opened rust-lang/cargo#6914 instead, and I'm closing this issue on the Rust side.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation A-reproducibility Area: Reproducible / Deterministic builds T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

8 participants