Rewrite the pattern matching code in trans. #27050

eefriedman · 2015-07-15T05:21:25Z

The old code was not well structured, difficult to understand,
and buggy.

The new implementation is completely naive, so there may be a slight
runtime performance loss. That said, adding optimizations on top of
a clear and correct implementation seems easier than trying to
fix the old mess.

Fixes issue #19064.
Fixes issue #26989.
Fixes issue #26251.
Fixes issue #18060.
Fixes issue #24875.
Fixes issue #23311.
Fixes issue #20046.

The old code was not well structured, difficult to understand, and buggy. The new implementation is completely naive, so there may be a slight runtime performance loss. That said, adding optimizations on top of a clear and correct implementation seems easier than trying to fix the old mess. Fixes issue rust-lang#19064. Fixes issue rust-lang#26989. Fixes issue rust-lang#26251. Fixes issue rust-lang#18060. Fixes issue rust-lang#24875. Fixes issue rust-lang#23311. Fixes issue rust-lang#20046.

rust-highfive · 2015-07-15T05:21:38Z

r? @brson

(rust_highfive has picked a reviewer for you, use r? to override)

tamird · 2015-07-15T06:52:36Z

src/librustc_trans/trans/_match.rs

+                        Store(bcx, info, expr::get_len(bcx, scratch.val));
+                        field_datum = Datum::new(scratch.val, scratch.ty, Lvalue);
+                    }
+                    bcx = compile_pattern(bcx, field_datum, &field_pat.node.pat, failure_dest, pat_bindings);


this line is too long and failed travis

arielb1 · 2015-07-15T10:07:53Z

are there any performance implications?

petrochenkov · 2015-07-15T10:52:30Z

@eefriedman
While you are here could you look at #23121?
It was mostly implemented here https://github.com/JakubAdamBukaj/rust/tree/array-pattern-changes, but had some ICEs originating exactly in trans code you rewrote.

eefriedman · 2015-07-15T18:12:10Z

@petrochenkov I called one of those issues out with a FIXME: https://github.com/eefriedman/rust/blob/match-cleanup/src/librustc_trans/trans/_match.rs#L494 . The adjustments should be easy enough to handle.

@arielb1 I have a terrible setup for benchmarking... but at a high level the performance implication is that this version will duplicate checks in certain cases where the old code would not.

brson · 2015-07-16T17:32:29Z

Wow, epic patch!

brson · 2015-07-16T17:32:53Z

r? @pnkfelix

tamird · 2015-07-16T17:34:38Z

src/librustc_trans/trans/_match.rs

 }

-/// Helper for converting from the ValueRef that we pass around in the match code, which is always
-/// an lvalue, into a Datum. Eventually we should just pass around a Datum and be done with it.
+


this extra newline is failing tidy

brson · 2015-07-16T17:36:06Z

I wonder if leaning on llvm to optimize branches into switch statements offers enough performance opportunities for us. It seems very likely to make compilation slower, and istm there's a chance this could turn into a dead end if it turns out that writing the switch IR directly produces better code more often.

EDIT: I asked @sunfish about this strategy and he seemed enthusiastic. We should definitely do a bunch of measurement for regressions in both runtime and compile time before merging.

eefriedman · 2015-07-16T18:26:01Z

Whether we generate switch instructions or branch instructions in the IR is not at all relevant; LLVM's SimplifyCFG pass runs very early in the optimization process.

The potential for redundant code is relevant... but I think optimizations along those lines can be written incrementally. This patch doesn't introduce anything along those lines because I didn't want to complicate the patch without any evidence that it's actually useful.

eddyb · 2015-07-16T18:47:33Z

Given the upcoming MIR (rust-lang/rfcs#1211), I believe it's better to have a simpler and saner implementation which can be then lifted up to the HIR->MIR lowering pass and expanded upon.
Doing anything too clever during translation to LLVM is only asking for trouble, honestly.

arielb1 · 2015-07-16T19:01:24Z

MIR will basically require _match to be reimplemented anyway (and @nikomatsakis already has such reimplementation) as part of the lowering step.

pnkfelix · 2015-07-16T21:07:05Z

@eddyb I cannot tell if you are arguing that we should attempt to land this PR (on the basis that it will simplify future efforts to then move to a MIR based implementation), or if you (like @arielb1, I think?) are arguing that we need not invest effort trying to rewrite the existing trans/_match code because we will be better off just doing a more direct rewrite straight to the MIR based implementation?

eddyb · 2015-07-17T02:49:52Z

@pnkfelix I am for this PR, even if only the tests remain from it in the long run, that alone will ensure the MIR does not regress back to the old broken state.

Aatch · 2015-07-17T06:07:54Z

Hmm, I definitely want to see some performance stats for this. The old code may have been difficult to understand, but it used a fairly advanced method for compiling the match. I don't want to merge what could be a major performance regression just because the code is easier to read.

dotdash · 2015-07-17T15:10:54Z

I'd also like to see some numbers/codegen results, but I'd not be surprised if in some cases, e.g. matches on tuples of enums like in PartialEq, we might see better results as LLVM had some difficulties optimizing the switch there.

nikomatsakis · 2015-07-17T22:20:34Z

I don't think that a possible transition to MIR affects whether or not we should land this patch. There is going to be a fair amount of effort needed to port trans to use MIR, which will take some time, and having patterns work more correctly in the meantime is great! I agree with @Aatch that some performance data would be nice.

eefriedman · 2015-07-21T07:56:43Z

I tried some benchmarking, but my numbers are too unreliable to actually be usable. Maybe someone else can give it a shot?

bors · 2015-07-22T16:48:34Z

☔ The latest upstream changes (presumably #26683) made this pull request unmergeable. Please resolve the merge conflicts.

pnkfelix · 2015-07-28T23:03:42Z

@eefriedman I will look into benchmarking a little; sorry for the radio silence, I have been pretty swamped lately.

Manishearth · 2015-08-10T17:53:45Z

Updates on this?

pnkfelix · 2015-08-24T16:39:19Z

So, I have done some preliminary profiling.

Executive summary: This new match implementation runs faster, but regresses quality of generated code. I don't think we can land it as is without addressing the last set of micro-benchmark results presented in the shell transcript at the bottom of this comment.

Longer story:

rustc compiling libsyntax data

On some "normal code" operating on a "normal input" (in this case, rustc compiling an expanded libsyntax.rs), the differences appear to be well within the noise.

reference data (google sheet), see sheet 1
- a caveat: this set of data was gathered with some other load (namely a web browser editing above spreadsheet) on the machine in question. I discovered this had significant impact while working on the second data set below, and so I removed the extra load, but I have not gathered new numbers for the libsyntax.rs compiles yet.

rustc compiling many_small_arms data

On rustc operating on arguably abnormal input, namely a test case with a 5,000 tiny match arms, the new match code within trans is quite a bit faster than the old match code (within trans), perhaps a 4/3 speedup. (Don't get excited, there's huge drawback, discussed below.)

Note that I'm talking about codegen time here, measured by the difference between compiling to the end versus with -Z no-trans.
And this is codegen for an arguably ridiculous test case.
reference data (google sheet), see sheet 2
All of this data was gathered on an unloaded machine.

In both the above cases, I'm measuring codegen time, without passing -O to the compiler (because I was not interested in how long LLVM was taking to optimize the code it was given, though perhaps that would indeed be something of independent significance).

many_small_arms execution time data

BUT, this comes at a troubling cost.

I mentioned that test input with 5,000 tiny match arms above. It was designed to measure how fast the new match code was taking within trans.

But I then realized that I probably could (and should) be measuring the quality of the code being generated by the match in the test.

The test case above was a little bit too trivial for something like LLVM to optimize, and it also had a println in the code I would want to be an inner loop, so I modified it slightly (essentially to take and return data, and to stop printing the match result; its probably still quite easy for LLVM to optimize as written).

The new test case is here: 5,000 arm matches called 100,000 times with varying inputs.

After doing this, I discovered an unfortunate drawback to the new code in this PR. Compiling with -O has almost no impact on the execution time, but if you leave off -O (as many people do during development), the runtime for this test case goes from 0.27 seconds to 9.32 seconds, a more than 30x slowdown.

% cd ../../rust-baseline/objdir-opt/
% git log -1 --oneline
6800538 Auto merge of #26984 - nham:errorck-ignore-long-diag, r=brson
% ./x86_64-apple-darwin/stage2/bin/rustc --version
rustc 1.3.0-dev (680053848 2015-07-13)
% time ./x86_64-apple-darwin/stage2/bin/rustc /tmp/driver_run.rs -o driver_run 

real    0m14.264s
user    0m14.073s
sys 0m0.187s
% time ./driver_run 

real    0m0.279s
user    0m0.262s
sys 0m0.003s
% cd ../../rust-rw-match/objdir-opt/
% git log -1 --oneline
54b4a0f Fix lint.
% ./x86_64-apple-darwin/stage2/bin/rustc --version
rustc 1.3.0-dev (54b4a0f5b 2015-07-16)
% time ./x86_64-apple-darwin/stage2/bin/rustc /tmp/driver_run.rs -o driver_run 

real    0m13.745s
user    0m13.523s
sys 0m0.219s
% time ./driver_run 

real    0m9.235s
user    0m9.219s
sys 0m0.006s
%

(Above shell transcript was gathered on an unloaded machine.)

Now, I know this is a totally artificial micro-benchmark, but

I think the benchmark remains somewhat demonstrative of real-world dispatch code, and
this is one of those cases that motivates using a micro-benchmark -- to try to tease out performance cliffs like the one shown here.

eefriedman · 2015-08-27T22:04:49Z

Hmm... addressing that particular micro-benchmark requires generating a binary search tree or jump table (probably using an LLVM switch instruction). Conceptually, it's pretty simple: basically, it's just a matter of splitting the match statement based on integer fields. In practice, there are a lot of details... especially if we want a decent heuristic to decide which field to split on first. Not sure it's worthwhile, given that this code is going to get killed by MIR.

ranma42 · 2015-09-02T17:28:16Z

cc @ranma42

aturon · 2015-09-04T16:04:50Z

ping @rust-lang/compiler, would be great to give @eefriedman a definitive answer here.

nikomatsakis · 2015-09-04T16:31:33Z

@aturon yes. @eefriedman and I were discussing over e-mail and I think we reached the conclusion it made more sense to wait for the MIR to land.

nikomatsakis · 2015-10-01T20:26:55Z

Closing in favor of MIR transition, as discussed.

rust-highfive assigned brson Jul 15, 2015

tamird reviewed Jul 15, 2015
View reviewed changes

Fix review comments for rewrite of pattern matching in trans.

ea1fa9a

rust-highfive assigned pnkfelix and unassigned brson Jul 16, 2015

tamird reviewed Jul 16, 2015
View reviewed changes

Fix lint.

54b4a0f

paulrouget mentioned this pull request Aug 10, 2015

Fix OpenGL profile selection rust-windowing/glutin#567

Merged

nikomatsakis closed this Oct 1, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rewrite the pattern matching code in trans. #27050

Rewrite the pattern matching code in trans. #27050

eefriedman commented Jul 15, 2015

rust-highfive commented Jul 15, 2015

tamird Jul 15, 2015

arielb1 commented Jul 15, 2015

petrochenkov commented Jul 15, 2015

eefriedman commented Jul 15, 2015

brson commented Jul 16, 2015

brson commented Jul 16, 2015

tamird Jul 16, 2015

brson commented Jul 16, 2015

eefriedman commented Jul 16, 2015

eddyb commented Jul 16, 2015

arielb1 commented Jul 16, 2015

pnkfelix commented Jul 16, 2015

eddyb commented Jul 17, 2015

Aatch commented Jul 17, 2015

dotdash commented Jul 17, 2015

nikomatsakis commented Jul 17, 2015

eefriedman commented Jul 21, 2015

bors commented Jul 22, 2015

pnkfelix commented Jul 28, 2015

Manishearth commented Aug 10, 2015

pnkfelix commented Aug 24, 2015

eefriedman commented Aug 27, 2015

ranma42 commented Sep 2, 2015

aturon commented Sep 4, 2015

nikomatsakis commented Sep 4, 2015

nikomatsakis commented Oct 1, 2015

Rewrite the pattern matching code in trans. #27050

Rewrite the pattern matching code in trans. #27050

Conversation

eefriedman commented Jul 15, 2015

rust-highfive commented Jul 15, 2015

tamird Jul 15, 2015

Choose a reason for hiding this comment

arielb1 commented Jul 15, 2015

petrochenkov commented Jul 15, 2015

eefriedman commented Jul 15, 2015

brson commented Jul 16, 2015

brson commented Jul 16, 2015

tamird Jul 16, 2015

Choose a reason for hiding this comment

brson commented Jul 16, 2015

eefriedman commented Jul 16, 2015

eddyb commented Jul 16, 2015

arielb1 commented Jul 16, 2015

pnkfelix commented Jul 16, 2015

eddyb commented Jul 17, 2015

Aatch commented Jul 17, 2015

dotdash commented Jul 17, 2015

nikomatsakis commented Jul 17, 2015

eefriedman commented Jul 21, 2015

bors commented Jul 22, 2015

pnkfelix commented Jul 28, 2015

Manishearth commented Aug 10, 2015

pnkfelix commented Aug 24, 2015

rustc compiling libsyntax data

rustc compiling many_small_arms data

many_small_arms execution time data

eefriedman commented Aug 27, 2015

ranma42 commented Sep 2, 2015

aturon commented Sep 4, 2015

nikomatsakis commented Sep 4, 2015

nikomatsakis commented Oct 1, 2015