
add more benchmarks #447

Merged
merged 7 commits into dev from more-benchmarks on Dec 21, 2017

Conversation

lahma
Collaborator

@lahma lahma commented Dec 18, 2017

This already shows that some of the array usage problems might be performance-related; I'm going to investigate whether there are any easy wins.

BenchmarkDotNet=v0.10.11, OS=Windows 10 Redstone 3 [1709, Fall Creators Update] (10.0.16299.125)
Processor=Intel Core i7-6820HQ CPU 2.70GHz (Skylake), ProcessorCount=8
Frequency=2648437 Hz, Resolution=377.5812 ns, Timer=TSC
.NET Core SDK=2.1.2
  [Host]     : .NET Core 2.0.3 (Framework 4.6.25815.02), 64bit RyuJIT
  Job-POIOJA : .NET Core 2.0.3 (Framework 4.6.25815.02), 64bit RyuJIT

InvocationCount=4  LaunchCount=1  TargetCount=3  
UnrollFactor=4  WarmupCount=3  
| Method   | N | ReuseEngine | Mean        | Error        | StdDev      | Gen 0        | Gen 1      | Gen 2      | Allocated  |
|----------|---|-------------|-------------|--------------|-------------|--------------|------------|------------|------------|
| Jint     | 5 | False       | 11,118.2 ms | 1,832.937 ms | 103.5643 ms | 2205666.6667 | 42666.6667 | 11333.3333 | 8922.57 MB |
| Jurassic | 5 | False       | 626.2 ms    | 293.042 ms   | 16.5574 ms  | 25750.0000   | 5000.0000  | 2250.0000  | 126.63 MB  |
| NilJS    | 5 | False       | 521.8 ms    | 51.011 ms    | 2.8822 ms   | 15500.0000   | 5000.0000  | 2250.0000  | 86.15 MB   |
| Jint     | 5 | True        | 10,877.9 ms | 563.051 ms   | 31.8134 ms  | 2207083.3333 | 44833.3333 | 10500.0000 | 8921.86 MB |
| Jurassic | 5 | True        | 572.2 ms    | 4.661 ms     | 0.2633 ms   | 25000.0000   | 5000.0000  | 1500.0000  | 123.85 MB  |
| NilJS    | 5 | True        | 504.1 ms    | 20.257 ms    | 1.1445 ms   | 15750.0000   | 5250.0000  | 2500.0000  | 86.14 MB   |

@sebastienros
Owner

There is an updated version of the article here https://rushfrisby.com/net-javascript-engine-performance-results-updated-2016/

And it looks like the author is using Jint even though it's the slowest on these scripts. That's not unexpected as these are compute-intensive, and compiled scripts will shine, so V8 will always win there. But it's a good set of benchmarks to find the big bottlenecks. Obviously Array is one of the main issues, which is why you will find a branch here where I tried to optimize things, without much success.

If you want to improve it, the main idea would be to have different implementations of the prototype methods (push, iterate, sort, ...) based on the type of array, like sparse, range, ... This is what V8 does too. The idea is that the specification says an array can be sparse and uses string indices, but that makes the algorithms and implementations slow. To optimize it we can store some flags, detect the best cases, and then not follow the exact specification but use optimized algorithms. I think V8 supports three different array types.
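
To make the idea concrete, here is a minimal sketch (not Jint's actual implementation) of an array that keeps a flag for the dense, hole-free best case and only falls back to spec-style string-keyed storage when that assumption breaks; all names are illustrative:

```csharp
using System.Collections.Generic;

// Illustrative only: a dense fast path with a sparse, string-keyed fallback,
// roughly the kind of specialization described above.
public sealed class ArraySketch
{
    private readonly List<object> _dense = new List<object>();   // fast path: contiguous 0..n-1
    private Dictionary<string, object> _sparse;                   // fallback: spec-like string keys
    private bool _isDense = true;

    public void Set(uint index, object value)
    {
        if (_isDense && index == (uint)_dense.Count)
        {
            _dense.Add(value);               // contiguous append, no hole/descriptor handling
            return;
        }

        if (_isDense)
        {
            // Writing past the end creates a hole: demote to sparse storage.
            _sparse = new Dictionary<string, object>();
            for (var i = 0; i < _dense.Count; i++)
            {
                _sparse[i.ToString()] = _dense[i];
            }
            _isDense = false;
        }

        _sparse[index.ToString()] = value;   // spec semantics: indices behave like string property keys
    }

    public object Get(uint index)
    {
        if (_isDense)
        {
            return index < (uint)_dense.Count ? _dense[(int)index] : null;
        }

        return _sparse.TryGetValue(index.ToString(), out var value) ? value : null;
    }
}
```

Prototype methods like push or sort could then check the same flag and pick either the fast dense loop or the spec-exact algorithm.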

@sebastienros
Owner

A few comments:

  • Seems weird that the allocated memory is the same whether the engine is reused or not.
  • I understand it's more work, but I think it would be much more valuable to see more granular modifications to understand how each change impacts the perf. There are some changes I'd prefer not to make if they don't really provide much value. Or just to understand what has the most impact, so we can reproduce it in other parts of the code.
  • Last but not least, thanks a lot!

@lahma
Collaborator Author

lahma commented Dec 18, 2017

Thank you for reviewing. In this pull request I'm just trying to put some baseline numbers in place. I did see the updated benchmark which brings in the SunSpider etc. tests, but thought this would be easier as a first step.

As the SunSpider tests already seem to be part of the test suite, they should not be hard to reference.

As I understood it, I should split the performance pull requests (Esprima, Jint array performance) into smaller parts, which I fully understand, but is this PR good as-is or should I split it into smaller parts too?

I probably need to check the reused-engine case. It might not have an effect since the loop is now N=1, and the engine should probably be a static field in this case.

@lahma
Collaborator Author

lahma commented Dec 18, 2017

I've updated the benchmark and the top comment with run information from 5 repetitions per new engine instance / shared engine. The array handling is a bit slow at the moment and it therefore takes some time to get results.

With a friendlier benchmark (one not targeting array handling) we could use a larger N and see a bigger difference from reusing the same instance.

@ayende
Contributor

ayende commented Dec 18, 2017

Just some words about these kinds of benchmarks. We did a whole bunch of work to see what it would look like with Jurassic on our end. The kind of benchmark you are seeing here is misleading, because you are doing a LOT of work inside the JS engine. If you are mostly using it to do things outside, such as directing the operation of other code, then the cost of going in and out of the engine can kill your perf.

See: https://ayende.com/blog/179553/with-performance-test-benchmark-and-be-ready-to-back-out
And: https://ayende.com/blog/179617/js-execution-performance-and-a-whole-lot-of-effort
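
As a rough illustration of that boundary cost (class and function names here are made up, not RavenDB's code), a script that mostly directs work done on the .NET side crosses the engine boundary on every call:

```csharp
using System;
using Jint;

// Illustrative sketch: the script does very little itself and instead calls a host
// function in a loop, so each iteration pays the cost of entering and leaving the engine.
public static class BoundaryCrossingExample
{
    public static void Run()
    {
        var engine = new Engine();

        // Hypothetical host function exposed to the script.
        engine.SetValue("loadDocument", new Func<int, string>(id => "doc-" + id));

        engine.Execute(@"
            var names = [];
            for (var i = 0; i < 1000; i++) {
                names.push(loadDocument(i)); // in and out of the engine on every iteration
            }
        ");
    }
}
```

A compute-heavy benchmark never pays that per-call cost, which is why it can paint a different picture than this kind of usage.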

@sebastienros
Owner

I am OK with tracking perf comparisons with Jurassic and NilJS, at least to know the differences, as long as we can see different scripts: some SunSpider ones, but also small ones like the one which was used initially. It can also show the scenarios that @ayende is referring to. This should be one benchmark table.

Then we can have other specific benchmarks that don't include Jurassic or NilJS and would only be used to track improvements and regressions, arrays being one of them. In this case we really don't care about the reuse flag.

But before doing any change I want to hear what everyone has to say. You two have the most weight on the matter as you are directly impacted.

@lahma
Collaborator Author

lahma commented Dec 18, 2017

I had the different engines there because the original benchmark project had the Jurassic one, and the perf comparison seemed like a way to see "how far are we". I do understand the fundamental differences between the engines.

For me this is more of a fun optimization exercise that hopefully could benefit both Jint and RavenDB. My journey began with some JSON projection trouble that seemed to use a lot of memory. I'm open to dropping the comparison between engines or changing to method-level benchmarks. I just need some torture baseline like the array case, which is a worst-case scenario but nicely showed an area to improve on and gives a number for a PR to be measured against.

@lahma
Collaborator Author

lahma commented Dec 20, 2017

I've now put the other engine benchmarks behind conditional compilation and they won't be present by default. I've also added the SunSpider benchmarks as a separate benchmark that gives results for each file; see the results below. Is there anything more to do/change before this can be merged?

| Method | FileName                  | Mean       | Error        | StdDev      | Gen 0        | Gen 1        | Gen 2        | Allocated  |
|--------|---------------------------|------------|--------------|-------------|--------------|--------------|--------------|------------|
| Run    | 3d-cube                   | 1,270.4 ms | 693.672 ms   | 39.1937 ms  | 210000.0000  | 250.0000     | -            | 842.35 MB  |
| Run    | 3d-morph                  | 1,331.6 ms | 118.247 ms   | 6.6812 ms   | 191333.3333  | 51833.3333   | 6750.0000    | 758.14 MB  |
| Run    | 3d-raytrace               | 1,054.4 ms | 58.666 ms    | 3.3148 ms   | 203500.0000  | 2250.0000    | 750.0000     | 818.78 MB  |
| Run    | access-binary-trees       | 469.8 ms   | 123.501 ms   | 6.9780 ms   | 92750.0000   | 1000.0000    | -            | 373.94 MB  |
| Run    | access-fannkuch           | 3,690.9 ms | 797.337 ms   | 45.0510 ms  | 794583.3333  | 250.0000     | -            | 3178.96 MB |
| Run    | access-nbody              | 1,203.7 ms | 387.911 ms   | 21.9177 ms  | 179000.0000  | -            | -            | 716.74 MB  |
| Run    | access-nsieve             | 1,932.4 ms | 1,011.797 ms | 57.1684 ms  | 256583.3333  | 74583.3333   | 9666.6667    | 1058.6 MB  |
| Run    | bitops-3bit-bits-in-byte  | 928.6 ms   | 421.755 ms   | 23.8300 ms  | 167500.0000  | -            | -            | 670.44 MB  |
| Run    | bitops-bits-in-byte       | 1,393.3 ms | 292.475 ms   | 16.5254 ms  | 240500.0000  | -            | -            | 962.87 MB  |
| Run    | bitops-bitwise-and        | 883.0 ms   | 363.535 ms   | 20.5404 ms  | 101750.0000  | -            | -            | 407.43 MB  |
| Run    | bitops-nsieve-bits        | 1,717.5 ms | 1,442.951 ms | 81.5294 ms  | 296000.0000  | 31750.0000   | 1750.0000    | 1188 MB    |
| Run    | controlflow-recursive     | 790.6 ms   | 39.963 ms    | 2.2580 ms   | 147750.0000  | 2750.0000    | -            | 599.25 MB  |
| Run    | crypto-aes                | 1,324.2 ms | 3,698.428 ms | 208.9680 ms | 259750.0000  | 250.0000     | -            | 1042.38 MB |
| Run    | crypto-md5                | 674.4 ms   | 311.339 ms   | 17.5912 ms  | 123000.0000  | 2500.0000    | 500.0000     | 494.57 MB  |
| Run    | crypto-sha1               | 694.7 ms   | 25.975 ms    | 1.4676 ms   | 126250.0000  | 1250.0000    | 250.0000     | 506.9 MB   |
| Run    | date-format-tofte         | 867.6 ms   | 1,393.672 ms | 78.7450 ms  | 153000.0000  | 250.0000     | -            | 614.79 MB  |
| Run    | date-format-xparb         | 575.9 ms   | 364.477 ms   | 20.5936 ms  | 51000.0000   | 250.0000     | -            | 205.97 MB  |
| Run    | math-cordic               | 1,840.5 ms | 957.835 ms   | 54.1195 ms  | 297750.0000  | -            | -            | 1191.14 MB |
| Run    | math-partial-sums         | 581.2 ms   | 103.820 ms   | 5.8660 ms   | 73000.0000   | -            | -            | 292.71 MB  |
| Run    | math-spectral-norm        | 810.1 ms   | 77.500 ms    | 4.3789 ms   | 144000.0000  | -            | -            | 576.42 MB  |
| Run    | regexp-dna                | 339.1 ms   | 4.092 ms     | 0.2312 ms   | 2500.0000    | 2000.0000    | 1500.0000    | 21.48 MB   |
| Run    | string-base64             | 822.8 ms   | 150.389 ms   | 8.4972 ms   | 350000.0000  | 250.0000     | -            | 1403.56 MB |
| Run    | string-fasta              | 1,071.1 ms | 41.007 ms    | 2.3169 ms   | 204750.0000  | -            | -            | 819.47 MB  |
| Run    | string-tagcloud           | 875.0 ms   | 210.431 ms   | 11.8898 ms  | 199666.6667  | 123500.0000  | 116583.3333  | 1080.07 MB |
| Run    | string-unpack-code        | 333.0 ms   | 3.888 ms     | 0.2197 ms   | 61500.0000   | 4000.0000    | 1250.0000    | 261.57 MB  |
| Run    | string-validate-input     | 2,893.0 ms | 839.161 ms   | 47.4142 ms  | 1645583.3333 | 1562333.3333 | 1561083.3333 | 6460.62 MB |

@lahma
Collaborator Author

lahma commented Dec 20, 2017

  • I've removed the permutation where the engine is not reused; the same engine is now always used, to reflect sane real-world usage
  • I've added a benchmark for the projection case that led me to this endeavour

@sebastienros
Owner

Currently reviewing it. I might refactor it to my taste if you don't mind; I don't want to ask you to make more changes and get you frustrated, you have already done a lot ;) Only details, don't worry.

}

[Params(500)]
public int N { get; set; }
Owner

I don't understand this. Why is it necessary when BenchmarkDotNet can already do that by itself?

Collaborator Author

Using Params adds it to the benchmark report and clarifies the benchmark case's scenario. BenchmarkDotNet runs the target method however many times it finds best to get good confidence intervals etc. So here I want the report to state "the target Jint function was called 500 times inside the test case". We could also add more values to show how it behaves depending on iteration count (for example if it were costly to call once due to caches but really fast after that).
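
For reference, a minimal sketch of the pattern (the class name and script are illustrative, not the exact benchmark in this PR): the [Params] value appears as a column in the BenchmarkDotNet report, and the benchmark body loops N times so the report documents how many engine calls each measurement covers.

```csharp
using BenchmarkDotNet.Attributes;
using Jint;

[MemoryDiagnoser]
public class EngineCallBenchmark
{
    private readonly Engine _engine = new Engine();

    // Reported as a column; add more values to see how cost scales with iteration count.
    [Params(500)]
    public int N { get; set; }

    [Benchmark]
    public void CallFunction()
    {
        for (var i = 0; i < N; i++)
        {
            _engine.Execute("var x = [1, 2, 3].map(function (v) { return v * v; });");
        }
    }
}
```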

@lahma
Collaborator Author

lahma commented Dec 20, 2017

@sebastienros please make any necessary changes, I'm more than OK with that, thanks.


private static void InitializeEngine(Options options)
{
options
Owner

@sebastienros sebastienros Dec 20, 2017

Why these options?

Owner

Ignore, didn't see it was specific to this benchmark

Collaborator Author

Yes, and ones that RavenDB uses. MaxStatements especially trips an error here with the smaller default that RavenDB uses; something like 50 in there is far from enough.
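
A sketch of the kind of option setup being discussed, assuming Jint's fluent Options API; the concrete values and the class name are illustrative, not the PR's actual configuration:

```csharp
using Jint;

public static class EngineSetupSketch
{
    private static void InitializeEngine(Options options)
    {
        options
            // A small limit like RavenDB's default (~50 statements) would trip an error
            // in this benchmark, so the limit is raised far above it.
            .MaxStatements(int.MaxValue)
            .LimitRecursion(64);
    }
}
```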

/// Test case for situation where object is projected via filter and map, Jint deems code as uncacheable.
/// </summary>
[Config(typeof(Config))]
public class UncacheableExpressionsBenchmark
Owner

Looks like a special-case benchmark. If there is a bug (like something should be cached and is not) we should have a unit test. If this is just to make a specific case fast enough, then it shouldn't be in the repo, or it should at least be on a custom branch where the improvement work will happen, and be removed once it's done.

Collaborator Author

In a way it's a special case, but it should reflect function invocation with arrays in a fairly generic way. Arrays struggle with these kinds of data, where index access/traversal goes via a dictionary instead of pure array indexing and produces more arrays.

There isn't a bug per se, but there is a performance problem when you try to filter/project large nested arrays. But feel free to remove it if you want.
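
For illustration (this is not the PR's UncacheableExpressionsBenchmark, just a made-up example of the shape of script being discussed), a filter/map projection over a large nested structure keeps the engine's array layer busy and allocates new arrays on every pass:

```csharp
using Jint;

// Illustrative sketch of a filter/map projection over nested arrays.
public static class ProjectionExample
{
    public static void Run()
    {
        var engine = new Engine();
        engine.Execute(@"
            var docs = [];
            for (var i = 0; i < 10000; i++) {
                docs.push({ id: i, tags: ['a', 'b', 'c'], values: [i, i * 2, i * 3] });
            }
            // filter and map each produce new arrays; index access goes through the
            // engine's array implementation for every element
            var projected = docs
                .filter(function (d) { return d.id % 2 === 0; })
                .map(function (d) { return { id: d.id, total: d.values.length + d.tags.length }; });
        ");
    }
}
```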

Contributor

map and filter are very common things to do on arrays.
And they can usually be lifted quite easily; that is not just a generic scenario, it is something that I think should be optimized specifically.

Owner

Then these benchmarks should be in the same table as the SunSpider ones, unless they are already included there, or within one dedicated to Array operations.

Owner

In an Array benchmark class that would look like this: http://jsben.ch/k3QoV. It doesn't have to be in this PR. We can do it later or as part of a perf PR that focuses on arrays.

@lahma
Collaborator Author

lahma commented Dec 21, 2017

I've renamed the old ArrayBenchmark to ArrayStressBenchmark and created a new ArrayBenchmark for the latest link that you gave. I included the operations that are supported.

There might be problems combining benchmark suites, and I personally don't follow the same mindset that you would with unit/integration tests. Ideally a benchmark case would have two methods, old algorithm and new algorithm, which would produce the comparison tables nicely. This, however, is not easy to achieve when you are making larger changes/refactorings; then I usually run the suite on master (or a base branch containing only the new benchmark against master), copy the results directory, check out my new branch, run the same suite again, and create the comparison by showing the before and after reports.

It's also problematic when you have too many tests in a benchmark: it becomes slow to run when you just want to optimize some very specific thing (that maybe one test method shows). Unlike tests and the actual shipped code, benchmarks play more of a supporting role and might not be reused that much, depending on the case.

I hope we find some middle ground soon. I need the two optimizations to advance other analysis, and it's a bit painful to cherry-pick between branches while trying to mentally keep track of which problems are unfixed and which are fixed somewhere else.

@sebastienros sebastienros merged commit 4a7fed1 into sebastienros:dev Dec 21, 2017
@lahma
Collaborator Author

lahma commented Dec 21, 2017

Thank you for handling the PR quickly; my eagerness to improve may show up as impatience, but I hope I can help as best I can.

@lahma
Collaborator Author

lahma commented Dec 22, 2017

I've created a new issue #451 to track overall progress; I'll update results there between pull requests.

@lahma lahma deleted the more-benchmarks branch January 5, 2019 23:52