
benchmark: http/simple is unreliable #8139

Status: Closed
Labels: benchmark (Issues and PRs related to the benchmark subsystem), http (Issues or PRs related to the http subsystem)


@bnoordhuis
Member

I was working on some HTTP parser improvements and I noticed the numbers from the http/simple benchmark fluctuate wildly, even when the system is otherwise quiescent. This is with master, without my changes applied.

Results from 10 runs. Columns are mean(x) median(x) std(x) std(x)/median(x)*100.

http/simple.js c=50 chunks=0 length=4 type="bytes":         6041.3  6201.2  625.4  10.1%
http/simple.js c=500 chunks=0 length=4 type="bytes":        4706.7  4731.9  264.0   5.6%
http/simple.js c=50 chunks=1 length=4 type="bytes":         5683.0  5600.1  289.0   5.2%
http/simple.js c=500 chunks=1 length=4 type="bytes":        4435.2  4444.8  164.2   3.7%
http/simple.js c=50 chunks=4 length=4 type="bytes":         1157.7  1161.3   13.1   1.1%
http/simple.js c=500 chunks=4 length=4 type="bytes":        2797.2  2835.9  145.1   5.1%
http/simple.js c=50 chunks=0 length=1024 type="bytes":      5950.9  6013.6  318.8   5.3%
http/simple.js c=500 chunks=0 length=1024 type="bytes":     4648.4  4601.7  346.3   7.5%
http/simple.js c=50 chunks=1 length=1024 type="bytes":      5494.6  5365.3  297.9   5.6%
http/simple.js c=500 chunks=1 length=1024 type="bytes":     4367.5  4403.7  375.8   8.5%
http/simple.js c=50 chunks=4 length=1024 type="bytes":      1159.9  1161.8    6.8   0.6%
http/simple.js c=500 chunks=4 length=1024 type="bytes":     2772.3  2744.5   90.9   3.3%
http/simple.js c=50 chunks=0 length=102400 type="bytes":    1587.4  1582.8   28.9   1.8%
http/simple.js c=500 chunks=0 length=102400 type="bytes":   1458.2  1450.0   36.9   2.5%
http/simple.js c=50 chunks=1 length=102400 type="bytes":     952.4   958.8   27.1   2.8%
http/simple.js c=500 chunks=1 length=102400 type="bytes":    864.7   877.1   43.3   4.9%
http/simple.js c=50 chunks=4 length=102400 type="bytes":    1560.2  1612.4  121.2   7.5%
http/simple.js c=500 chunks=4 length=102400 type="bytes":   1454.6  1444.4  102.7   7.1%
http/simple.js c=50 chunks=0 length=4 type="buffer":        5817.7  5801.9  193.1   3.3%
http/simple.js c=500 chunks=0 length=4 type="buffer":       4655.4  4726.7  191.8   4.1%
http/simple.js c=50 chunks=1 length=4 type="buffer":        5736.3  5918.6  298.0   5.0%
http/simple.js c=500 chunks=1 length=4 type="buffer":       4478.3  4400.9  394.2   9.0%
http/simple.js c=50 chunks=4 length=4 type="buffer":        5348.3  5386.7  288.5   5.4%
http/simple.js c=500 chunks=4 length=4 type="buffer":       4215.7  4174.4  254.0   6.1%
http/simple.js c=50 chunks=0 length=1024 type="buffer":     5725.0  5748.3  373.2   6.5%
http/simple.js c=500 chunks=0 length=1024 type="buffer":    4503.7  4430.6  417.6   9.4%
http/simple.js c=50 chunks=1 length=1024 type="buffer":     5584.4  5621.3  189.1   3.4%
http/simple.js c=500 chunks=1 length=1024 type="buffer":    4221.0  4170.0  368.7   8.8%
http/simple.js c=50 chunks=4 length=1024 type="buffer":     4897.6  4860.7  397.4   8.2%
http/simple.js c=500 chunks=4 length=1024 type="buffer":    4015.6  3935.0  343.4   8.7%
http/simple.js c=50 chunks=0 length=102400 type="buffer":   4451.8  4370.2  160.1   3.7%
http/simple.js c=500 chunks=0 length=102400 type="buffer":  3323.9  3356.2  209.9   6.3%
http/simple.js c=50 chunks=1 length=102400 type="buffer":   4168.1  4168.7  256.0   6.1%
http/simple.js c=500 chunks=1 length=102400 type="buffer":  3326.4  3310.8  267.8   8.1%
http/simple.js c=50 chunks=4 length=102400 type="buffer":   3930.1  3945.4  202.3   5.1%
http/simple.js c=500 chunks=4 length=102400 type="buffer":  3128.2  3143.2  139.6   4.4%
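
A minimal sketch of how the four columns above can be computed from raw samples. It assumes std(x) is the population standard deviation (dividing by n, not n - 1). Under that assumption it reproduces the first row (6041.3, 6201.2, 625.4, 10.1%) from the raw numbers posted later in this thread. The function name `summarize` is illustrative and not part of the benchmark tooling.

```javascript
// Summary statistics in the column order used above:
// mean(x), median(x), std(x), std(x)/median(x)*100.
// Assumption: std(x) is the population standard deviation (divide by n).
function summarize(samples) {
  const n = samples.length;
  const mean = samples.reduce((sum, x) => sum + x, 0) / n;

  // Median: middle element, or average of the two middle elements.
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(n / 2);
  const median = n % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];

  const variance = samples.reduce((sum, x) => sum + (x - mean) ** 2, 0) / n;
  const std = Math.sqrt(variance);

  return { mean, median, std, cvPercent: (std / median) * 100 };
}

// Raw runs for http/simple.js c=50 chunks=0 length=4 type="bytes".
const runs = [6129.01, 6273.31, 6081.69, 6606.24, 4287.8,
              6481.24, 6126.81, 6326.17, 5757.36, 6343.68];
const s = summarize(runs);
console.log(s.mean.toFixed(1), s.median.toFixed(1), s.std.toFixed(1),
            s.cvPercent.toFixed(1) + '%');
// prints: 6041.3 6201.2 625.4 10.1%
```

Dividing by the median rather than the mean makes the last column less sensitive to a single slow run such as the 4287.8 outlier here.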

Everything with a standard deviation > 1-2% is so unreliable as to be worthless, IMO.

Can someone confirm whether they are seeing similar fluctuations?

cc @nodejs/benchmarking

@bnoordhuis bnoordhuis added http Issues or PRs related to the http subsystem. benchmark Issues and PRs related to the benchmark subsystem. labels Aug 17, 2016
@Trott
Member

Trott commented Aug 17, 2016

/cc @AndreasMadsen

@AndreasMadsen
Member

AndreasMadsen commented Aug 17, 2016

You should upload the raw (.csv) data so we can take a look at it.

I got 1.655% (1.669% unbiased) on http/simple.js c=500 chunks=0 length=1024 type="buffer". I would love to run all of them, but that takes more than 4 hours with 30 samples. Maybe I can do that tomorrow.

   c chunks length   type     mean  std.dev  cv.biased cv.unbiased conf.int
 500      0   1024 buffer 12375.81 204.8298 0.01650232  0.01663984 76.48471
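
The cv.unbiased column is consistent with the common small-sample correction cv × (1 + 1/(4n)): with n = 30, 0.01650232 × (1 + 1/120) gives 0.01663984. Whether the R script in the gist uses exactly this formula is an assumption; the sketch only shows the arithmetic.

```javascript
// Small-sample correction for the coefficient of variation:
// cvUnbiased = cv * (1 + 1 / (4n)).
// Assumption: this is the correction behind the cv.unbiased column.
function unbiasedCv(cv, n) {
  return cv * (1 + 1 / (4 * n));
}

console.log(unbiasedCv(0.01650232, 30).toFixed(8)); // prints 0.01663984
```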

Here is the raw data (collected with scatter.js) and the R script I used to analyse it: https://gist.github.com/AndreasMadsen/4e134b478edd082e929f080d7f217be2

Note that I used std(x)/mean(x), as I couldn't find any theory on std(x)/median(x), but the results are very similar (1.650% for the biased estimator when using the median).


I must admit I'm not very familiar with the coefficient of variation and had to read up on it. However, remember that the coefficient of variation does not shrink as the number of observations grows, whereas the standard deviation of the sample mean does, and that number strongly affects it.

I'm curious: do you have a source for "> 1-2% is so unreliable as to be worthless", or is it entirely your own opinion?

Personally, I have always used confidence intervals instead (though the two are not directly comparable). In this case I think the confidence interval is reasonable, though that is easier to judge once there is an actual change to compare against.
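
The conf.int column in the table above matches the half-width of a 95% Student-t confidence interval for the mean, t(0.975, n - 1) × sd / sqrt(n): with sd = 204.8298 and n = 30 this gives about 76.48. The sketch below assumes that interpretation. To stay dependency-free it approximates the t quantile with the asymptotic (Cornish-Fisher) expansion in powers of 1/df around the normal quantile 1.959964; in practice a statistics library, or R as in the gist, would supply the quantile. `tQuantile975` and `confIntHalfWidth` are illustrative names.

```javascript
// 97.5th percentile of the standard normal distribution.
const Z_975 = 1.959964;

// Approximate 97.5th percentile of Student's t with `df` degrees of freedom,
// using the expansion of t in powers of 1/df around the normal quantile.
// Accurate to about four decimal places for df of 10 or more.
function tQuantile975(df) {
  const z = Z_975;
  const g1 = (z ** 3 + z) / 4;
  const g2 = (5 * z ** 5 + 16 * z ** 3 + 3 * z) / 96;
  const g3 = (3 * z ** 7 + 19 * z ** 5 + 17 * z ** 3 - 15 * z) / 384;
  const g4 = (79 * z ** 9 + 776 * z ** 7 + 1482 * z ** 5 - 1920 * z ** 3 - 945 * z) / 92160;
  return z + g1 / df + g2 / df ** 2 + g3 / df ** 3 + g4 / df ** 4;
}

// Assumption: conf.int is the half-width of a 95% CI for the mean,
// computed from the sample standard deviation `sd` of `n` runs.
function confIntHalfWidth(sd, n) {
  return (tQuantile975(n - 1) * sd) / Math.sqrt(n);
}

console.log(confIntHalfWidth(204.8298, 30).toFixed(2)); // prints 76.48
```

Unlike the coefficient of variation, this half-width shrinks roughly with 1/sqrt(n), which is why it says more about how precisely the mean is known.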

@bnoordhuis
Member Author

You should upload the raw (.csv) data so we can take a look at it.

I don't have it, I scraped the numbers from the screen output. Here they are if you want them:

(6129.01, 6273.31, 6081.69, 6606.24, 4287.8, 6481.24, 6126.81, 6326.17, 5757.36, 6343.68) # http/simple.js c=50 chunks=0 length=4 type="bytes"
(4825.36, 4892.82, 4745.18, 5210.71, 4141.06, 4740.05, 4723.65, 4705.13, 4632.6, 4450.33) # http/simple.js c=500 chunks=0 length=4 type="bytes"
(5387.29, 5432.56, 5581.44, 5951.68, 5284.43, 6043.86, 5618.86, 6103.43, 5969.04, 5457.41) # http/simple.js c=50 chunks=1 length=4 type="bytes"
(4345.28, 4619.93, 4577.87, 4433.45, 4219.12, 4456.08, 4700.55, 4528.67, 4214.6, 4256.32) # http/simple.js c=500 chunks=1 length=4 type="bytes"
(1131.76, 1162.73, 1170.54, 1155.69, 1163.2, 1170.81, 1157.91, 1159.79, 1169.75, 1135.27) # http/simple.js c=50 chunks=4 length=4 type="bytes"
(2541.06, 2796.41, 2702.26, 2919.21, 2553.31, 2926.16, 2821.81, 2876.22, 2850.04, 2985.13) # http/simple.js c=500 chunks=4 length=4 type="bytes"
(6082.11, 6247.37, 5885.72, 5948.79, 5321.72, 6403.99, 6078.45, 5628.47, 5651.41, 6261.07) # http/simple.js c=50 chunks=0 length=1024 type="bytes"
(4743.77, 4143.62, 5279.55, 4741.32, 4481.43, 4167.41, 4573.96, 4629.41, 5155.01, 4568.6) # http/simple.js c=500 chunks=0 length=1024 type="bytes"
(5106.41, 5566.92, 5262.0, 6164.03, 5726.01, 5374.95, 5310.74, 5311.06, 5355.57, 5767.94) # http/simple.js c=50 chunks=1 length=1024 type="bytes"
(4414.44, 3608.63, 4828.11, 4692.2, 4393.0, 4002.1, 4219.51, 4692.33, 4057.87, 4766.84) # http/simple.js c=500 chunks=1 length=1024 type="bytes"
(1164.83, 1154.38, 1148.38, 1155.99, 1167.51, 1166.54, 1150.04, 1167.37, 1161.98, 1161.54) # http/simple.js c=50 chunks=4 length=1024 type="bytes"
(3008.63, 2792.37, 2733.67, 2750.96, 2694.03, 2738.08, 2781.87, 2670.81, 2834.29, 2718.33) # http/simple.js c=500 chunks=4 length=1024 type="bytes"
(1603.17, 1540.93, 1594.18, 1641.75, 1573.88, 1569.05, 1575.24, 1590.35, 1625.87, 1559.17) # http/simple.js c=50 chunks=0 length=102400 type="bytes"
(1447.06, 1443.11, 1415.4, 1474.7, 1493.6, 1405.53, 1437.9, 1538.42, 1473.68, 1452.95) # http/simple.js c=500 chunks=0 length=102400 type="bytes"
(942.54, 880.88, 955.97, 964.18, 980.56, 974.08, 972.91, 953.37, 937.53, 961.64) # http/simple.js c=50 chunks=1 length=102400 type="bytes"
(862.65, 836.06, 895.56, 889.77, 872.32, 758.42, 839.46, 924.86, 881.89, 885.89) # http/simple.js c=500 chunks=1 length=102400 type="bytes"
(1464.56, 1260.95, 1648.63, 1652.4, 1640.25, 1567.12, 1474.97, 1634.81, 1589.97, 1668.75) # http/simple.js c=50 chunks=4 length=102400 type="bytes"
(1394.29, 1380.53, 1522.57, 1635.81, 1392.91, 1287.14, 1358.17, 1494.43, 1511.81, 1567.85) # http/simple.js c=500 chunks=4 length=102400 type="bytes"
(5731.38, 5465.33, 5866.3, 5828.25, 6202.44, 5775.79, 6062.04, 5762.29, 5827.93, 5655.72) # http/simple.js c=50 chunks=0 length=4 type="buffer"
(4430.37, 4247.96, 4499.54, 4767.27, 4689.36, 4642.78, 4764.07, 4820.84, 4822.76, 4869.27) # http/simple.js c=500 chunks=0 length=4 type="buffer"
(5145.11, 5924.89, 5986.92, 5912.23, 5506.47, 5318.37, 5949.75, 5999.83, 5642.12, 5977.34) # http/simple.js c=50 chunks=1 length=4 type="buffer"
(3883.17, 4350.6, 5428.7, 4719.65, 4687.47, 4163.63, 4394.01, 4407.87, 4223.54, 4523.87) # http/simple.js c=500 chunks=1 length=4 type="buffer"
(5025.94, 5146.72, 5525.48, 5664.2, 5617.77, 4786.47, 5290.52, 5482.93, 5229.53, 5712.98) # http/simple.js c=50 chunks=4 length=4 type="buffer"
(3958.14, 3845.48, 4587.39, 4394.0, 4013.58, 4294.32, 4373.46, 4594.54, 4041.38, 4054.52) # http/simple.js c=500 chunks=4 length=4 type="buffer"
(5694.81, 5034.5, 5892.08, 5537.23, 5548.67, 5930.73, 5928.66, 5375.86, 5801.77, 6505.35) # http/simple.js c=50 chunks=0 length=1024 type="buffer"
(4653.95, 4331.3, 4529.86, 3842.34, 5162.96, 4262.19, 4129.83, 4178.14, 4781.28, 5165.36) # http/simple.js c=500 chunks=0 length=1024 type="buffer"
(5377.8, 5404.16, 5598.9, 5266.68, 5889.52, 5673.61, 5794.47, 5470.61, 5643.74, 5724.31) # http/simple.js c=50 chunks=1 length=1024 type="buffer"
(4273.99, 3851.52, 4485.49, 3888.94, 5008.76, 3951.45, 4611.87, 3798.13, 4100.02, 4239.95) # http/simple.js c=500 chunks=1 length=1024 type="buffer"
(4445.57, 4769.13, 5392.51, 4952.31, 5239.15, 4765.72, 5367.59, 4270.06, 5311.94, 4462.06) # http/simple.js c=50 chunks=4 length=1024 type="buffer"
(3940.47, 3633.86, 4603.27, 3572.15, 4300.59, 3813.66, 3837.7, 3929.49, 3944.15, 4580.91) # http/simple.js c=500 chunks=4 length=1024 type="buffer"
(4422.92, 4584.11, 4689.41, 4339.86, 4774.42, 4323.61, 4298.2, 4359.04, 4381.4, 4345.08) # http/simple.js c=50 chunks=0 length=102400 type="buffer"
(3414.73, 3235.97, 3658.03, 3576.69, 3354.0, 2927.27, 3288.56, 3034.4, 3358.44, 3391.09) # http/simple.js c=500 chunks=0 length=102400 type="buffer"
(3743.97, 4326.18, 4421.93, 4421.81, 3840.19, 4099.92, 4555.21, 3934.28, 4118.16, 4219.32) # http/simple.js c=50 chunks=1 length=102400 type="buffer"
(2866.77, 3301.74, 3674.63, 3477.26, 3251.45, 3319.95, 3789.7, 3075.3, 3069.91, 3437.71) # http/simple.js c=500 chunks=1 length=102400 type="buffer"
(3986.23, 3926.79, 3964.04, 3871.27, 3534.66, 4061.72, 4307.7, 3728.58, 3813.63, 4106.0) # http/simple.js c=50 chunks=4 length=102400 type="buffer"
(2869.64, 3088.59, 3151.95, 2962.71, 3098.05, 3423.59, 3134.49, 3172.54, 3209.03, 3171.31) # http/simple.js c=500 chunks=4 length=102400 type="buffer"

note that I used std(x)/mean(x) as I couldn't find any theory on std(x)/median(x)

I didn't copy the std(x)/mean(x) column because it wouldn't fit in the GH comment. Apologies if that was confusing.

(I use std(x)/median(x) as a quick gauge for means that are distorted by big outliers. Like you observed, not an issue here.)

I'm curious, do you have any source on the "> 1-2% is so unreliable as to be worthless", or is it completely your own opinion?

Let me rephrase that as "so unreliable as to be worthless for me." :-)

Benchmarks with too much variance don't let me measure the impact of small performance improvements, which is what I'm trying to test here.

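Removing the min and the max from each column helps even things out a little, but there are still tests where the relative standard deviation over the remaining 8 runs is >7% (with std(x)/mean(x)*100 this time :-)). Columns are as before, except that the last one is std(x)/mean(x)*100.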

http/simple.js c=50 chunks=0 length=4 type="bytes":         6189.9  6201.2   206.3   3.3%
http/simple.js c=500 chunks=0 length=4 type="bytes":        4714.4  4731.9   123.7   2.6%
http/simple.js c=50 chunks=1 length=4 type="bytes":         5680.3  5600.1   249.9   4.4%
http/simple.js c=500 chunks=1 length=4 type="bytes":        4429.6  4444.8   137.0   3.1%
http/simple.js c=50 chunks=4 length=4 type="bytes":         1159.4  1161.3    10.3   0.9%
http/simple.js c=500 chunks=4 length=4 type="bytes":        2805.7  2835.9   116.8   4.2%
http/simple.js c=50 chunks=0 length=1024 type="bytes":      5972.9  6013.6   226.8   3.8%
http/simple.js c=500 chunks=0 length=1024 type="bytes":     4632.6  4601.7   260.7   5.6%
http/simple.js c=50 chunks=1 length=1024 type="bytes":      5459.4  5365.3   186.7   3.4%
http/simple.js c=500 chunks=1 length=1024 type="bytes":     4404.8  4403.7   276.9   6.3%
http/simple.js c=50 chunks=4 length=1024 type="bytes":      1160.3  1161.8     5.8   0.5%
http/simple.js c=500 chunks=4 length=1024 type="bytes":     2755.4  2744.5    42.2   1.5%
http/simple.js c=50 chunks=0 length=102400 type="bytes":    1586.4  1582.8    20.1   1.3%
http/simple.js c=500 chunks=0 length=102400 type="bytes":   1454.8  1450.0    23.2   1.6%
http/simple.js c=50 chunks=1 length=102400 type="bytes":     957.8   958.8    12.3   1.3%
http/simple.js c=500 chunks=1 length=102400 type="bytes":    870.5   877.1    21.2   2.4%
http/simple.js c=50 chunks=4 length=102400 type="bytes":    1584.1  1612.4    71.7   4.5%
http/simple.js c=500 chunks=4 length=102400 type="bytes":   1452.8  1444.4    74.6   5.1%
http/simple.js c=50 chunks=0 length=4 type="buffer":        5813.7  5801.9   112.2   1.9%
http/simple.js c=500 chunks=0 length=4 type="buffer":       4679.6  4726.7   137.5   2.9%
http/simple.js c=50 chunks=1 length=4 type="buffer":        5777.3  5918.6   238.7   4.1%
http/simple.js c=500 chunks=1 length=4 type="buffer":       4433.8  4400.9   187.3   4.2%
http/simple.js c=50 chunks=4 length=4 type="buffer":        5372.9  5386.7   217.7   4.1%
http/simple.js c=500 chunks=4 length=4 type="buffer":       4214.6  4174.4   213.5   5.1%
http/simple.js c=50 chunks=0 length=1024 type="buffer":     5713.7  5748.3   195.5   3.4%
http/simple.js c=500 chunks=0 length=1024 type="buffer":    4503.7  4430.6   329.5   7.3%
http/simple.js c=50 chunks=1 length=1024 type="buffer":     5585.9  5621.3   143.0   2.6%
http/simple.js c=500 chunks=1 length=1024 type="buffer":    4175.4  4170.0   260.6   6.2%
http/simple.js c=50 chunks=4 length=1024 type="buffer":     4914.2  4860.7   342.4   7.0%
http/simple.js c=500 chunks=4 length=1024 type="buffer":    3997.6  3935.0   281.6   7.0%
http/simple.js c=50 chunks=0 length=102400 type="buffer":   4430.7  4370.2   125.1   2.8%
http/simple.js c=500 chunks=0 length=102400 type="buffer":  3331.7  3356.2   146.2   4.4%
http/simple.js c=50 chunks=1 length=102400 type="buffer":   4172.7  4168.7   201.7   4.8%
http/simple.js c=500 chunks=1 length=102400 type="buffer":  3326.0  3310.8   190.9   5.7%
http/simple.js c=50 chunks=4 length=102400 type="buffer":   3932.3  3945.4   117.4   3.0%
http/simple.js c=500 chunks=4 length=102400 type="buffer":  3123.6  3143.2    71.3   2.3%
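
The trimmed table above can be sketched as follows: drop the single smallest and largest run, then compute mean, std and std/mean on the remaining 8. As in the first table, std is assumed to be the population standard deviation; under that assumption the first row comes out as 6189.9, 206.3 and 3.3%. The median column is unchanged because removing one value from each end of a sorted list does not move its middle. `trimmedSummary` is an illustrative name.

```javascript
// Drop the minimum and maximum run, then summarize the rest.
// Assumption: std is the population standard deviation (divide by n).
function trimmedSummary(samples) {
  const kept = [...samples].sort((a, b) => a - b).slice(1, -1);
  const n = kept.length;
  const mean = kept.reduce((sum, x) => sum + x, 0) / n;
  const std = Math.sqrt(kept.reduce((sum, x) => sum + (x - mean) ** 2, 0) / n);
  return { n, mean, std, cvPercent: (std / mean) * 100 };
}

// Raw runs for http/simple.js c=50 chunks=0 length=4 type="bytes".
const t = trimmedSummary([6129.01, 6273.31, 6081.69, 6606.24, 4287.8,
                          6481.24, 6126.81, 6326.17, 5757.36, 6343.68]);
console.log(t.n, t.mean.toFixed(1), t.std.toFixed(1), t.cvPercent.toFixed(1) + '%');
// prints: 8 6189.9 206.3 3.3%
```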

@AndreasMadsen
Member

I don't have it, I scraped the numbers from the screen output. Here they are if you want them:

Great, I will take a look later.

Benchmarks with too much variance don't let me measure the impact of small performance improvements, which is what I'm trying to test here.

That actually depends on how many observations you have. The standard deviation of the mean performance, which is what you are really interested in when comparing changes, is given by std(x)/sqrt(length(x)). You can't rely on std(x) alone. The benchmark/compare.js tool can help you do this correctly.
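
A minimal sketch of that point: the standard error of each mean is std(x)/sqrt(n), and two versions can be compared with a t statistic built from those standard errors. The sketch uses Welch's t-test (unequal variances) as one common choice; it is not necessarily what benchmark/compare.js or its accompanying analysis does. The function names and the toy numbers are illustrative, and the sample standard deviation (divide by n - 1) is used.

```javascript
function mean(xs) {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Sample variance (divide by n - 1).
function sampleVariance(xs) {
  const m = mean(xs);
  return xs.reduce((sum, x) => sum + (x - m) ** 2, 0) / (xs.length - 1);
}

// Standard error of the mean: std(x) / sqrt(n). Shrinks as n grows.
function standardError(xs) {
  return Math.sqrt(sampleVariance(xs) / xs.length);
}

// Welch's t statistic and its approximate degrees of freedom
// (one common choice; not necessarily what compare.js uses).
function welch(a, b) {
  const va = sampleVariance(a) / a.length; // squared standard error of mean(a)
  const vb = sampleVariance(b) / b.length; // squared standard error of mean(b)
  const t = (mean(b) - mean(a)) / Math.sqrt(va + vb);
  const df = (va + vb) ** 2 / (va ** 2 / (a.length - 1) + vb ** 2 / (b.length - 1));
  return { t, df };
}

// Toy numbers, not benchmark data.
const before = [1, 2, 3, 4, 5];
const after = [2, 3, 4, 5, 6];
console.log(standardError(before).toFixed(4)); // prints 0.7071
console.log(welch(before, after)); // prints { t: 1, df: 8 }
```

Collecting four times as many runs halves each standard error and so roughly doubles t for the same true difference, even though std(x) itself stays about the same.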

@bnoordhuis
Member Author

That actually depends on how many observations you have.

What are you proposing? I'm willing to accept that if you collect 1,000 samples, drop the top and bottom 25%, and run statistics on the remainder, you're able to make statistically significant observations, but the problem is that I don't have days to let the benchmark suite collect those 1,000 samples.

As it stands, the variance between individual runs is so big it's useless (to me!). If there is a way to fix that, great. If not, I'll just have to PR my http parser changes without benchmarks to back them up.

@AndreasMadsen
Member

AndreasMadsen commented Aug 18, 2016

Ran the full benchmark with 30 samples (raw data: https://gist.github.com/AndreasMadsen/c0dff8145a910984bb96fa422749743c); here are the results. They are not as bad as yours, though they are definitely not all below 1-2%.

   c chunks length   type       mean     std.dev   cv.biased   conf.int
  50      0      4 buffer 13252.2033  664.553398 0.050204837 248.148317
  50      0      4  bytes 13374.6770  598.987907 0.045541701 223.665760
  50      0   1024 buffer 13309.9843  565.347329 0.042726621 211.104162
  50      0   1024  bytes 13814.3997  818.623235 0.060269884 305.678940
  50      0 102400 buffer  9610.0793   72.085732 0.007492303  26.917255
  50      0 102400  bytes  1366.4933   27.654264 0.020326097  10.326272
  50      1      4 buffer 13511.9517  572.322219 0.042077349 213.708629
  50      1      4  bytes 13807.0837  689.314631 0.050866448 257.394314
  50      1   1024 buffer 12197.7417 1577.403949 0.123198157 589.012315
  50      1   1024  bytes  8341.2113  135.015765 0.016210418  50.415715
  50      1 102400 buffer  9406.8817   37.782083 0.004015697  14.108062
  50      1 102400  bytes  1020.7883    9.884413 0.009688179   3.690901
  50      4      4 buffer  7730.5703   67.515521 0.008730024  25.210710
  50      4      4  bytes  7255.1727  453.338848 0.062153148 169.279508
  50      4   1024 buffer  7530.8717   95.512597 0.012648857  35.664990
  50      4   1024  bytes  4897.6093  741.505319 0.156418891 276.882637
  50      4 102400 buffer  7064.8917  104.419390 0.014763233  38.990841
  50      4 102400  bytes  2879.3420  110.777593 0.038813834  41.365033
 500      0      4 buffer 12476.6117  221.360215 0.017766801  82.657263
 500      0      4  bytes 12649.5213  320.645432 0.025413168 119.730972
 500      0   1024 buffer 12345.4273  274.286889 0.022194854 102.420407
 500      0   1024  bytes 12975.5437  366.864025 0.028301120 136.989278
 500      0 102400 buffer  9762.2910  138.765844 0.014196975  51.816018
 500      0 102400  bytes  1319.8650   17.811349 0.013547276   6.650867
 500      1      4 buffer 12054.9603  246.843280 0.020529437  92.172796
 500      1      4  bytes 12529.3080  256.667950 0.020342316  95.841388
 500      1   1024 buffer 11533.3390  324.604648 0.028113207 121.209367
 500      1   1024  bytes  9887.2040  245.630187 0.024925675  91.719819
 500      1 102400 buffer  9324.7627  121.878697 0.013063692  45.510254
 500      1 102400  bytes   936.3767    9.977759 0.010674597   3.725757
 500      4      4 buffer  8718.8757  119.657152 0.013712283  44.680715
 500      4      4  bytes  7792.1260  239.006229 0.030585163  89.246392
 500      4   1024 buffer  8608.7457   96.506534 0.011218722  36.036132
 500      4   1024  bytes  6772.9040  240.011972 0.035650390  89.621943
 500      4 102400 buffer  6610.1147   92.423620 0.013984329  34.511547
 500      4 102400  bytes  2454.3990   27.595643 0.011211338  10.304383

What are you proposing?

I'm not proposing anything. I'm saying there is more to it than just the standard deviation. In fact, I agree that the benchmarks take too long. However, I don't agree that they should be neglected completely. If you made a particular change and think it improves some aspect of the http/simple.js benchmark, then just benchmark that part. Yes, other parts could be negatively affected, but if you don't have the time, I think it's a reasonable compromise.


I did the math, and it turns out it is not too hard to calculate the approximate coefficient of variation needed, given the number of observations and the expected relative improvement.

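[Equation image not preserved: approximate required coefficient of variation as a function of the number of observations and the expected relative improvement.]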

Here are some outputs of that (e.g. if you have 30 observations and expect an improvement of 1%, then you need cv < 0.019).

          im = 1%      im = 10%
n = 10    cv < 0.011   cv < 0.111
n = 30    cv < 0.019   cv < 0.193
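
The original equation image did not survive. One derivation that gives numbers close to this table, assuming a two-sided two-sample t-test at the 5% level with n observations per version and the same standard deviation cv × mean for both: a relative improvement im reaches significance when im × mean / (cv × mean × sqrt(2/n)) exceeds t(0.975, 2n - 2), i.e. cv < im × sqrt(n/2) / t(0.975, 2n - 2). This reproduces 0.011, 0.019 and 0.193, but gives about 0.106 rather than 0.111 for n = 10, im = 10%, so the exact formula used in the comment may differ. The t quantile is approximated with the same asymptotic expansion used in place of a statistics library; `maxCv` is an illustrative name.

```javascript
const Z_975 = 1.959964; // 97.5th percentile of the standard normal

// Approximate 97.5th percentile of Student's t (expansion in powers of 1/df).
function tQuantile975(df) {
  const z = Z_975;
  const g1 = (z ** 3 + z) / 4;
  const g2 = (5 * z ** 5 + 16 * z ** 3 + 3 * z) / 96;
  const g3 = (3 * z ** 7 + 19 * z ** 5 + 17 * z ** 3 - 15 * z) / 384;
  const g4 = (79 * z ** 9 + 776 * z ** 7 + 1482 * z ** 5 - 1920 * z ** 3 - 945 * z) / 92160;
  return z + g1 / df + g2 / df ** 2 + g3 / df ** 3 + g4 / df ** 4;
}

// Largest coefficient of variation for which an observed relative improvement
// of exactly `im` would be significant in a two-sided two-sample t-test at the
// 5% level, with `n` observations per version.
// Assumption: both versions have standard deviation cv * mean.
function maxCv(n, im) {
  return (im * Math.sqrt(n / 2)) / tQuantile975(2 * n - 2);
}

for (const n of [10, 30]) {
  const row = [0.01, 0.1].map((im) => maxCv(n, im).toFixed(3));
  console.log(`n = ${n}: cv < ${row.join(', cv < ')}`);
}
// prints:
// n = 10: cv < 0.011, cv < 0.106
// n = 30: cv < 0.019, cv < 0.193
```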

@bnoordhuis
Member Author

I'm saying there is more to it than just the standard deviation.

I don't disagree there; the numbers I posted were just meant to show that individual runs are too imprecise to be useful.

If you made a particular change and think that improves some aspect of the http/simple.js benchmark, then just benchmark that part. Yes other parts could be negatively affected, but if you don't have the time then I think it's a reasonable compromise.

I'm trying to make across-the-board performance improvements so that doesn't work for me, unfortunately.

Thanks for running the benchmarks, it helps to know it's not just local to my system.

@jasnell
Member

jasnell commented Aug 18, 2016

I've seen this variation also. Not entirely sure what to do about it tho as I haven't dug in enough.

@AndreasMadsen
Member

I'm trying to make across-the-board performance improvements so that doesn't work for me, unfortunately.

Since you are using a single benchmark and are interested in the overall performance difference, you could use linear regression instead of a standard t-test. This way you can test for significance using the observations for all the different parameters combined.

This should be much more sensitive to performance improvements (though it's hard to derive this late in the day), since you go from 2*n - 2 to 2*n*c - c - 2 degrees of freedom (n: observations, c: configurations), which is much larger. It is trickier to interpret, though: if some configurations have a positive impact and others a negative impact, the effects cancel out and won't show significance. In the standard t-test the comparisons are done separately, so that isn't a problem.

I just did this in #8140 (comment), but instead of testing the effect of the http benchmarker you will test the node version.
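
A minimal sketch of the pooled analysis, under stated assumptions: an additive linear model with one intercept per configuration plus a single version effect. With a balanced design (n runs per version in each of c configurations), the least-squares estimate of the version effect is the average of the per-configuration differences in means, and the residual degrees of freedom are 2nc - c - 1 (2nc observations, c + 1 parameters). The model actually used in #8140 may differ; the comment above counts 2nc - c - 2. `pooledVersionEffect` and the `{ before, after }` data layout are hypothetical.

```javascript
// Pooled comparison across benchmark configurations (balanced design).
// Model (assumption): y = alpha[config] + beta * isNewVersion + noise.
// `data` has one entry per configuration: { before: [...], after: [...] },
// with every array of the same length n.
function pooledVersionEffect(data, transform = (x) => x) {
  const c = data.length;
  const n = data[0].before.length;
  const mean = (xs) => xs.reduce((sum, x) => sum + x, 0) / xs.length;

  const groups = data.map(({ before, after }) => {
    const b = before.map(transform);
    const a = after.map(transform);
    return { b, a, mb: mean(b), ma: mean(a) };
  });

  // Least-squares version effect: average of per-configuration differences.
  const beta = mean(groups.map((g) => g.ma - g.mb));

  // Residual sum of squares, using the fitted intercept alpha = (mb + ma - beta) / 2.
  let ssr = 0;
  for (const g of groups) {
    const alpha = (g.mb + g.ma - beta) / 2;
    for (const y of g.b) ssr += (y - alpha) ** 2;
    for (const y of g.a) ssr += (y - alpha - beta) ** 2;
  }

  const df = 2 * n * c - c - 1;
  // Var(beta) = 2 * sigma^2 / (n * c) in the balanced case.
  const se = Math.sqrt((2 * (ssr / df)) / (n * c));
  return { beta, se, t: beta / se, df };
}

// Toy numbers, not benchmark data: 2 configurations, 2 runs per version.
const toy = [
  { before: [10, 12], after: [11, 13] },
  { before: [100, 102], after: [101, 103] },
];
console.log(pooledVersionEffect(toy)); // beta = 1, df = 5
```

For real throughput numbers, passing `Math.log` as `transform` makes beta the log of the average speedup ratio, so configurations with very different absolute rates (e.g. 1,000 vs 13,000 req/s) contribute on the same relative scale. As noted above, gains in some configurations and losses in others can still average out to an insignificant beta.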

@bnoordhuis
Member Author

I'm not sure I follow. It's a little ambiguous what "different parameters" refers to.

@AndreasMadsen
Member

AndreasMadsen commented Aug 19, 2016

http/simple.js runs the benchmark over a set of parameters, given by the options object passed to createBenchmark:

var common = require('../common.js');

var bench = common.createBenchmark(main, {
  // unicode confuses ab on os x.
  type: ['bytes', 'buffer'],
  length: [4, 1024, 102400],
  chunks: [0, 1, 4],  // chunks=0 means 'no chunked encoding'.
  c: [50, 500]
});
// main() is defined later in the same file.

If you only need the combined effect, then you can do the statistics on all of the parameter combinations simultaneously. This greatly increases the number of observations at no extra cost, and as you know, the number of observations matters as well.
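
For a sense of scale: the options object above expands to 2 × 3 × 3 × 2 = 36 configurations, which is the c in the degrees-of-freedom comparison earlier in the thread. A small sketch of that cartesian product; the helper `configurations` is hypothetical and not part of benchmark/common.js.

```javascript
// Hypothetical helper: expand a createBenchmark-style options object
// into every combination of parameter values.
function configurations(options) {
  return Object.entries(options).reduce(
    (combos, [key, values]) =>
      combos.flatMap((combo) => values.map((value) => ({ ...combo, [key]: value }))),
    [{}]
  );
}

const configs = configurations({
  type: ['bytes', 'buffer'],
  length: [4, 1024, 102400],
  chunks: [0, 1, 4],
  c: [50, 500],
});

console.log(configs.length); // prints 36

// Degrees of freedom with n = 30 runs per version, using the formulas quoted above.
const n = 30;
const c = configs.length;
console.log(2 * n - 2, 2 * n * c - c - 2); // prints 58 2122
```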

@bnoordhuis
Member Author

I'll close, I think this ran its course.
