Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

transport: derive transport context from context.Background #2930

Merged
merged 2 commits into from
Aug 29, 2019
Merged

transport: derive transport context from context.Background #2930

merged 2 commits into from
Aug 29, 2019

Conversation

JNProtzman
Copy link
Contributor

Avoid lock contention by deriving the transport context from context.Background() (without wrapping in a Cancel/Deadline context). This will make canceling a child context not recurse into the parent.

#1918

@thelinuxfoundation
Copy link

Thank you for your pull request. Before we can look at your contribution, we need to ensure all contributors are covered by a Contributor License Agreement.

After the following items are addressed, please respond with a new comment here, and the automated system will re-verify.

Regards,
CLA GitHub bot

@JNProtzman
Copy link
Contributor Author

Hey David, sorry about that! I believe I've rebased properly now.

@dfawley dfawley assigned canguler and unassigned JNProtzman Aug 1, 2019
@dfawley dfawley added the Type: Performance Performance improvements (CPU, network, memory, etc) label Aug 1, 2019
@canguler canguler changed the title transport: derive transport context from context.Background (#1918) transport: derive transport context from context.Background Aug 1, 2019
@canguler
Copy link

canguler commented Aug 5, 2019

Hi @JNProtzman,

Code looks good. Have you run any benchmarks to validate the performance improvement gained from the removal of the lock contention? It would be nice to see some numbers before merging this.

@JNProtzman
Copy link
Contributor Author

JNProtzman commented Aug 5, 2019

Hey @canguler,

##### PRE CHANGE #####
for i in {100,500,1000}; do go run benchmain/main.go --kbps=0 --latency=0 --reqSizeBytes=1 --respSizeBytes=1 --bufconn --maxConcurrentCalls=$i --resultFile="testing/change/benchmark$i" --workloads=unary; done

unary-networkMode_none-bufConn_true-keepalive_false-benchTime_1s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_100-reqSize_1B-respSize_1B-compressor_off-channelz_false-preloader_false:
50_Latency: 848.2140µs	90_Latency: 1471.1670µs	99_Latency: 2284.7460µs	Avg_Latency: 996.6120µs	Bytes/op: 939744560	Allocs/op: 16928735	
Histogram (unit: µs)
Count: 100677  Min: 344.2  Max: 20078.0  Avg: 996.61
------------------------------------------------------------
[     344.179000,      344.180000)       1    0.0%    0.0%  
[     344.180000,      344.185465)       0    0.0%    0.0%  
[     344.185465,      344.220798)       0    0.0%    0.0%  
[     344.220798,      344.449232)       0    0.0%    0.0%  
[     344.449232,      345.926092)       0    0.0%    0.0%  
[     345.926092,      355.474223)       0    0.0%    0.0%  
[     355.474223,      417.204371)      16    0.0%    0.0%  
[     417.204371,      816.299363)   49345   49.0%   49.0%  #####
[     816.299363,     3396.510471)   51079   50.7%   99.8%  #####
[    3396.510471,    20077.976000)     235    0.2%  100.0%  
[   20077.976000,             inf)       1    0.0%  100.0%  
Number of requests:  100677	Request throughput:  805416 bit/s
Number of responses: 100677	Response throughput: 805416 bit/s
 
unary-networkMode_none-bufConn_true-keepalive_false-benchTime_1s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_500-reqSize_1B-respSize_1B-compressor_off-channelz_false-preloader_false:
50_Latency: 4425.1160µs	90_Latency: 5611.5040µs	99_Latency: 7410.0300µs	Avg_Latency: 4824.4580µs	Bytes/op: 964698064	Allocs/op: 17358947	
Histogram (unit: µs)
Count: 103677  Min:  67.0  Max: 51797.5  Avg: 4824.46
------------------------------------------------------------
[      66.955000,       66.956000)       1    0.0%    0.0%  
[      66.956000,       66.962196)       0    0.0%    0.0%  
[      66.962196,       67.006780)       0    0.0%    0.0%  
[      67.006780,       67.327605)       0    0.0%    0.0%  
[      67.327605,       69.636217)       0    0.0%    0.0%  
[      69.636217,       86.248676)       1    0.0%    0.0%  
[      86.248676,      205.789682)       0    0.0%    0.0%  
[     205.789682,     1065.990598)       1    0.0%    0.0%  
[    1065.990598,     7255.880092)  102548   98.9%   98.9%  ##########
[    7255.880092,    51797.488000)    1125    1.1%  100.0%  
[   51797.488000,             inf)       1    0.0%  100.0%  
Number of requests:  103677	Request throughput:  829416 bit/s
Number of responses: 103677	Response throughput: 829416 bit/s
 
unary-networkMode_none-bufConn_true-keepalive_false-benchTime_1s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_1000-reqSize_1B-respSize_1B-compressor_off-channelz_false-preloader_false:
50_Latency: 9.7420ms	90_Latency: 11.2683ms	99_Latency: 26.6832ms	Avg_Latency: 10.2549ms	Bytes/op: 910214968	Allocs/op: 16324278	
Histogram (unit: ms)
Count: 97605  Min:   6.1  Max:  56.1  Avg: 10.25
------------------------------------------------------------
[       6.071163,        6.071164)      1    0.0%    0.0%  
[       6.071164,        6.071170)      0    0.0%    0.0%  
[       6.071170,        6.071214)      0    0.0%    0.0%  
[       6.071214,        6.071531)      0    0.0%    0.0%  
[       6.071531,        6.073804)      0    0.0%    0.0%  
[       6.073804,        6.090092)      2    0.0%    0.0%  
[       6.090092,        6.206857)     34    0.0%    0.0%  
[       6.206857,        7.043879)    400    0.4%    0.4%  
[       7.043879,       13.044054)  93956   96.3%   96.7%  ##########
[      13.044054,       56.056141)   3211    3.3%  100.0%  
[      56.056141,             inf)      1    0.0%  100.0%  
Number of requests:  97605	Request throughput:  780840 bit/s
Number of responses: 97605	Response throughput: 780840 bit/s
 
 
##### POST CHANGE #####
for i in {100,500,1000}; do go run benchmain/main.go --kbps=0 --latency=0 --reqSizeBytes=1 --respSizeBytes=1 --bufconn --maxConcurrentCalls=$i --resultFile="testing/change/benchmark$i" --workloads=unary; done
unary-networkMode_none-bufConn_true-keepalive_false-benchTime_1s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_100-reqSize_1B-respSize_1B-compressor_off-channelz_false-preloader_false:
50_Latency: 803.5640µs	90_Latency: 1452.4320µs	99_Latency: 2063.8200µs	Avg_Latency: 984.9910µs	Bytes/op: 946735736	Allocs/op: 17054147	
Histogram (unit: µs)
Count: 101423  Min:  94.5  Max: 28068.9  Avg: 984.99
------------------------------------------------------------
[      94.482000,       94.483000)       1    0.0%    0.0%  
[      94.483000,       94.488721)       0    0.0%    0.0%  
[      94.488721,       94.527169)       0    0.0%    0.0%  
[      94.527169,       94.785567)       0    0.0%    0.0%  
[      94.785567,       96.522196)       0    0.0%    0.0%  
[      96.522196,      108.193652)       1    0.0%    0.0%  
[     108.193652,      186.634628)       1    0.0%    0.0%  
[     186.634628,      713.817079)   46745   46.1%   46.1%  #####
[     713.817079,     4256.880281)   54353   53.6%   99.7%  #####
[    4256.880281,    28068.934000)     321    0.3%  100.0%  
[   28068.934000,             inf)       1    0.0%  100.0%  
Number of requests:  101423	Request throughput:  811384 bit/s
Number of responses: 101423	Response throughput: 811384 bit/s
 
unary-networkMode_none-bufConn_true-keepalive_false-benchTime_1s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_500-reqSize_1B-respSize_1B-compressor_off-channelz_false-preloader_false:
50_Latency: 4381.2980µs	90_Latency: 5655.5230µs	99_Latency: 8335.9260µs	Avg_Latency: 4793.8590µs	Bytes/op: 969602088	Allocs/op: 17448421	
Histogram (unit: µs)
Count: 104192  Min:  59.5  Max: 48832.4  Avg: 4793.86
------------------------------------------------------------
[      59.548000,       59.549000)       1    0.0%    0.0%  
[      59.549000,       59.555149)       0    0.0%    0.0%  
[      59.555149,       59.599107)       1    0.0%    0.0%  
[      59.599107,       59.913364)       0    0.0%    0.0%  
[      59.913364,       62.159969)       0    0.0%    0.0%  
[      62.159969,       78.220822)       1    0.0%    0.0%  
[      78.220822,      193.038990)       3    0.0%    0.0%  
[     193.038990,     1013.867823)       5    0.0%    0.0%  
[    1013.867823,     6881.928495)  101374   97.3%   97.3%  ##########
[    6881.928495,    48832.375000)    2806    2.7%  100.0%  
[   48832.375000,             inf)       1    0.0%  100.0%  
Number of requests:  104192	Request throughput:  833536 bit/s
Number of responses: 104192	Response throughput: 833536 bit/s
 
unary-networkMode_none-bufConn_true-keepalive_false-benchTime_1s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_1000-reqSize_1B-respSize_1B-compressor_off-channelz_false-preloader_false:
50_Latency: 9565.1990µs	90_Latency: 11424.5890µs	99_Latency: 35439.5940µs	Avg_Latency: 10019.4360µs	Bytes/op: 928870448	Allocs/op: 16709670	
Histogram (unit: µs)
Count: 99799  Min:  65.6  Max: 59305.1  Avg: 10019.44
------------------------------------------------------------
[      65.602000,       65.603000)      1    0.0%    0.0%  
[      65.603000,       65.609305)      0    0.0%    0.0%  
[      65.609305,       65.655364)      0    0.0%    0.0%  
[      65.655364,       65.991826)      0    0.0%    0.0%  
[      65.991826,       68.449697)      0    0.0%    0.0%  
[      68.449697,       86.404582)      1    0.0%    0.0%  
[      86.404582,      217.565990)      0    0.0%    0.0%  
[     217.565990,     1175.707198)      0    0.0%    0.0%  
[    1175.707198,     8174.980743)  17311   17.3%   17.3%  ##
[    8174.980743,    59305.054000)  82485   82.7%  100.0%  ########
[   59305.054000,             inf)      1    0.0%  100.0%  
Number of requests:  99799	Request throughput:  798392 bit/s
Number of responses: 99799	Response throughput: 798392 bit/s

Looks to be nearly a 1% performance increase! If there is a better way to attach output in the future, please let me know :)

@easwars
Copy link
Contributor

easwars commented Aug 7, 2019

You could also use https://github.com/grpc/grpc-go/blob/master/benchmark/benchresult/main.go to compare the old and new numbers side-by-side.

@JNProtzman
Copy link
Contributor Author

Thanks for the info @easwars!
I've run the benchresults:

unary-networkMode_none-bufConn_true-keepalive_false-benchTime_1s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_100-reqSize_1B-respSize_1B-compressor_off-channelz_false-preloader_false
               Title       Before        After Percentage
            TotalOps        90029        85295    -5.26%
             SendOps            0            0      NaN%
             RecvOps            0            0      NaN%
            Bytes/op    823949640    781314640    -5.17%
           Allocs/op     14741698     13990716    -5.09%
             ReqT/op    720232.00    682360.00    -5.26%
            RespT/op    720232.00    682360.00    -5.26%
            50th-Lat     946.74µs    987.833µs     4.34%
            90th-Lat   1.693296ms   1.851999ms     9.37%
            99th-Lat   3.568797ms   4.035379ms    13.07%
             Avg-Lat   1.109566ms   1.171052ms     5.54%

unary-networkMode_none-bufConn_true-keepalive_false-benchTime_1s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_500-reqSize_1B-respSize_1B-compressor_off-channelz_false-preloader_false
               Title       Before        After Percentage
            TotalOps        98561        91458    -7.21%
             SendOps            0            0      NaN%
             RecvOps            0            0      NaN%
            Bytes/op    899575248    835340184    -7.14%
           Allocs/op     16018321     14919730    -6.86%
             ReqT/op    788488.00    731664.00    -7.21%
            RespT/op    788488.00    731664.00    -7.21%
            50th-Lat   4.504672ms   4.936055ms     9.58%
            90th-Lat   6.490785ms   7.334942ms    13.01%
            99th-Lat  14.366643ms  12.878601ms   -10.36%
             Avg-Lat   5.064741ms   5.458381ms     7.77%

unary-networkMode_none-bufConn_true-keepalive_false-benchTime_1s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_1000-reqSize_1B-respSize_1B-compressor_off-channelz_false-preloader_false
               Title       Before        After Percentage
            TotalOps        93829        92398    -1.53%
             SendOps            0            0      NaN%
             RecvOps            0            0      NaN%
            Bytes/op    856896304    843433216    -1.57%
           Allocs/op     15235105     15061161    -1.14%
             ReqT/op    750632.00    739184.00    -1.53%
            RespT/op    750632.00    739184.00    -1.53%
            50th-Lat  10.168265ms  10.258331ms     0.89%
            90th-Lat  12.187334ms  13.404618ms     9.99%
            99th-Lat  23.039939ms  22.496723ms    -2.36%
             Avg-Lat  10.635387ms  10.810374ms     1.65%

@easwars
Copy link
Contributor

easwars commented Aug 7, 2019

It looks like you have used a value for 1s for benchTime. Might want to run it with something larger like 30s or 60s to get a better idea.

@JNProtzman
Copy link
Contributor Author

I've set benchTime to 60s and re-run :)

unary-networkMode_none-bufConn_true-keepalive_false-benchTime_1m0s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_100-reqSize_1B-respSize_1B-compressor_off-channelz_false-preloader_false
               Title       Before        After Percentage
            TotalOps      8477796      8699257     2.61%
             SendOps            0            0      NaN%
             RecvOps            0            0      NaN%
            Bytes/op  77093414472  79099367856     2.60%
           Allocs/op   1378113070   1414443658     2.64%
             ReqT/op   1130372.80   1159900.93     2.61%
            RespT/op   1130372.80   1159900.93     2.61%
            50th-Lat    639.434µs    627.784µs    -1.82%
            90th-Lat    791.373µs    767.274µs    -3.05%
            99th-Lat   1.893918ms   1.796654ms    -5.14%
             Avg-Lat    706.703µs    688.813µs    -2.53%

unary-networkMode_none-bufConn_true-keepalive_false-benchTime_1m0s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_500-reqSize_1B-respSize_1B-compressor_off-channelz_false-preloader_false
               Title       Before        After Percentage
            TotalOps      8489759      8388864    -1.19%
             SendOps            0            0      NaN%
             RecvOps            0            0      NaN%
            Bytes/op  77093663608  76130607048    -1.25%
           Allocs/op   1378130657   1363896739    -1.03%
             ReqT/op   1131967.87   1118515.20    -1.19%
            RespT/op   1131967.87   1118515.20    -1.19%
            50th-Lat   3.218457ms   3.272608ms     1.68%
            90th-Lat   4.446641ms   4.531206ms     1.90%
            99th-Lat   6.527856ms   6.265508ms    -4.02%
             Avg-Lat   3.532267ms   3.574924ms     1.21%

unary-networkMode_none-bufConn_true-keepalive_false-benchTime_1m0s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_1000-reqSize_1B-respSize_1B-compressor_off-channelz_false-preloader_false
               Title       Before        After Percentage
            TotalOps      8342467      8276324    -0.79%
             SendOps            0            0      NaN%
             RecvOps            0            0      NaN%
            Bytes/op  75681263648  75110616824    -0.75%
           Allocs/op   1354295489   1345355317    -0.66%
             ReqT/op   1112328.93   1103509.87    -0.79%
            RespT/op   1112328.93   1103509.87    -0.79%
            50th-Lat   6.539559ms   6.578258ms     0.59%
            90th-Lat   8.707139ms   8.823104ms     1.33%
            99th-Lat   11.74404ms  11.526839ms    -1.85%
             Avg-Lat   7.189053ms   7.248159ms     0.82%

@canguler
Copy link

canguler commented Aug 8, 2019

@JNProtzman

Can you run benchmarks in streaming and unconstrained streaming modes as well?

Thanks!
Can

@JNProtzman
Copy link
Contributor Author

Here's streaming:

streaming-networkMode_none-bufConn_true-keepalive_false-benchTime_1m0s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_100-reqSize_1B-respSize_1B-compressor_off-channelz_false-preloader_false
               Title       Before        After Percentage
            TotalOps     26001020     26140444     0.54%
             SendOps            0            0      NaN%
             RecvOps            0            0      NaN%
            Bytes/op  23916530944  24093603792     0.74%
           Allocs/op    821718579    830319651     1.05%
             ReqT/op   3466802.67   3485392.53     0.54%
            RespT/op   3466802.67   3485392.53     0.54%
            50th-Lat     212.76µs    208.859µs    -1.83%
            90th-Lat    321.078µs    324.593µs     1.09%
            99th-Lat    653.274µs    669.583µs     2.50%
             Avg-Lat    229.338µs    228.014µs    -0.58%

streaming-networkMode_none-bufConn_true-keepalive_false-benchTime_1m0s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_500-reqSize_1B-respSize_1B-compressor_off-channelz_false-preloader_false
               Title       Before        After Percentage
            TotalOps     35809709     37047200     3.46%
             SendOps            0            0      NaN%
             RecvOps            0            0      NaN%
            Bytes/op  32793364872  34065557560     3.88%
           Allocs/op   1133991935   1189481781     4.89%
             ReqT/op   4774627.87   4939626.67     3.46%
            RespT/op   4774627.87   4939626.67     3.46%
            50th-Lat    684.439µs    690.842µs     0.94%
            90th-Lat   1.139844ms   1.021263ms   -10.40%
            99th-Lat   3.287072ms   3.025826ms    -7.95%
             Avg-Lat    833.572µs    804.568µs    -3.48%

streaming-networkMode_none-bufConn_true-keepalive_false-benchTime_1m0s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_1000-reqSize_1B-respSize_1B-compressor_off-channelz_false-preloader_false
               Title       Before        After Percentage
            TotalOps     35895444     35706285    -0.53%
             SendOps            0            0      NaN%
             RecvOps            0            0      NaN%
            Bytes/op  32676963048  32667526376    -0.03%
           Allocs/op   1118930809   1124681457     0.51%
             ReqT/op   4786059.20   4760838.00    -0.53%
            RespT/op   4786059.20   4760838.00    -0.53%
            50th-Lat    1.40251ms   1.422314ms     1.41%
            90th-Lat   2.236248ms    2.25349ms     0.77%
            99th-Lat   5.959716ms    5.99992ms     0.67%
             Avg-Lat   1.660106ms    1.66727ms     0.43%

and unconstrained:

unconstrained-networkMode_none-bufConn_true-keepalive_false-benchTime_1m0s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_100-reqSize_1B-respSize_1B-compressor_off-channelz_false-preloader_false
               Title       Before        After Percentage
            TotalOps            0            0      NaN%
             SendOps     39285364     38262931    -2.60%
             RecvOps     38775100     39346950     1.47%
            Bytes/op          761          758    -0.39%
           Allocs/op           25           25     0.00%
             ReqT/op   5238048.53   5101724.13    -2.60%
            RespT/op   5170013.33   5246260.00     1.47%
            50th-Lat           0s           0s      NaN%
            90th-Lat           0s           0s      NaN%
            99th-Lat           0s           0s      NaN%
             Avg-Lat           0s           0s      NaN%

unconstrained-networkMode_none-bufConn_true-keepalive_false-benchTime_1m0s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_500-reqSize_1B-respSize_1B-compressor_off-channelz_false-preloader_false
               Title       Before        After Percentage
            TotalOps            0            0      NaN%
             SendOps     29669592     28863348    -2.72%
             RecvOps     27443688     27184874    -0.94%
            Bytes/op          743          742    -0.13%
           Allocs/op           26           26     0.00%
             ReqT/op   3955945.60   3848446.40    -2.72%
            RespT/op   3659158.40   3624649.87    -0.94%
            50th-Lat           0s           0s      NaN%
            90th-Lat           0s           0s      NaN%
            99th-Lat           0s           0s      NaN%
             Avg-Lat           0s           0s      NaN%

unconstrained-networkMode_none-bufConn_true-keepalive_false-benchTime_1m0s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_1000-reqSize_1B-respSize_1B-compressor_off-channelz_false-preloader_false
               Title       Before        After Percentage
            TotalOps            0            0      NaN%
             SendOps     21766622     20353039    -6.49%
             RecvOps     16523434     15456795    -6.46%
            Bytes/op          765          769     0.52%
           Allocs/op           27           27     0.00%
             ReqT/op   2902216.27   2713738.53    -6.49%
            RespT/op   2203124.53   2060906.00    -6.46%
            50th-Lat           0s           0s      NaN%
            90th-Lat           0s           0s      NaN%
            99th-Lat           0s           0s      NaN%
             Avg-Lat           0s           0s      NaN%

@canguler
Copy link

Hi @JNProtzman,

Can you run tests multiple times, say at least 10 each? Because the results are not very clear and I think this may be due to the variance from run to run.

@menghanl
Copy link
Contributor

Friendly ping

@JNProtzman
Copy link
Contributor Author

Hey @canguler and @menghanl,

My apologies for the delay. I've run the requested tests, 10x each. I've attached a zip archive with the output (in .txt files). Please let me know if this works, or if you need further information / test runs.

resultFiles.zip

internal/transport/http2_server.go Outdated Show resolved Hide resolved
@linux-foundation-easycla
Copy link

CLA Check
The committers are authorized under a signed CLA.

@JNProtzman
Copy link
Contributor Author

I ran the benchmark on the current results without sync.Once and the previous results before my 2 commits. Here's the result comparisons:
resultFiles.zip

@dfawley dfawley added this to the 1.24 Release milestone Aug 29, 2019
@dfawley dfawley merged commit 4d39b48 into grpc:master Aug 29, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Feb 28, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Type: Performance Performance improvements (CPU, network, memory, etc)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants