Cowboy's HTTP/2 over TCP performance oddity #9423

Open
essen opened this issue Feb 12, 2025 · 2 comments
Labels: bug, team:PS, team:VM

Comments


essen commented Feb 12, 2025

Describe the bug
I have mentioned in #9355 that for some reason Cowboy's HTTP/2 over TCP is much slower than HTTP/2 over TLS. This ticket is meant to expand on that.

For the next version of Cowboy I have added a benchmark suite using Common Test. One of the tests uploads a 10GB file to Cowboy. Run against the current version of Cowboy using HTTP/2 over TCP, it takes more than 20 minutes to complete (I gave up waiting). Meanwhile HTTP/2 over TLS takes about 3 to 4 minutes. Both TCP and TLS use the same configuration, including default buffer sizes in this scenario.

HTTP/1.1 does not suffer from this problem despite roughly the same amount of data being transferred.

On Cowboy master the optimisations have helped immensely so the problem is no longer visible. Still I would like to make sure that there isn't a problem in OTP that leads to this.

The perf flame graph capturing the first minute of the benchmark looks like this:

[Image: perf flame graph of the first minute of the benchmark]

This is probably not normal...

How to investigate this?

To Reproduce

$ git clone https://github.com/ninenines/cowboy
$ cd cowboy
$ git checkout 2.12.0
$ git checkout master -- test/http_perf_SUITE.erl
$ git checkout master -- test/cowboy_test.erl
$ git checkout master -- test/handlers/read_body_h.erl
$ make ct-http_perf t=h2c:plain_h_10G_post

To test TLS just change the last command:

$ make ct-http_perf t=h2:plain_h_10G_post

Perf can be enabled by applying the following patch:

diff --git a/Makefile b/Makefile
index 5e88acf..6d2e179 100644
--- a/Makefile
+++ b/Makefile
@@ -9,6 +9,7 @@ PROJECT_REGISTERED = cowboy_clock
 
 PLT_APPS = public_key ssl # ct_helper gun common_test inets
 CT_OPTS += -ct_hooks cowboy_ct_hook [] # -boot start_sasl
+CT_OPTS += +JPperf true +S 1
 
 # Dependencies.
 
diff --git a/test/http_perf_SUITE.erl b/test/http_perf_SUITE.erl
index 38a18a4..f772505 100644
--- a/test/http_perf_SUITE.erl
+++ b/test/http_perf_SUITE.erl
@@ -32,7 +32,7 @@ groups() ->
 init_per_suite(Config) ->
 	do_log("", []),
 	%% Optionally enable `perf` for the current node.
-%	spawn(fun() -> ct:pal(os:cmd("perf record -g -F 9999 -o /tmp/http_perf.data -p " ++ os:getpid() ++ " -- sleep 60")) end),
+	spawn(fun() -> ct:pal(os:cmd("perf record -g -F 9999 -o /tmp/http_perf.data -p " ++ os:getpid() ++ " -- sleep 60")) end),
 	Config.
 
 end_per_suite(_) ->

Expected behavior
Less time in libc 😆

Affected versions
OTP-27.1.2, OTP-27.2.2

Additional context
I will build a more recent version to confirm it still happens. Unfortunately Arch Linux has just split the Erlang package into many packages...

essen added the bug label Feb 12, 2025
essen commented Feb 12, 2025

I let it run to completion on OTP-27.2.2: it takes almost 24 minutes.

IngelaAndin added the team:VM and team:PS labels Feb 13, 2025
essen commented Feb 13, 2025

To further illustrate why I think there may be an issue, uploading a 10GB body in Cowboy 2.12 (with default buffer), per protocol:

  • HTTP/1 over TCP: 111s
  • HTTP/1 over TLS: 100s
  • HTTP/2 over TCP: 1430s
  • HTTP/2 over TLS: 214s

And in upcoming Cowboy 2.13 (with buffer set dynamically based on size of data received from the socket):

  • HTTP/1 over TCP: 9s (11x faster)
  • HTTP/1 over TLS: 13s (8x faster)
  • HTTP/2 over TCP: 25s (57x faster)
  • HTTP/2 over TLS: 28s (7.5x faster)

To trigger the issue, specific settings are needed (such as a large maximum HTTP/2 frame size), so there is clearly an inefficiency in the work that Cowboy performs (it does more binary appends). But I would not expect HTTP/2 over TCP to perform so much worse than HTTP/2 over TLS with the same settings (something in ssl likely prevents the issue). It is one order of magnitude worse than it should be.

Edit: Forgot to add, there are similar order-of-magnitude differences in other benchmark scenarios, such as uploading 1MB bodies repeatedly. Basically, if the data is large enough, HTTP/2 over TCP performance is abysmal.
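
For reference, the HTTP/2 gap can be worked out from the rounded timings listed above. This is only a small sketch of the arithmetic: the `ratio` helper and the variable names are made up for illustration and are not part of Cowboy or its benchmark suite.

```shell
#!/bin/sh
# Hypothetical helper: ratio of two durations in seconds, two decimals.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f", a / b }'
}

# Timings in seconds, copied from the lists above (HTTP/2 only).
h2c_212=1430   # HTTP/2 over TCP, Cowboy 2.12
h2_212=214     # HTTP/2 over TLS, Cowboy 2.12
h2c_213=25     # HTTP/2 over TCP, upcoming Cowboy 2.13
h2_213=28      # HTTP/2 over TLS, upcoming Cowboy 2.13

echo "2.12 TCP/TLS:  $(ratio "$h2c_212" "$h2_212")x"   # prints 6.68x
echo "2.13 TCP/TLS:  $(ratio "$h2c_213" "$h2_213")x"   # prints 0.89x
echo "TCP 2.12/2.13: $(ratio "$h2c_212" "$h2c_213")x"  # prints 57.20x
```

With the default buffer in 2.12, TCP takes roughly 6.7 times as long as TLS for the same upload, while with the dynamic buffer in 2.13 TCP is slightly faster than TLS, which is the ordering I would expect in both cases.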
