
benchmark: more accurate metrics and profiles #5580

Merged
arkodg merged 5 commits into envoyproxy:main from shawnh2:benchmark-acc-metrics on Mar 25, 2025

@shawnh2
Contributor

@shawnh2 shawnh2 commented Mar 22, 2025

What type of PR is this?

What this PR does / why we need it:

We used to collect metrics and profiles once, at the end of each benchmark test. The result may not be accurate, because:

  • process_resident_memory_bytes and container_memory_working_set_bytes are instantaneous values: they only reflect the state at the time of our PromQL request. The same applies to pprof profiles.
  • process_cpu_seconds_total and container_cpu_usage_seconds_total are cumulative values: we should use rate() to get their rate of change over a time range.
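
To illustrate the second point, here is a minimal Go sketch of why a cumulative counter has to be turned into a rate before it says anything about usage during a test. It is not the code from this PR: the `sample` type and `cpuPercent` helper are hypothetical, the values are made up, and the arithmetic is a simplification of what PromQL's rate() does (no counter-reset handling, no extrapolation). Expressing the result as a percentage of one core is also an assumption.

```go
package main

import "fmt"

// sample is a hypothetical (timestamp, value) pair scraped from a cumulative
// counter such as process_cpu_seconds_total.
type sample struct {
	tsSeconds float64 // scrape time, in seconds
	value     float64 // CPU seconds consumed since process start
}

// cpuPercent returns the average CPU usage between the first and last sample,
// as a percentage of one core (an assumption made for this sketch).
// It simplifies PromQL's rate(): counter resets are ignored and nothing is
// extrapolated to the window boundaries.
func cpuPercent(samples []sample) (float64, bool) {
	if len(samples) < 2 {
		return 0, false // a single point cannot describe a rate
	}
	first, last := samples[0], samples[len(samples)-1]
	elapsed := last.tsSeconds - first.tsSeconds
	if elapsed <= 0 {
		return 0, false
	}
	return (last.value - first.value) / elapsed * 100, true
}

func main() {
	// Made-up values: the counter grows by 3 CPU-seconds over 30 seconds.
	window := []sample{{0, 10.0}, {15, 11.0}, {30, 13.0}}
	if pct, ok := cpuPercent(window); ok {
		fmt.Printf("cpu: %.2f%%\n", pct)
	}
}
```

Reading only the last raw value (13.0 here) reports how much CPU the process has used since it started, not how busy it was during the window, which is why the raw counter keeps growing from one test to the next.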

So this PR samples metrics and profiles while the test is running and reports the min/max/mean of memory and CPU over the test run (ditto for pprof profiles), to make the benchmark results more accurate and trustworthy.
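
The reduction from periodic samples to min/max/mean can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: the `summary` type and `summarize` function are hypothetical names, and the sample values are invented.

```go
package main

import (
	"fmt"
	"math"
)

// summary holds the three statistics reported per metric.
type summary struct {
	min, max, mean float64
}

// summarize reduces the samples taken during one test run to min/max/mean.
// The boolean is false when there are no samples to summarize.
func summarize(samples []float64) (summary, bool) {
	if len(samples) == 0 {
		return summary{}, false
	}
	s := summary{min: math.Inf(1), max: math.Inf(-1)}
	total := 0.0
	for _, v := range samples {
		s.min = math.Min(s.min, v)
		s.max = math.Max(s.max, v)
		total += v
	}
	s.mean = total / float64(len(samples))
	return s, true
}

func main() {
	// Made-up memory samples in MiB, one per sampling tick.
	memMiB := []float64{128.0, 150.0, 142.0, 144.0}
	if s, ok := summarize(memMiB); ok {
		fmt.Printf("%.2f / %.2f / %.2f\n", s.min, s.max, s.mean)
	}
}
```

Reporting all three values keeps short spikes (max) visible without letting them dominate the typical figure (mean).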

Which issue(s) this PR fixes:

ref: #4516, #4498

Release Notes: No

@shawnh2 shawnh2 requested a review from a team as a code owner March 22, 2025 03:03
@codecov

codecov bot commented Mar 22, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 65.25%. Comparing base (7928b61) to head (8e76c30).
Report is 12 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5580      +/-   ##
==========================================
+ Coverage   65.23%   65.25%   +0.02%     
==========================================
  Files         213      213              
  Lines       34069    34073       +4     
==========================================
+ Hits        22224    22235      +11     
+ Misses      10508    10504       -4     
+ Partials     1337     1334       -3     


@shawnh2 shawnh2 force-pushed the benchmark-acc-metrics branch from 5930afd to de62195 Compare March 22, 2025 07:27
@shawnh2
Contributor Author

shawnh2 commented Mar 22, 2025

Before:

| Test Name | Envoy Gateway Memory (MiB) | Envoy Gateway CPU (s) | Envoy Proxy Memory (Avg) (MiB) | Envoy Proxy CPU (Avg) (s) |
|---|---|---|---|---|
| scaling up httproutes to 10 with 2 routes per hostname | 124.21 | 0.95 | 25.9 | 30.41 |
| scaling up httproutes to 50 with 10 routes per hostname | 154.47 | 1.77 | 32.03 | 61.05 |
| scaling up httproutes to 100 with 20 routes per hostname | 144.11 | 3.15 | 38.09 | 91.85 |
| scaling up httproutes to 300 with 60 routes per hostname | 162.7 | 15.37 | 56.15 | 124.45 |
| scaling up httproutes to 500 with 100 routes per hostname | 175.8 | 27.81 | 74.18 | 157.52 |
| scaling up httproutes to 1000 with 200 routes per hostname | 208.31 | 61.05 | 124.28 | 196.62 |
| scaling down httproutes to 500 with 100 routes per hostname | 168.97 | 90.49 | 124.29 | 232.51 |
| scaling down httproutes to 300 with 60 routes per hostname | 168.27 | 101.7 | 124.31 | 265.07 |
| scaling down httproutes to 100 with 20 routes per hostname | 193.25 | 112 | 124.3 | 297.42 |
| scaling down httproutes to 50 with 10 routes per hostname | 241.57 | 113.14 | 104.67 | 327.97 |
| scaling down httproutes to 10 with 2 routes per hostname | 241.95 | 113.72 | 104.67 | 358.4 |

After:

| Test Name | Envoy Gateway Memory (MiB) min / max / mean | Envoy Gateway CPU (%) min / max / mean | Averaged Envoy Proxy Memory (MiB) min / max / mean | Averaged Envoy Proxy CPU (%) min / max / mean |
|---|---|---|---|---|
| scaling up httproutes to 10 with 2 routes per hostname | 127.51 / 150.86 / 144.07 | 0.13 / 0.80 / 0.38 | 0.00 / 29.88 / 25.80 | 0.00 / 100.13 / 12.54 |
| scaling up httproutes to 50 with 10 routes per hostname | 148.75 / 153.82 / 152.34 | 0.33 / 4.60 / 0.87 | 29.79 / 33.95 / 33.10 | 0.00 / 99.85 / 7.71 |
| scaling up httproutes to 100 with 20 routes per hostname | 150.38 / 159.34 / 153.43 | 0.40 / 7.67 / 1.07 | 33.91 / 38.10 / 37.61 | 0.00 / 29.73 / 2.20 |
| scaling up httproutes to 300 with 60 routes per hostname | 161.60 / 175.77 / 172.40 | 0.47 / 48.30 / 4.31 | 44.01 / 56.37 / 55.59 | 0.00 / 52.28 / 1.80 |
| scaling up httproutes to 500 with 100 routes per hostname | 186.28 / 198.06 / 193.70 | 0.47 / 48.33 / 1.61 | 75.98 / 76.22 / 76.07 | 0.00 / 100.31 / 10.82 |
| scaling up httproutes to 1000 with 200 routes per hostname | 219.07 / 237.53 / 231.35 | 0.07 / 1.07 / 0.72 | 126.10 / 126.35 / 126.21 | 0.00 / 100.06 / 9.37 |
| scaling down httproutes to 500 with 100 routes per hostname | 189.78 / 195.39 / 193.02 | 0.13 / 52.81 / 1.75 | 126.16 / 126.58 / 126.25 | 0.00 / 100.23 / 13.52 |
| scaling down httproutes to 300 with 60 routes per hostname | 177.26 / 182.66 / 179.68 | 0.33 / 73.13 / 3.56 | 126.17 / 126.36 / 126.23 | 0.00 / 100.03 / 7.58 |
| scaling down httproutes to 100 with 20 routes per hostname | 161.34 / 174.34 / 164.15 | 0.73 / 67.20 / 8.78 | 107.81 / 126.35 / 109.25 | 0.00 / 82.12 / 10.22 |
| scaling down httproutes to 50 with 10 routes per hostname | 174.34 / 239.21 / 225.49 | 0.60 / 6.67 / 1.37 | 107.89 / 108.04 / 107.93 | 0.00 / 100.01 / 10.99 |
| scaling down httproutes to 10 with 2 routes per hostname | 228.64 / 239.40 / 231.82 | 0.67 / 3.47 / 1.06 | 107.89 / 108.03 / 107.93 | 0.00 / 99.82 / 5.50 |

shawnh2 added 4 commits March 24, 2025 20:08
Signed-off-by: shawnh2 <shawnhxh@outlook.com>
Signed-off-by: shawnh2 <shawnhxh@outlook.com>
Signed-off-by: shawnh2 <shawnhxh@outlook.com>
Signed-off-by: shawnh2 <shawnhxh@outlook.com>
@zirain zirain force-pushed the benchmark-acc-metrics branch from fe03770 to 36114e8 Compare March 24, 2025 12:08
arkodg previously approved these changes Mar 25, 2025
@arkodg arkodg requested review from a team March 25, 2025 01:08
Co-authored-by: Arko Dasgupta <arkodg@users.noreply.github.com>
Signed-off-by: sh2 <shawnhxh@outlook.com>
@arkodg arkodg merged commit 14b506c into envoyproxy:main Mar 25, 2025
25 checks passed
@shawnh2 shawnh2 deleted the benchmark-acc-metrics branch March 25, 2025 03:59
lxie123 pushed a commit to lxie123/gateway that referenced this pull request Mar 26, 2025
* sample prom metrics and pprof profiles

Signed-off-by: shawnh2 <shawnhxh@outlook.com>

* update env table data and promql

Signed-off-by: shawnh2 <shawnhxh@outlook.com>

* enhance metrics value process

Signed-off-by: shawnh2 <shawnhxh@outlook.com>

* optimze report

Signed-off-by: shawnh2 <shawnhxh@outlook.com>

* update typo in suite.go

Co-authored-by: Arko Dasgupta <arkodg@users.noreply.github.com>
Signed-off-by: sh2 <shawnhxh@outlook.com>

---------

Signed-off-by: shawnh2 <shawnhxh@outlook.com>
Signed-off-by: sh2 <shawnhxh@outlook.com>
Co-authored-by: Arko Dasgupta <arkodg@users.noreply.github.com>
Signed-off-by: Ubuntu <lxie@k0rdent.dg30utm4mk2e3fblffod03krea.bx.internal.cloudapp.net>

3 participants