stats: add perf-annotation regexes used in stats tag extraction by jmarantz · Pull Request #2615 · envoyproxy/envoy

jmarantz · 2018-02-15T03:17:01Z

Description::
This is step 3 in the plan to improve startup performance. This is another step toward addressing #2373

Added new server mode init_only which is used to fully initialize the server but not start the listener
Tweaks slightly the perf output format as it looks better to left-justify the category and description.
Instrumented the regex parsing in stats tag extraction, to learn what slows us down. From the 10k example in CDS static load performance #2063:

Duration(us)  # Calls  Mean(ns)  StdDev(ns)  Min(ns)  Max(ns)  Category  Description                   
     1217329   680147      1789     1182.73      495   880932  re-miss   envoy.grpc_bridge_method      
     1205580   680147      1772     1227.24      486   924781  re-miss   cipher_suite                  
     1193745   680147      1755      1191.7      477   896321  re-miss   envoy.grpc_bridge_service     
      714594   680147      1050     1116.88      528   862691  re-miss   envoy.response_code           
      681802   680127      1002     1169.21      508   870749  re-miss   envoy.response_code_class     
      657579   680147       966     414.414      570   125234  re-miss   envoy.dynamo_partition_id     
      648601   680031       953     335.497      583    49374  re-miss   envoy.http_conn_manager_prefix
      647820   680147       952     345.378      565    39783  re-miss   envoy.dynamo_operation        
      646571   680147       950     347.486      486    22955  re-miss   envoy.ssl_cipher              
      645047   680147       948     336.928      504    30860  re-miss   envoy.mongo_callsite          
      640112   680147       941     600.817      535   399301  re-miss   envoy.dyanmo_table            
      639674   680147       940     344.334      521    56102  re-miss   envoy.http_user_agent         
      634744   680147       933     340.052      494    32314  re-miss   envoy.mongo_collection        
      630200   680147       926     1193.34      536   946328  re-miss   envoy.fault_downstream_cluster
      629252   680147       925      340.45      485    33107  re-miss   envoy.mongo_cmd               
      628669   680147       924     321.804      473    21084  re-miss   envoy.virtual_cluster         
      457053   680000       672     437.367      567   109544  re-match  envoy.cluster_name            
          82      116       710     284.499      462     1876  re-match  envoy.http_conn_manager_prefix
          21       20      1091     332.877      613     1646  re-match  envoy.response_code_class     
          17        9      1895     437.737     1524     2952  re-miss   envoy.listener_address        
           4        5       840     359.531      635     1481  re-match  envoy.listener_address

Risk Level: Low

Testing:
//test/...
bazel build --compilation_mode=opt //source/exe:envoy-static --define=perf_annotation=enabled
bazel-bin/source/exe/envoy-static --mode init_only -c 10k_clusters.json

Release Notes: not yet -- will issue them when the remainder of the changes in this series roll out

Signed-off-by: Joshua Marantz <jmarantz@google.com>

…r into MainCommon. Signed-off-by: Joshua Marantz <jmarantz@google.com>

Signed-off-by: Joshua Marantz <jmarantz@google.com>

Provide support to allow folks to use thread annotations across Envoy in the absence of a full solution to envoyproxy#2571. Signed-off-by: Harvey Tuch <htuch@google.com>

When using --runs_per_test with concurrency the individual invocations of the hot restart test may collide. Bazel provides an environment variable when using --runs_per_test that allows us to set --base_id in each parallel invocation to a guaranteed unique value so this doesn't happen. Signed-off-by: Dan Noé <dpn@google.com>

Envoy sets up SHM even when not starting up fully, so even cases where the binary is not started with a full config want --base-id. Otherwise we can get "file exists" when running many tests in parallel. Signed-off-by: Dan Noé <dpn@google.com>

dnoe · 2018-02-15T15:14:27Z

Let me know when this is ready to review and I'll assign some folks.

jmarantz · 2018-02-15T17:44:07Z

This entire PR blocked on #2623 as it depends on the existence of a working flow that initializes the server but exits without running anything.

However I could pull out just the new perf_annotation class and review that....

Signed-off-by: Joshua Marantz <jmarantz@google.com>

mrice32 · 2018-02-22T02:47:01Z

source/common/common/perf_annotation.h

+  void record(absl::string_view category, absl::string_view description);
+
+private:
+  SystemTime start_time_;


Is there a reason we're not using monotonic clocks/times here and throughout since we're just computing durations?

mrice32 · 2018-02-22T02:49:13Z

source/common/common/perf_annotation.cc

+  }
+}
+
+// TODO(jmarantz): Consider hooking up perf information-dump into admin console, if


+1 this would be a really cool future addition!

mrice32 · 2018-02-22T02:51:41Z

source/common/common/perf_annotation.h

+public:
+  /**
+   * Records time consumed by a category and description, which are just
+   * joined together in the library with " / ".


nit: add params to doxygen.

mrice32 · 2018-02-22T03:05:04Z

source/common/common/perf_annotation.cc

+}
+
+PerfAnnotationContext* PerfAnnotationContext::getOrCreate() {
+  static PerfAnnotationContext* context = new PerfAnnotationContext();


Is there any reason we can't just subclass the ThreadSafeSingleton to keep this singleton implementation consistent with others like it across Envoy?

mrice32 · 2018-02-22T03:10:05Z

source/common/common/perf_annotation.h

+ *   perf_op.record("category", "description");
+ * }
+ */
+class PerfOperation {


There don't appear to be any tests for this class below. Can we add a few? I think just using a second timer to ensure that the recorded time will be greater than some value will suffice.

mrice32 · 2018-02-22T03:15:25Z

Forgot to give an overall comment :). This is really cool work! It might be interesting to explore adding histograms to this once we start using a persistent histogram implementation for stats.

jmarantz · 2018-02-26T15:38:25Z

Closing this temporarily as the perf annotation library is in a #2626; once that's in I'll re-open this with the application to stats regexes. @mrice32 's comments will be applied to #2626.

This is broken out from #2615 as it likely deserves its own review. This provides a mechanism to annotate and measure the costs of functions that are data-dependent, e.g. regexes. This is step 3a-ish, in the plan to improve startup performance. This is another step toward addressing #2373 Adds perf annotation library that can be used to instrument code but disappear completely from generated code unless enabled with bazel --define=perf_annotation=enabled Produces tables in this format. Duration(us) # Calls Mean(ns) StdDev(ns) Min(ns) Max(ns) Category Description 4600 4 1150000 129099 1000000 1300000 alpha 1 200 1 200000 nan 200000 200000 gamma 2 87 3 29000 1000 28000 30000 beta 3 Instrumentation can be coded into the system but is turned off via compiler macros so there is zero cost in production. Risk Level: Low -- new utility library not used by anything yet. Release Notes: N/A Signed-off-by: Joshua Marantz <jmarantz@google.com>

Signed-off-by: Joshua Marantz <jmarantz@google.com>

…ored dot token (#2630) This PR expands the scope of regexes where we can parse out a prefix. This has a dramatic effect on the annotated perf information available with #2615 patched in Before: Duration(us) # Calls per_call(ns) Category / Description 1209259 680147 1777 re-miss / envoy.grpc_bridge_method 1188780 680147 1747 re-miss / envoy.grpc_bridge_service 1188206 680147 1746 re-miss / cipher_suite 683323 680127 1004 re-miss / envoy.response_code_class 671907 680147 987 re-miss / envoy.response_code 639791 680147 940 re-miss / envoy.dynamo_partition_id 628738 680147 924 re-miss / envoy.dynamo_operation 623747 680147 917 re-miss / envoy.mongo_callsite 623687 680031 917 re-miss / envoy.http_conn_manager_prefix 620776 680147 912 re-miss / envoy.http_user_agent 619797 680147 911 re-miss / envoy.dyanmo_table 614521 680147 903 re-miss / envoy.mongo_collection 609383 680147 895 re-miss / envoy.mongo_cmd 606698 680147 892 re-miss / envoy.ssl_cipher 606616 680147 891 re-miss / envoy.fault_downstream_cluster 605605 680147 890 re-miss / envoy.virtual_cluster 445298 680000 654 re-match / envoy.cluster_name 67 116 577 re-match / envoy.http_conn_manager_prefix 20 20 1031 re-match / envoy.response_code_class 11 9 1237 re-miss / envoy.listener_address 4 5 886 re-match / envoy.listener_address After: Duration(us) # Calls per_call(ns) Category / Description 1200107 680000 1764 re-miss / envoy.grpc_bridge_method 1183043 680000 1739 re-miss / cipher_suite 1178935 680000 1733 re-miss / envoy.grpc_bridge_service 678943 680127 998 re-miss / envoy.response_code_class 668008 680147 982 re-miss / envoy.response_code 617447 680031 907 re-miss / envoy.http_conn_manager_prefix 446239 680000 656 re-match / envoy.cluster_name 186 106 1760 re-miss / envoy.dynamo_partition_id 177 106 1678 re-miss / envoy.dyanmo_table 171 106 1621 re-miss / envoy.dynamo_operation 166 106 1568 re-miss / envoy.http_user_agent 163 106 1543 re-miss / envoy.fault_downstream_cluster 68 116 591 re-match / envoy.http_conn_manager_prefix 25 14 1849 re-miss / envoy.ssl_cipher 20 20 1005 re-match / envoy.response_code_class 11 9 1232 re-miss / envoy.listener_address Note that many fewer regexes need to be evaluated, although the really expensive ones are still examined very often. They need to be evaluated less often or made to be cheaper, preferably both. Risk Level: Low Release Notes: N/A Signed-off-by: Joshua Marantz <jmarantz@google.com>

…looks better. Signed-off-by: Joshua Marantz <jmarantz@google.com>

Signed-off-by: Joshua Marantz <jmarantz@google.com>

jmarantz · 2018-02-28T21:04:11Z

OK this has now merged in #2626 and has only the application of it to the stats init problem. PTAL.

Signed-off-by: Joshua Marantz <jmarantz@google.com>

htuch · 2018-03-02T21:22:48Z

source/common/common/perf_annotation.cc

  for (size_t i = 0; i < num_columns; ++i) {
-    formats.push_back(absl::StrCat("{:>", widths[i], "}"));
+    // left-justify category & description, but right-justify the numeric columns.
+    const char* justify = (i < num_columns - 2) ? ">" : "<";


Nit: prefer const std::string over raw const char* where possible.

Seems strictly slower but OK. WDYT of absl::string_view for this instead?

htuch · 2018-03-02T21:23:51Z

include/envoy/server/options.h

+  /**
+   * Completely load and initialize the config, and then exit without running the listener loop.
+   */
+  InitOnly,


Is there a test for this?

Ah no, but if #2679 can get merged it'll be way easier to add one.

Added one....i resurrected the random component to the main_common_test base ID calculation as it seems you can't immediately open one of the required resources immediate after closing it in the previous test method.

…ar*. Signed-off-by: Joshua Marantz <jmarantz@google.com>

Signed-off-by: Joshua Marantz <jmarantz@google.com>

This required re-incorporating the random number generator into the mode-id, as different test methods running in succession each need to be in separate namespaces for reasons I don't totally understand. Signed-off-by: Joshua Marantz <jmarantz@google.com>

Signed-off-by: Joshua Marantz <jmarantz@google.com>

…ke it easier to test main_common). Signed-off-by: Joshua Marantz <jmarantz@google.com>

Signed-off-by: Joshua Marantz <jmarantz@google.com>

htuch

Thanks, I'm still not that excited about the PID weighted randomness, but I can't offer up a better solution so seems the best thing to do.

…on (envoyproxy#2615)" This reverts commit b6b1564.

jmarantz added 11 commits February 13, 2018 18:07

Checkout PerfAnnotation class (not ready for review).

aaaecea

Signed-off-by: Joshua Marantz <jmarantz@google.com>

checkpoint with envoy_init_speed_test building.

cacf76b

Signed-off-by: Joshua Marantz <jmarantz@google.com>

Checkpoint bazel option enabled to enable perf annotation.

b21eabd

Signed-off-by: Joshua Marantz <jmarantz@google.com>

Checkpoint with hacky --disable_hot_restart flag.

2263657

Signed-off-by: Joshua Marantz <jmarantz@google.com>

All basic functionality implemented, including one more minor refacto…

845c81c

…r into MainCommon. Signed-off-by: Joshua Marantz <jmarantz@google.com>

cleanup a little

65f58c8

Signed-off-by: Joshua Marantz <jmarantz@google.com>

format fix-up.

5a047b2

Signed-off-by: Joshua Marantz <jmarantz@google.com>

test-coverage for dump()

3750841

Signed-off-by: Joshua Marantz <jmarantz@google.com>

Rather than making a new binary, add a new server mode.

b9366b9

Signed-off-by: Joshua Marantz <jmarantz@google.com>

Merge branch 'master' into perf-annotation

7a8b42a

Signed-off-by: Joshua Marantz <jmarantz@google.com>

Remove last remnants of the envoy_init_speed_test. formatting.

479fa81

Signed-off-by: Joshua Marantz <jmarantz@google.com>

jmarantz changed the title ~~utility: perf annotation tools and a more detailed regex analysis~~ WIP: utility: perf annotation tools and a more detailed regex analysis Feb 15, 2018

htuch and others added 3 commits February 15, 2018 00:11

thread: annotation support when !OS X.

72d18aa

Provide support to allow folks to use thread annotations across Envoy in the absence of a full solution to envoyproxy#2571. Signed-off-by: Harvey Tuch <htuch@google.com>

jmarantz added 2 commits February 15, 2018 15:29

Add perf_annotation_lib.

2db4b24

Signed-off-by: Joshua Marantz <jmarantz@google.com>

formatting

d440e99

Signed-off-by: Joshua Marantz <jmarantz@google.com>

jmarantz mentioned this pull request Feb 15, 2018

utility: Add performance annotation library #2626

Merged

jmarantz added 5 commits February 15, 2018 16:41

Merge branch 'master' into perf-annotation-lib

d69399b

Signed-off-by: Joshua Marantz <jmarantz@google.com>

Merge branch 'perf-annotation-lib' into perf-annotation

89fa9dc

Signed-off-by: Joshua Marantz <jmarantz@google.com>

Use the cleaned up version of perf_annotation from envoyproxy#2626

c85f07e

Signed-off-by: Joshua Marantz <jmarantz@google.com>

fix crash and tweak output format

759418c

Signed-off-by: Joshua Marantz <jmarantz@google.com>

fix test for left-justfied text column.

869b490

Signed-off-by: Joshua Marantz <jmarantz@google.com>

jmarantz mentioned this pull request Feb 16, 2018

stats: find regex prefixes more aggressively, taking into account ignored dot token #2630

Merged

jmarantz changed the title ~~WIP: utility: perf annotation tools and a more detailed regex analysis~~ utility: perf annotation tools and a more detailed regex analysis Feb 16, 2018

dnoe assigned dnoe, mrice32 and alyssawilk Feb 16, 2018

mrice32 reviewed Feb 22, 2018

View reviewed changes

jmarantz closed this Feb 26, 2018

jmarantz added 2 commits February 28, 2018 15:23

Merge branch 'master' into perf-annotation

f219f2b

Signed-off-by: Joshua Marantz <jmarantz@google.com>

Resolve some remaining conflicts from the merge.

99001c1

Signed-off-by: Joshua Marantz <jmarantz@google.com>

Tweak perf annotation format slightly as left-justifying text-fields …

2c7ab78

…looks better. Signed-off-by: Joshua Marantz <jmarantz@google.com>

jmarantz reopened this Feb 28, 2018

Merge branch 'master' into perf-annotation

15c0386

Signed-off-by: Joshua Marantz <jmarantz@google.com>

jmarantz added 2 commits February 28, 2018 16:27

Remove more merge-conflict detritus, which was in a comment.

8284adf

Signed-off-by: Joshua Marantz <jmarantz@google.com>

Merge branch 'master' into perf-annotation

9d6f4c4

Signed-off-by: Joshua Marantz <jmarantz@google.com>

jmarantz mentioned this pull request Mar 1, 2018

stats: Guard regex lookups with substring searches where possible #2693

Merged

htuch assigned htuch and unassigned alyssawilk Mar 2, 2018

htuch suggested changes Mar 2, 2018

View reviewed changes

jmarantz added 6 commits March 2, 2018 16:33

Use absl::string_view for a temp string constant rather than const ch…

0acf719

…ar*. Signed-off-by: Joshua Marantz <jmarantz@google.com>

Merge branch 'master' into perf-annotation

c2fc345

Signed-off-by: Joshua Marantz <jmarantz@google.com>

Clean up comments about why the random-number generator is still needed.

7b06b49

Signed-off-by: Joshua Marantz <jmarantz@google.com>

Add explicit test for --mode init_only (in addition to using it to ma…

090ae2e

…ke it easier to test main_common). Signed-off-by: Joshua Marantz <jmarantz@google.com>

Correct a stale comment.

479aebf

Signed-off-by: Joshua Marantz <jmarantz@google.com>

jmarantz changed the title ~~utility: perf annotation tools and a more detailed regex analysis~~ stats: add perf-annotation regexes used in stats tag extraction Mar 3, 2018

htuch approved these changes Mar 5, 2018

View reviewed changes

htuch merged commit b6b1564 into envoyproxy:master Mar 5, 2018

jmarantz deleted the perf-annotation branch March 5, 2018 13:13

lita pushed a commit to lita/envoy that referenced this pull request Mar 5, 2018

Revert "stats: add perf-annotation regexes used in stats tag extracti…

7b0bd78

…on (envoyproxy#2615)" This reverts commit b6b1564.

Conversation

jmarantz commented Feb 15, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dnoe commented Feb 15, 2018

Uh oh!

jmarantz commented Feb 15, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mrice32 commented Feb 22, 2018

Uh oh!

jmarantz commented Feb 26, 2018

Uh oh!

jmarantz commented Feb 28, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

htuch left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

jmarantz commented Feb 15, 2018 •

edited

Loading