stats: add perf-annotation regexes used in stats tag extraction#2615
stats: add perf-annotation regexes used in stats tag extraction#2615htuch merged 33 commits intoenvoyproxy:masterfrom
Conversation
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…r into MainCommon. Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Provide support to allow folks to use thread annotations across Envoy in the absence of a full solution to envoyproxy#2571. Signed-off-by: Harvey Tuch <htuch@google.com>
When using --runs_per_test with concurrency the individual invocations of the hot restart test may collide. Bazel provides an environment variable when using --runs_per_test that allows us to set --base_id in each parallel invocation to a guaranteed unique value so this doesn't happen. Signed-off-by: Dan Noé <dpn@google.com>
Envoy sets up SHM even when not starting up fully, so even cases where the binary is not started with a full config want --base-id. Otherwise we can get "file exists" when running many tests in parallel. Signed-off-by: Dan Noé <dpn@google.com>
|
Let me know when this is ready to review and I'll assign some folks. |
|
This entire PR blocked on #2623 as it depends on the existence of a working flow that initializes the server but exits without running anything. However I could pull out just the new perf_annotation class and review that.... |
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
| void record(absl::string_view category, absl::string_view description); | ||
|
|
||
| private: | ||
| SystemTime start_time_; |
There was a problem hiding this comment.
Is there a reason we're not using monotonic clocks/times here and throughout since we're just computing durations?
| } | ||
| } | ||
|
|
||
| // TODO(jmarantz): Consider hooking up perf information-dump into admin console, if |
There was a problem hiding this comment.
+1 this would be a really cool future addition!
| public: | ||
| /** | ||
| * Records time consumed by a category and description, which are just | ||
| * joined together in the library with " / ". |
| } | ||
|
|
||
| PerfAnnotationContext* PerfAnnotationContext::getOrCreate() { | ||
| static PerfAnnotationContext* context = new PerfAnnotationContext(); |
There was a problem hiding this comment.
Is there any reason we can't just subclass the ThreadSafeSingleton to keep this singleton implementation consistent with others like it across Envoy?
| * perf_op.record("category", "description"); | ||
| * } | ||
| */ | ||
| class PerfOperation { |
There was a problem hiding this comment.
There don't appear to be any tests for this class below. Can we add a few? I think just using a second timer to ensure that the recorded time will be greater than some value will suffice.
|
Forgot to give an overall comment :). This is really cool work! It might be interesting to explore adding histograms to this once we start using a persistent histogram implementation for stats. |
This is broken out from #2615 as it likely deserves its own review. This provides a mechanism to annotate and measure the costs of functions that are data-dependent, e.g. regexes. This is step 3a-ish, in the plan to improve startup performance. This is another step toward addressing #2373 Adds perf annotation library that can be used to instrument code but disappear completely from generated code unless enabled with bazel --define=perf_annotation=enabled Produces tables in this format. Duration(us) # Calls Mean(ns) StdDev(ns) Min(ns) Max(ns) Category Description 4600 4 1150000 129099 1000000 1300000 alpha 1 200 1 200000 nan 200000 200000 gamma 2 87 3 29000 1000 28000 30000 beta 3 Instrumentation can be coded into the system but is turned off via compiler macros so there is zero cost in production. Risk Level: Low -- new utility library not used by anything yet. Release Notes: N/A Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…ored dot token (#2630) This PR expands the scope of regexes where we can parse out a prefix. This has a dramatic effect on the annotated perf information available with #2615 patched in Before: Duration(us) # Calls per_call(ns) Category / Description 1209259 680147 1777 re-miss / envoy.grpc_bridge_method 1188780 680147 1747 re-miss / envoy.grpc_bridge_service 1188206 680147 1746 re-miss / cipher_suite 683323 680127 1004 re-miss / envoy.response_code_class 671907 680147 987 re-miss / envoy.response_code 639791 680147 940 re-miss / envoy.dynamo_partition_id 628738 680147 924 re-miss / envoy.dynamo_operation 623747 680147 917 re-miss / envoy.mongo_callsite 623687 680031 917 re-miss / envoy.http_conn_manager_prefix 620776 680147 912 re-miss / envoy.http_user_agent 619797 680147 911 re-miss / envoy.dyanmo_table 614521 680147 903 re-miss / envoy.mongo_collection 609383 680147 895 re-miss / envoy.mongo_cmd 606698 680147 892 re-miss / envoy.ssl_cipher 606616 680147 891 re-miss / envoy.fault_downstream_cluster 605605 680147 890 re-miss / envoy.virtual_cluster 445298 680000 654 re-match / envoy.cluster_name 67 116 577 re-match / envoy.http_conn_manager_prefix 20 20 1031 re-match / envoy.response_code_class 11 9 1237 re-miss / envoy.listener_address 4 5 886 re-match / envoy.listener_address After: Duration(us) # Calls per_call(ns) Category / Description 1200107 680000 1764 re-miss / envoy.grpc_bridge_method 1183043 680000 1739 re-miss / cipher_suite 1178935 680000 1733 re-miss / envoy.grpc_bridge_service 678943 680127 998 re-miss / envoy.response_code_class 668008 680147 982 re-miss / envoy.response_code 617447 680031 907 re-miss / envoy.http_conn_manager_prefix 446239 680000 656 re-match / envoy.cluster_name 186 106 1760 re-miss / envoy.dynamo_partition_id 177 106 1678 re-miss / envoy.dyanmo_table 171 106 1621 re-miss / envoy.dynamo_operation 166 106 1568 re-miss / envoy.http_user_agent 163 106 1543 re-miss / envoy.fault_downstream_cluster 68 116 591 re-match / envoy.http_conn_manager_prefix 25 14 1849 re-miss / envoy.ssl_cipher 20 20 1005 re-match / envoy.response_code_class 11 9 1232 re-miss / envoy.listener_address Note that many fewer regexes need to be evaluated, although the really expensive ones are still examined very often. They need to be evaluated less often or made to be cheaper, preferably both. Risk Level: Low Release Notes: N/A Signed-off-by: Joshua Marantz <jmarantz@google.com>
…looks better. Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
|
OK this has now merged in #2626 and has only the application of it to the stats init problem. PTAL. |
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
| for (size_t i = 0; i < num_columns; ++i) { | ||
| formats.push_back(absl::StrCat("{:>", widths[i], "}")); | ||
| // left-justify category & description, but right-justify the numeric columns. | ||
| const char* justify = (i < num_columns - 2) ? ">" : "<"; |
There was a problem hiding this comment.
Nit: prefer const std::string over raw const char* where possible.
There was a problem hiding this comment.
Seems strictly slower but OK. WDYT of absl::string_view for this instead?
| /** | ||
| * Completely load and initialize the config, and then exit without running the listener loop. | ||
| */ | ||
| InitOnly, |
There was a problem hiding this comment.
Ah no, but if #2679 can get merged it'll be way easier to add one.
There was a problem hiding this comment.
Added one....i resurrected the random component to the main_common_test base ID calculation as it seems you can't immediately open one of the required resources immediate after closing it in the previous test method.
…ar*. Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
This required re-incorporating the random number generator into the mode-id, as different test methods running in succession each need to be in separate namespaces for reasons I don't totally understand. Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…ke it easier to test main_common). Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
htuch
left a comment
There was a problem hiding this comment.
Thanks, I'm still not that excited about the PID weighted randomness, but I can't offer up a better solution so seems the best thing to do.
…on (envoyproxy#2615)" This reverts commit b6b1564.
Description::
This is step 3 in the plan to improve startup performance. This is another step toward addressing #2373
init_onlywhich is used to fully initialize the server but not start the listenerRisk Level: Low
Testing:
//test/...
bazel build --compilation_mode=opt //source/exe:envoy-static --define=perf_annotation=enabled
bazel-bin/source/exe/envoy-static --mode init_only -c 10k_clusters.json
Release Notes: not yet -- will issue them when the remainder of the changes in this series roll out