Skip to content

stats: add perf-annotation regexes used in stats tag extraction#2615

Merged
htuch merged 33 commits intoenvoyproxy:masterfrom
jmarantz:perf-annotation
Mar 5, 2018
Merged

stats: add perf-annotation regexes used in stats tag extraction#2615
htuch merged 33 commits intoenvoyproxy:masterfrom
jmarantz:perf-annotation

Conversation

@jmarantz
Copy link
Contributor

@jmarantz jmarantz commented Feb 15, 2018

Description::
This is step 3 in the plan to improve startup performance. This is another step toward addressing #2373

  • Added new server mode init_only which is used to fully initialize the server but not start the listener
  • Tweaks slightly the perf output format as it looks better to left-justify the category and description.
  • Instrumented the regex parsing in stats tag extraction, to learn what slows us down. From the 10k example in CDS static load performance #2063:
Duration(us)  # Calls  Mean(ns)  StdDev(ns)  Min(ns)  Max(ns)  Category  Description                   
     1217329   680147      1789     1182.73      495   880932  re-miss   envoy.grpc_bridge_method      
     1205580   680147      1772     1227.24      486   924781  re-miss   cipher_suite                  
     1193745   680147      1755      1191.7      477   896321  re-miss   envoy.grpc_bridge_service     
      714594   680147      1050     1116.88      528   862691  re-miss   envoy.response_code           
      681802   680127      1002     1169.21      508   870749  re-miss   envoy.response_code_class     
      657579   680147       966     414.414      570   125234  re-miss   envoy.dynamo_partition_id     
      648601   680031       953     335.497      583    49374  re-miss   envoy.http_conn_manager_prefix
      647820   680147       952     345.378      565    39783  re-miss   envoy.dynamo_operation        
      646571   680147       950     347.486      486    22955  re-miss   envoy.ssl_cipher              
      645047   680147       948     336.928      504    30860  re-miss   envoy.mongo_callsite          
      640112   680147       941     600.817      535   399301  re-miss   envoy.dyanmo_table            
      639674   680147       940     344.334      521    56102  re-miss   envoy.http_user_agent         
      634744   680147       933     340.052      494    32314  re-miss   envoy.mongo_collection        
      630200   680147       926     1193.34      536   946328  re-miss   envoy.fault_downstream_cluster
      629252   680147       925      340.45      485    33107  re-miss   envoy.mongo_cmd               
      628669   680147       924     321.804      473    21084  re-miss   envoy.virtual_cluster         
      457053   680000       672     437.367      567   109544  re-match  envoy.cluster_name            
          82      116       710     284.499      462     1876  re-match  envoy.http_conn_manager_prefix
          21       20      1091     332.877      613     1646  re-match  envoy.response_code_class     
          17        9      1895     437.737     1524     2952  re-miss   envoy.listener_address        
           4        5       840     359.531      635     1481  re-match  envoy.listener_address        

Risk Level: Low

Testing:
//test/...
bazel build --compilation_mode=opt //source/exe:envoy-static --define=perf_annotation=enabled
bazel-bin/source/exe/envoy-static --mode init_only -c 10k_clusters.json

Release Notes: not yet -- will issue them when the remainder of the changes in this series roll out

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…r into MainCommon.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz jmarantz changed the title utility: perf annotation tools and a more detailed regex analysis WIP: utility: perf annotation tools and a more detailed regex analysis Feb 15, 2018
htuch and others added 3 commits February 15, 2018 00:11
Provide support to allow folks to use thread annotations across Envoy in
the absence of a full solution to envoyproxy#2571.

Signed-off-by: Harvey Tuch <htuch@google.com>
When using --runs_per_test with concurrency the individual invocations
of the hot restart test may collide. Bazel provides an environment
variable when using --runs_per_test that allows us to set --base_id in
each parallel invocation to a guaranteed unique value so this doesn't
happen.

Signed-off-by: Dan Noé <dpn@google.com>
Envoy sets up SHM even when not starting up fully, so even cases where
the binary is not started with a full config want --base-id. Otherwise
we can get "file exists" when running many tests in parallel.

Signed-off-by: Dan Noé <dpn@google.com>
@dnoe
Copy link
Contributor

dnoe commented Feb 15, 2018

Let me know when this is ready to review and I'll assign some folks.

@jmarantz
Copy link
Contributor Author

This entire PR blocked on #2623 as it depends on the existence of a working flow that initializes the server but exits without running anything.

However I could pull out just the new perf_annotation class and review that....

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz jmarantz changed the title WIP: utility: perf annotation tools and a more detailed regex analysis utility: perf annotation tools and a more detailed regex analysis Feb 16, 2018
void record(absl::string_view category, absl::string_view description);

private:
SystemTime start_time_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we're not using monotonic clocks/times here and throughout since we're just computing durations?

}
}

// TODO(jmarantz): Consider hooking up perf information-dump into admin console, if
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 this would be a really cool future addition!

public:
/**
* Records time consumed by a category and description, which are just
* joined together in the library with " / ".
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add params to doxygen.

}

PerfAnnotationContext* PerfAnnotationContext::getOrCreate() {
static PerfAnnotationContext* context = new PerfAnnotationContext();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason we can't just subclass the ThreadSafeSingleton to keep this singleton implementation consistent with others like it across Envoy?

* perf_op.record("category", "description");
* }
*/
class PerfOperation {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There don't appear to be any tests for this class below. Can we add a few? I think just using a second timer to ensure that the recorded time will be greater than some value will suffice.

@mrice32
Copy link
Member

mrice32 commented Feb 22, 2018

Forgot to give an overall comment :). This is really cool work! It might be interesting to explore adding histograms to this once we start using a persistent histogram implementation for stats.

@jmarantz
Copy link
Contributor Author

Closing this temporarily as the perf annotation library is in a #2626; once that's in I'll re-open this with the application to stats regexes. @mrice32 's comments will be applied to #2626.

@jmarantz jmarantz closed this Feb 26, 2018
htuch pushed a commit that referenced this pull request Feb 28, 2018
This is broken out from #2615 as it likely deserves its own review. This provides a mechanism to annotate and measure the costs of functions that are data-dependent, e.g. regexes.

This is step 3a-ish, in the plan to improve startup performance. This is another step toward addressing #2373

Adds perf annotation library that can be used to instrument code but disappear completely from generated code unless enabled with bazel --define=perf_annotation=enabled

Produces tables in this format.

Duration(us)  # Calls  Mean(ns)  StdDev(ns)  Min(ns)  Max(ns)  Category  Description
        4600        4   1150000      129099  1000000  1300000     alpha            1
         200        1    200000         nan   200000   200000     gamma            2
          87        3     29000        1000    28000    30000      beta            3
Instrumentation can be coded into the system but is turned off via compiler macros so there is zero cost in production.

Risk Level: Low -- new utility library not used by anything yet.

Release Notes: N/A

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
htuch pushed a commit that referenced this pull request Feb 28, 2018
…ored dot token (#2630)

This PR expands the scope of regexes where we can parse out a prefix. This has a dramatic effect on the annotated perf information available with #2615 patched in

Before:

Duration(us)  # Calls  per_call(ns)  Category / Description
     1209259   680147          1777  re-miss / envoy.grpc_bridge_method
     1188780   680147          1747  re-miss / envoy.grpc_bridge_service
     1188206   680147          1746  re-miss / cipher_suite
      683323   680127          1004  re-miss / envoy.response_code_class
      671907   680147           987  re-miss / envoy.response_code
      639791   680147           940  re-miss / envoy.dynamo_partition_id
      628738   680147           924  re-miss / envoy.dynamo_operation
      623747   680147           917  re-miss / envoy.mongo_callsite
      623687   680031           917  re-miss / envoy.http_conn_manager_prefix
      620776   680147           912  re-miss / envoy.http_user_agent
      619797   680147           911  re-miss / envoy.dyanmo_table
      614521   680147           903  re-miss / envoy.mongo_collection
      609383   680147           895  re-miss / envoy.mongo_cmd
      606698   680147           892  re-miss / envoy.ssl_cipher
      606616   680147           891  re-miss / envoy.fault_downstream_cluster
      605605   680147           890  re-miss / envoy.virtual_cluster
      445298   680000           654  re-match / envoy.cluster_name
          67      116           577  re-match / envoy.http_conn_manager_prefix
          20       20          1031  re-match / envoy.response_code_class
          11        9          1237  re-miss / envoy.listener_address
           4        5           886  re-match / envoy.listener_address
After:

Duration(us)  # Calls  per_call(ns)  Category / Description
     1200107   680000          1764  re-miss / envoy.grpc_bridge_method
     1183043   680000          1739  re-miss / cipher_suite
     1178935   680000          1733  re-miss / envoy.grpc_bridge_service
      678943   680127           998  re-miss / envoy.response_code_class
      668008   680147           982  re-miss / envoy.response_code
      617447   680031           907  re-miss / envoy.http_conn_manager_prefix
      446239   680000           656  re-match / envoy.cluster_name
         186      106          1760  re-miss / envoy.dynamo_partition_id
         177      106          1678  re-miss / envoy.dyanmo_table
         171      106          1621  re-miss / envoy.dynamo_operation
         166      106          1568  re-miss / envoy.http_user_agent
         163      106          1543  re-miss / envoy.fault_downstream_cluster
          68      116           591  re-match / envoy.http_conn_manager_prefix
          25       14          1849  re-miss / envoy.ssl_cipher
          20       20          1005  re-match / envoy.response_code_class
          11        9          1232  re-miss / envoy.listener_address
Note that many fewer regexes need to be evaluated, although the really expensive ones are still examined very often. They need to be evaluated less often or made to be cheaper, preferably both.

Risk Level: Low

Release Notes: N/A

Signed-off-by: Joshua Marantz <jmarantz@google.com>
…looks better.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz jmarantz reopened this Feb 28, 2018
Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz
Copy link
Contributor Author

OK this has now merged in #2626 and has only the application of it to the stats init problem. PTAL.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
for (size_t i = 0; i < num_columns; ++i) {
formats.push_back(absl::StrCat("{:>", widths[i], "}"));
// left-justify category & description, but right-justify the numeric columns.
const char* justify = (i < num_columns - 2) ? ">" : "<";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: prefer const std::string over raw const char* where possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems strictly slower but OK. WDYT of absl::string_view for this instead?

/**
* Completely load and initialize the config, and then exit without running the listener loop.
*/
InitOnly,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a test for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah no, but if #2679 can get merged it'll be way easier to add one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added one....i resurrected the random component to the main_common_test base ID calculation as it seems you can't immediately open one of the required resources immediate after closing it in the previous test method.

jmarantz added 6 commits March 2, 2018 16:33
…ar*.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
This required re-incorporating the random number generator into the mode-id, as
different test methods running in succession each need to be in separate namespaces
for reasons I don't totally understand.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…ke it easier to test main_common).

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz jmarantz changed the title utility: perf annotation tools and a more detailed regex analysis stats: add perf-annotation regexes used in stats tag extraction Mar 3, 2018
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'm still not that excited about the PID weighted randomness, but I can't offer up a better solution so seems the best thing to do.

@htuch htuch merged commit b6b1564 into envoyproxy:master Mar 5, 2018
@jmarantz jmarantz deleted the perf-annotation branch March 5, 2018 13:13
lita pushed a commit to lita/envoy that referenced this pull request Mar 5, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants