stats: Guard regex lookups with substring searches where possible #2693

Merged
htuch merged 9 commits into envoyproxy:master from jmarantz:regex-specify-substr
Mar 9, 2018

@jmarantz jmarantz commented Mar 1, 2018

Annotate addRegex calls with expected substrings. Searching a candidate stat name for these substrings before running the regex is significantly faster than running the regex alone. This change speeds up the 10k case from 8 seconds to 3.5 seconds.

Before:

Duration(us)  # Calls  Mean(ns)  StdDev(ns)  Min(ns)  Max(ns)  Category  Description                   
     1198153   680000      1761      1408.5     1126  1052984  re-miss   envoy.grpc_bridge_method      
     1184283   680000      1741     1409.55     1119  1048041  re-miss   cipher_suite                  
     1179557   680000      1734     579.372     1112   121057  re-miss   envoy.grpc_bridge_service     
      686269   680127      1009     562.467      483    75670  re-miss   envoy.response_code_class     
      665593   680147       978     411.618      490    79632  re-miss   envoy.response_code           
      616527   680031       906     418.034      517    85152  re-miss   envoy.http_conn_manager_prefix
      451858   680000       664     1333.08      583  1034655  re-match  envoy.cluster_name            
         226      106      2135     999.719      971     9424  re-miss   envoy.dynamo_partition_id     
         215      106      2036     1121.43      915    11952  re-miss   envoy.http_user_agent         
         212      106      2003     580.828      958     3485  re-miss   envoy.dynamo_operation        
         205      106      1935     551.338      928     3387  re-miss   envoy.dyanmo_table            
         202      106      1906     546.486      923     3352  re-miss   envoy.fault_downstream_cluster
          83      116       723     324.772      466     2208  re-match  envoy.http_conn_manager_prefix
          33       14      2373     282.729     2020     2852  re-miss   envoy.ssl_cipher              
          21       20      1056     260.687      651     1439  re-match  envoy.response_code_class     
          16        9      1784     342.155     1515     2663  re-miss   envoy.listener_address        
           4        5       845     347.121      657     1464  re-match  envoy.listener_address        

After:

Duration(us)  # Calls  Mean(ns)  StdDev(ns)  Min(ns)  Max(ns)  Category        Description                   
      457801   680000       673     859.183      561   198777  re-match        envoy.cluster_name            
      158922   150042      1059     1317.31      707   193774  re-miss         envoy.response_code           
      158747   150022      1058      1240.3      732   202819  re-miss         envoy.response_code_class     
       44221   530105        83     1593.54       33  1140472  re-skip-substr  envoy.response_code           
       42209   680000        62     323.569       41   180224  re-skip-substr  cipher_suite                  
       37968   680000        55     278.834       35   152175  re-skip-substr  envoy.grpc_bridge_method      
       37254   680000        54     346.805       35   172786  re-skip-substr  envoy.grpc_bridge_service     
       35105   530105        66     294.315       30   120865  re-skip-substr  envoy.response_code_class     
          68      116       591     426.393      417     2952  re-match        envoy.http_conn_manager_prefix
          39       14      2803     2468.54     1517    10385  re-miss         envoy.ssl_cipher              
          17       20       890     198.915      622     1182  re-match        envoy.response_code_class     
          10        9      1215     236.915     1005     1785  re-miss         envoy.listener_address        
           6      106        65     22.9534       41      234  re-skip-substr  envoy.dynamo_partition_id     
           6      106        65      14.675       47      143  re-skip-substr  envoy.dynamo_operation        
           6      106        62     13.1069       42      154  re-skip-substr  envoy.fault_downstream_cluster
           6      106        61      12.603       43      124  re-skip-substr  envoy.http_user_agent         
           5      106        49     11.6811       32      130  re-skip-substr  envoy.dyanmo_table            
           4        5       824     309.373      661     1376  re-match        envoy.listener_address        
           0        4        73     21.8689       53      103  re-skip-substr  envoy.http_conn_manager_prefix

Description:
Allows explicit specification of a required substring, which can be quickly scanned for in an input string before applying regexes.

Risk Level: Medium - mis-specifying these can lead to broken tag processing

Testing:
//test/...

Release Notes: N/A

Signed-off-by: Joshua Marantz <jmarantz@google.com>
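
The risk called out in the description (a mis-specified substring breaking tag processing) can be checked mechanically. The following is a minimal sketch, not code from this PR: `guardIsConsistent` is a hypothetical helper, and it uses `std::regex` only for illustration. It verifies that every sample name the regex accepts also contains the declared substring.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical sanity check (not part of the PR): every sample that the
// regex matches must also contain the declared substring. Otherwise the
// substring guard would skip a stat name that the regex accepts.
bool guardIsConsistent(const std::regex& re, const std::string& substr,
                       const std::vector<std::string>& samples) {
  for (const std::string& name : samples) {
    const bool regex_matches = std::regex_search(name, re);
    const bool has_substr = name.find(substr) != std::string::npos;
    if (regex_matches && !has_substr) {
      return false; // the guard would drop a stat the regex accepts
    }
  }
  return true;
}
```

A check like this only covers the samples it is given, so it can catch a typo in the substring but cannot prove the guard is correct for every possible stat name.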
@jmarantz jmarantz changed the title stats: Guard regex looks with substring searches where possible WIP: stats: Guard regex looks with substring searches where possible Mar 1, 2018
jmarantz added 2 commits March 5, 2018 08:36
… regex-specify-substr

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Followed instructions in
https://stackoverflow.com/questions/1657017/how-to-squash-all-git-commits-into-one
to try to resolve DCO issue due to
envoyproxy@1fac332

as there are no comments in the PR yet I think this is OK.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz jmarantz force-pushed the regex-specify-substr branch from 93a86b0 to fd2b507 Compare March 5, 2018 14:30

jmarantz commented Mar 5, 2018

ready for initial pass, ptal. Maybe @mrice32 ?

@jmarantz jmarantz changed the title WIP: stats: Guard regex looks with substring searches where possible stats: Guard regex looks with substring searches where possible Mar 5, 2018

@mrice32 mrice32 left a comment

Looks great! Thanks! Just a few small comments.


  // cluster.[<cluster_name>.]ssl.ciphers.(<cipher>)
- addRegex(SSL_CIPHER_SUITE, "^cluster(?=\\.).*?\\.ssl\\.ciphers(\\.(.*?))$");
+ addRegex(SSL_CIPHER_SUITE, "^cluster(?=\\.).*?\\.ssl\\.ciphers(\\.(.*?))$", "ssl.ciphers");

@mrice32 (Member):
Why not ".ssl.ciphers." like in the other examples?

@jmarantz (Contributor Author):
That one was an omission, because I had some trouble with ".http." below and I wound up backing off the inclusion of the "." in some places where it was OK.

However I managed to sort out my issues with ".http." by exploiting the fact that I can now have two separate regexes for the same tag.


  // cluster.[<route_target_cluster>.]grpc.(<grpc_service>.)*
- addRegex(GRPC_BRIDGE_SERVICE, "^cluster(?=\\.).*?\\.grpc\\.((.*?)\\.)");
+ addRegex(GRPC_BRIDGE_SERVICE, "^cluster(?=\\.).*?\\.grpc\\.((.*?)\\.)", "grpc");

@mrice32 (Member):
Same as above, why not ".grpc."?

@jmarantz (Contributor Author):
ditto


PERF_OPERATION(perf);

if (!substr_.empty() && stat_name.find(substr_) == std::string::npos) {

@mrice32 (Member), Mar 6, 2018:
Can you add a simple test case to isolate this new portion of the logic and ensure it works as expected? The code is already covered by the regex test cases, but a separate test will isolate this functionality from the tests that check the regexes themselves. See the tests in https://github.com/envoyproxy/envoy/blob/master/test/integration/stats_integration_test.cc and https://github.com/envoyproxy/envoy/blob/master/test/common/stats/stats_impl_test.cc#L88 for other tests that cover specific parts of the extraction process.

@jmarantz (Contributor Author):
done. Broke out substrMismatch as a separate method to make it easier to test.
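
For readers following along, here is a minimal sketch of what a substring guard split into its own method might look like. It is not the Envoy implementation: `SimpleTagExtractor` and `extractTag` are hypothetical, `std::regex` stands in for whatever regex engine the real code uses, and only the `substrMismatch` name and the guard condition come from this thread.

```cpp
#include <regex>
#include <string>

// Hypothetical, simplified extractor; not the actual Envoy class.
class SimpleTagExtractor {
public:
  // An empty substr means "no guard": the regex is always run.
  SimpleTagExtractor(const std::string& regex, const std::string& substr = "")
      : regex_(regex), substr_(substr) {}

  // Returns true when stat_name lacks the required substring, meaning the
  // regex cannot match and extraction can be skipped.
  bool substrMismatch(const std::string& stat_name) const {
    return !substr_.empty() && stat_name.find(substr_) == std::string::npos;
  }

  // Returns true and stores the second capture group in `value` on a match.
  bool extractTag(const std::string& stat_name, std::string& value) const {
    if (substrMismatch(stat_name)) {
      return false; // cheap rejection, no regex work done
    }
    std::smatch match;
    if (std::regex_search(stat_name, match, regex_) && match.size() > 2) {
      value = match[2].str();
      return true;
    }
    return false;
  }

private:
  const std::regex regex_;
  const std::string substr_;
};
```

Keeping the guard in a separate method means a test can exercise the substring logic directly, without depending on the behavior of any particular regex.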

jmarantz added 2 commits March 6, 2018 17:59
…ecise substring pruning.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
…gal & useful now.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz jmarantz changed the title stats: Guard regex looks with substring searches where possible stats: Guard regex lookups with substring searches where possible Mar 6, 2018
Signed-off-by: Joshua Marantz <jmarantz@google.com>

@jmarantz jmarantz left a comment

By breaking out the HTTP_CONN_MANAGER tag into two regexes I got the init time down to 2.8 seconds.

There is more on the table here for further PRs.
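
To illustrate the idea of registering two regexes for the same tag, each with its own substring guard, here is a rough sketch. Everything in it is an assumption made for illustration: the `Extractor` struct, the `extractTag` and `makeHttpConnManagerExtractors` functions, and the two patterns and substrings are hypothetical and are not taken from the PR's diff.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical record: one regex plus its required substring for a tag.
struct Extractor {
  std::string tag_name;
  std::regex regex;
  std::string substr;
};

// Tries each extractor registered for tag_name in order. Several entries may
// share a tag name; each is guarded by its own substring.
bool extractTag(const std::vector<Extractor>& extractors, const std::string& tag_name,
                const std::string& stat_name, std::string& value) {
  for (const Extractor& e : extractors) {
    if (e.tag_name != tag_name) {
      continue;
    }
    if (!e.substr.empty() && stat_name.find(e.substr) == std::string::npos) {
      continue; // substring guard: skip the regex entirely
    }
    std::smatch m;
    if (std::regex_search(stat_name, m, e.regex) && m.size() > 2) {
      value = m[2].str();
      return true;
    }
  }
  return false;
}

// Illustrative patterns only (assumed, not copied from the PR).
std::vector<Extractor> makeHttpConnManagerExtractors() {
  return {
      {"envoy.http_conn_manager_prefix",
       std::regex("^listener(?=\\.).*?\\.http\\.((.*?)\\.)"), ".http."},
      {"envoy.http_conn_manager_prefix", std::regex("^http\\.((.*?)\\.)"), "http."},
  };
}
```

With a single combined regex, the guard would have to be a substring common to both name shapes, which would likely be looser; splitting lets each regex carry a tighter guard.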

…p' anchor to a different regex.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
   * @return TagExtractorPtr newly constructed TagExtractor.
   */
- static TagExtractorPtr createTagExtractor(const std::string& name, const std::string& regex);
+ static TagExtractorPtr createTagExtractor(const std::string& name, const std::string& regex,

Member:
Please add docs here and elsewhere that explain the prefix and what it does.

@mrice32 (Member), Mar 7, 2018:
Once that's done, LGTM.

@jmarantz (Contributor Author):
Done; thanks for the review, and good catch!

Signed-off-by: Joshua Marantz <jmarantz@google.com>
htuch previously approved these changes Mar 8, 2018

@htuch htuch left a comment

This is a neat performance win. Ready to ship once you can fix the small typo.

/**
* @param stat_name The stat name
* @return bool indicates whether tag extraction should be skipped for this stat_name due
* to a subdstring mismatch.

@htuch (Member):
Nit: s/subdstring/substring/

Signed-off-by: Joshua Marantz <jmarantz@google.com>

@htuch htuch left a comment

Rad.

@htuch htuch merged commit b2610c8 into envoyproxy:master Mar 9, 2018
@jmarantz jmarantz deleted the regex-specify-substr branch March 9, 2018 04:26
jpsim added a commit that referenced this pull request Nov 28, 2022
Removing Admin from release builds by default

Risk Level: medium
Testing: n/a
Docs Changes: n/a
Release Notes: inline

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: alyssawilk <alyssar@google.com>
Co-authored-by: JP Simard <jp@jpsim.com>
Signed-off-by: JP Simard <jp@jpsim.com>