Skip to content

cryptomb: add queue size statistics#19180

Merged
ggreenway merged 15 commits intoenvoyproxy:mainfrom
VillePihlava:cryptomb-stats
Jan 4, 2022
Merged

cryptomb: add queue size statistics#19180
ggreenway merged 15 commits intoenvoyproxy:mainfrom
VillePihlava:cryptomb-stats

Conversation

@VillePihlava
Copy link
Contributor

@VillePihlava VillePihlava commented Dec 3, 2021

Commit Message:

Add queue size statistics to CryptoMb private key provider. The queue size statistics show the amount of simultaneous crypto operations handled at the time of a SIMD (single instruction, multiple data) operation. The statistics consist of a histogram that indicates how often different queue sizes have been processed.

Additional Description:

CryptoMb performance depends on the amount of simultaneous crypto operations processed. Statistics show how often different queue sizes are processed. The bigger the queue size, the better the throughput. The statistics help with debugging performance issues, finding good poll delay values, and show the distribution of different queue sizes.

Risk Level: Low
Testing: Unit testing, manual testing
Docs Changes: N/A
Release Notes: N/A
Platform Specific Features: CryptoMb private key provider requires Intel 3rd generation Xeon Scalable server processor for the AVX-512 IFMA instruction set.

Signed-off-by: Ville Pihlava <ville.pihlava@intel.com>
@VillePihlava VillePihlava requested a review from rojkov as a code owner December 3, 2021 07:54
@repokitteh-read-only
Copy link

Hi @VillePihlava, welcome and thank you for your contribution.

We will try to review your Pull Request as quickly as possible.

In the meantime, please take a look at the contribution guidelines if you have not done so already.

🐱

Caused by: #19180 was opened by VillePihlava.

see: more, trace.

Signed-off-by: Ville Pihlava <ville.pihlava@intel.com>
Copy link
Member

@ggreenway ggreenway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would make more sense as a histogram; you're basically re-implementing a histogram in this code. cc @jmarantz

Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great generally, with one concern/question about how frequently we are going to be doing stats symbol-table lookups, and a few style/readability nits.

: us_(std::chrono::duration_cast<std::chrono::microseconds>(poll_delay)), type_(type),
key_size_(keysize), ipp_(ipp),
timer_(d.createTimer([this]() -> void { processRequests(); })) {
key_size_(keysize), ipp_(ipp), timer_(d.createTimer([this]() -> void { processRequests(); })),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need the -> void

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed -> void from two places.

void CryptoMbQueue::processRequests() {
if (type_ == KeyType::Rsa) {
// Increment correct queue size statistic.
stats_.getQueueSizeCounters()[request_queue_.size() - 1].get().inc();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/getQueueSizeCounters/queueSizeCounters/ per convention

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed after changing counters to histogram.

tls_(ThreadLocal::TypedSlot<ThreadLocalData>::makeUnique(factory_context.threadLocal())) {
tls_(ThreadLocal::TypedSlot<ThreadLocalData>::makeUnique(factory_context.threadLocal())),
stats_(factory_context.scope(), CryptoMbQueue::MULTIBUFF_BATCH, "cryptomb",
"rsa_queue_size_") {
Copy link
Contributor

@jmarantz jmarantz Dec 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this does not get called in response to requests, right? this happens at config-time?

If this does happen in response to requests then we should pre-symbolize the stat names and I can help you solve that without taking symbol-table locks. If this is called on config it's ok. But given it's not obvious how often this gets created, we should either comment on that assumption, or we should just go ahead and pre-symbolize all the names during config or (better yet) during startup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This happens at config-time, I added a comment about it.

@@ -382,8 +407,11 @@ TEST(CryptoMbProviderTest, TestRSATimer) {
const BIGNUM *e, *n, *d;
RSA_get0_key(rsa, &n, &e, &d);
fakeIpp->setRsaKey(n, e, d);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is pre-existing but the above 2 lines disturb me. They occur here and 4x in the prod code that I can see in this PR.

Can we have a helper API FakeIppCryptoImpl::setRsaKey(rsa); which captures the calls to RSA_get0_key and this->setRsaKey but initializes the 3 pointers to nullptr, and changes the 4 call-sites I can see in this PR and any others I can't see? :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored tests in ops_test.cc and changed this.

CryptoMbStats::CryptoMbStats(Stats::Scope& scope, uint32_t max_queue_size,
absl::string_view stats_prefix,
absl::string_view queue_size_stat_prefix)
: stat_name_pool_(scope.symbolTable()), stats_prefix_(stat_name_pool_.add(stats_prefix)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stats_prefix_ is not referenced after the constructor so you can drop the member variable and make this a local.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not exist anymore after the histogram change.

stats);

size_t in_len = 32;
uint8_t in[32] = {0x7f};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a lot of repeated blocks of code in this test file. WDYT of factoring some of those out into methods/members of a test class for CryptoMbProviderTest, and shortening all these tests to hopefully make them more readable?

Then we can document some of the code that's mysterious like what this 0x7f is supposed to represent, or why in is 32 bytes long.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored tests in ops_test.cc and added comments.

@jmarantz
Copy link
Contributor

jmarantz commented Dec 5, 2021

/wait

@jmarantz
Copy link
Contributor

jmarantz commented Dec 5, 2021

RE @ggreenway 's comment about histograms: I agree that would be a lot better, and that changes a few of my suggestions also. One thing to note is that the XDR histograms auto-decide buckets and I'm not sure if you need something more precise for this. I'm guessing auto-bucketed histograms would be fine.

@VillePihlava
Copy link
Contributor Author

Thanks for the comments!

@jmarantz @ggreenway I tried a histogram for this at one point, but the mentioned auto-decided buckets were confusing. Keeping each queue size as a separate counter shows the exact kind of queue sizes that come up. For example, the information on how frequently the biggest queue size comes up can be especially useful. However, I do agree that the histogram would be a better solution if it worked this way.

Signed-off-by: Ville Pihlava <ville.pihlava@intel.com>
Signed-off-by: Ville Pihlava <ville.pihlava@intel.com>
Signed-off-by: Ville Pihlava <ville.pihlava@intel.com>
Signed-off-by: Ville Pihlava <ville.pihlava@intel.com>
Signed-off-by: Ville Pihlava <ville.pihlava@intel.com>
@ggreenway
Copy link
Member

There are two things happening with histograms: the internal storage auto-decides buckets (but is probably quite accurate). And the display from /stats displays a configured (not auto-determined) set of buckets. There is a default set of buckets configured, but that can be specified in the bootstrap config under stats_config. Also, some stat sinks (such as statsd) don't use either of those mechanisms, and send the raw data.

Can you clarify what the data looked like when you tried with a histogram? It may have just been that you need to configure the displayed buckets to get the results you need.

/wait-any

Signed-off-by: Ville Pihlava <ville.pihlava@intel.com>
Signed-off-by: Ville Pihlava <ville.pihlava@intel.com>
Signed-off-by: Ville Pihlava <ville.pihlava@intel.com>
Signed-off-by: Ville Pihlava <ville.pihlava@intel.com>
Signed-off-by: Ville Pihlava <ville.pihlava@intel.com>
Signed-off-by: Ville Pihlava <ville.pihlava@intel.com>
Signed-off-by: Ville Pihlava <ville.pihlava@intel.com>
@VillePihlava
Copy link
Contributor Author

@ggreenway Thank you for the explanation! I changed the counters back to a histogram and the statsd solution worked as you explained.

My initial confusion came from the way the histogram stats are displayed from the /stats endpoint, something like this:

listener.0.0.0.0_9000.cryptomb.rsa_queue_sizes: P0(nan,1.0) P25(nan,8.024987213802275) P50(nan,8.049991475868183) P75(nan,8.074995737934092) P90(nan,8.089998295173636) P95(nan,8.094999147586819) P99(nan,8.098999829517364) P99.5(nan,8.099499914758681) P99.9(nan,8.099899982951737) P100(nan,8.1)

The /stats/prometheus endpoint is a bit more helpful for seeing individual queue sizes, but the buckets only have an upper bound and include the queue sizes below it:

envoy_listener_cryptomb_rsa_queue_sizes_bucket{envoy_listener_address="0.0.0.0_9000",le="2"} 1
envoy_listener_cryptomb_rsa_queue_sizes_bucket{envoy_listener_address="0.0.0.0_9000",le="3"} 1
envoy_listener_cryptomb_rsa_queue_sizes_bucket{envoy_listener_address="0.0.0.0_9000",le="4"} 2
envoy_listener_cryptomb_rsa_queue_sizes_bucket{envoy_listener_address="0.0.0.0_9000",le="5"} 3
envoy_listener_cryptomb_rsa_queue_sizes_bucket{envoy_listener_address="0.0.0.0_9000",le="6"} 4
envoy_listener_cryptomb_rsa_queue_sizes_bucket{envoy_listener_address="0.0.0.0_9000",le="7"} 9
envoy_listener_cryptomb_rsa_queue_sizes_bucket{envoy_listener_address="0.0.0.0_9000",le="8"} 10
envoy_listener_cryptomb_rsa_queue_sizes_bucket{envoy_listener_address="0.0.0.0_9000",le="9"} 58667

When this is in the config:

stats_config:
  histogram_bucket_settings:
    match:
      contains: "cryptomb"
    buckets: [2,3,4,5,6,7,8,9]

Would there be an easy way to see individual queue sizes for this histogram, for example, from the /stats endpoint?

@ggreenway
Copy link
Member

Would there be an easy way to see individual queue sizes for this histogram, for example, from the /stats endpoint?

I don't think there is currently. The base /stats endpoint just prints the summary that you see. That could be an interesting feature though, via something like /stats?histogram_buckets=1 or something and print out the bucket counts. It wouldn't be hard to implement (see how it's done in the prometheus exporter; would probably be a handful of lines of code).

Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great; just a few nits. Will you follow-up with another PR to add detail to the /stats endpoint for histograms?

ssl_private_key_result_t res_;

// A size for signing and decryption operation input chosen for tests.
static constexpr size_t IN_LEN = 32;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Const member variables follow member-variable naming convention, not macro convention, so call these in_len_, in_, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

// Size of output in out_ from an operation.
size_t out_len_ = 0;

const std::string QUEUE_SIZE_HISTOGRAM_NAME = "cryptomb.rsa_queue_sizes";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Signed-off-by: Ville Pihlava <ville.pihlava@intel.com>
@VillePihlava
Copy link
Contributor Author

I can start looking into a follow-up PR on the /stats endpoint.

Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add an issue for the /stats histogram detail follow-up? I'll assign it to you.

I'll pass over to Greg for a senior maintainer pass.

Copy link
Member

@ggreenway ggreenway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ggreenway ggreenway merged commit 5fd31ab into envoyproxy:main Jan 4, 2022
@VillePihlava
Copy link
Contributor Author

The linked pull request #19586 is the follow-up to the discussion in this pull request.

joshperry pushed a commit to joshperry/envoy that referenced this pull request Feb 13, 2022
Add queue size statistics to CryptoMb private key provider. The queue size statistics show the amount of simultaneous crypto operations handled at the time of a SIMD (single instruction, multiple data) operation. The statistics consist of a histogram that indicates how often different queue sizes have been processed.

Signed-off-by: Ville Pihlava <ville.pihlava@intel.com>
Signed-off-by: Josh Perry <josh.perry@mx.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants