[GoogleCloudSpannerReceiver] Support for p50, p95 and p99 percentile latencies for queries and transaction statistics by architjugran · Pull Request #16429 · open-telemetry/opentelemetry-collector-contrib

architjugran · 2022-11-22T13:52:48Z

Description: Recently Cloud Spanner Stats tables added Latency distributions which made possible to see the percentiles of latencies using the SPANNER_SYS.DISTRIBUTION_PERCENTILE GoogleSQL function.
Our customers have been demanding the ability to view 50th, 95th, 99th percentile latencies on OpenTelemetry receiver too.
Hence this PR adds the 50th, 95th, 99th percentile latencies metrics, for queries and transactions

Tracking issue : #16430

Testing: Manual end-to-end testing done by running the service locally and verifying the presence of, and correct values of all the metrics added.
Also tested transaction stats percentile metrics both when AVG_TOTAL_LATENCY_SECONDS is NULL and NOT NULL

… query and transactions

varunraiko

LGTM

runforesight · 2022-12-12T15:09:05Z

Foresight Summary

Major Impacts

TestConsumeLogsUnexpectedExporterType ❌ failed 2 times in 2 runs (100% fail rate).

TestConsumeTracesUnexpectedExporterType ❌ failed 2 times in 2 runs (100% fail rate).

TestLoadInvalidConfig_NoScrapers ❌ failed 2 times in 2 runs (100% fail rate).

View More Details

✅ check-links workflow has finished in 2 minutes 8 seconds (41 seconds less than `main` branch avg.) and finished at 12th Dec, 2022.

Job	Failed Steps	Tests
changed files	- 🔗	N/A	See Details
check-links	- 🔗	N/A	See Details

✅ tracegen workflow has finished in 2 minutes 32 seconds (45 seconds less than `main` branch avg.) and finished at 12th Dec, 2022.

Job	Failed Steps	Tests
publish-latest	- 🔗	N/A	See Details
publish-stable	- 🔗	N/A	See Details
build-dev	- 🔗	N/A	See Details

✅ prometheus-compliance-tests workflow has finished in 4 minutes 57 seconds (4 minutes 9 seconds less than `main` branch avg.) and finished at 12th Dec, 2022.

Job	Failed Steps	Tests
prometheus-compliance-tests	- 🔗	✅ 21 ❌ 0 ⏭ 0 🔗	See Details

⭕ build-and-test-windows workflow has finished in 4 seconds (39 minutes 49 seconds less than `main` branch avg.) and finished at 12th Dec, 2022.

Job	Failed Steps	Tests
windows-unittest-matrix	- 🔗	N/A	See Details
windows-unittest	- 🔗	N/A	See Details

✅ build-and-test workflow has finished in 50 minutes 46 seconds (11 minutes 6 seconds less than `main` branch avg.) and finished at 12th Dec, 2022. There are 6 test failures.

Job	Failed Steps	Tests
unittest-matrix (1.18, internal)	- 🔗	✅ 592 ❌ 0 ⏭ 0 🔗	See Details
unittest-matrix (1.18, extension)	- 🔗	✅ 528 ❌ 0 ⏭ 0 🔗	See Details
unittest-matrix (1.18, processor)	- 🔗	✅ 1465 ❌ 0 ⏭ 0 🔗	See Details
unittest-matrix (1.19, internal)	- 🔗	✅ 592 ❌ 0 ⏭ 0 🔗	See Details
unittest-matrix (1.19, processor)	- 🔗	✅ 1465 ❌ 0 ⏭ 0 🔗	See Details
correctness-metrics	- 🔗	✅ 2 ❌ 0 ⏭ 0 🔗	See Details
unittest-matrix (1.18, receiver-0)	- 🔗	✅ 2532 ❌ 1 ⏭ 0 🔗	See Details
correctness-traces	- 🔗	✅ 17 ❌ 0 ⏭ 0 🔗	See Details
unittest-matrix (1.18, other)	- 🔗	✅ 4363 ❌ 0 ⏭ 0 🔗	See Details
unittest-matrix (1.18, exporter)	- 🔗	✅ 2237 ❌ 2 ⏭ 0 🔗	See Details
unittest-matrix (1.19, receiver-0)	- 🔗	✅ 2532 ❌ 1 ⏭ 0 🔗	See Details
unittest-matrix (1.19, exporter)	- 🔗	✅ 2237 ❌ 2 ⏭ 0 🔗	See Details
unittest-matrix (1.19, extension)	- 🔗	✅ 528 ❌ 0 ⏭ 0 🔗	See Details
unittest-matrix (1.18, receiver-1)	- 🔗	✅ 1854 ❌ 0 ⏭ 0 🔗	See Details
unittest-matrix (1.19, other)	- 🔗	✅ 4363 ❌ 0 ⏭ 0 🔗	See Details
unittest-matrix (1.19, receiver-1)	- 🔗	✅ 1854 ❌ 0 ⏭ 0 🔗	See Details
integration-tests	- 🔗	✅ 53 ❌ 0 ⏭ 0 🔗	See Details
setup-environment	- 🔗	N/A	See Details
check-collector-module-version	- 🔗	N/A	See Details
check-codeowners	- 🔗	N/A	See Details
checks	- 🔗	N/A	See Details
lint-matrix (receiver-1)	- 🔗	N/A	See Details
lint-matrix (receiver-0)	- 🔗	N/A	See Details
lint-matrix (processor)	- 🔗	N/A	See Details
lint-matrix (exporter)	- 🔗	N/A	See Details
lint-matrix (extension)	- 🔗	N/A	See Details
lint-matrix (internal)	- 🔗	N/A	See Details
lint-matrix (other)	- 🔗	N/A	See Details
build-examples	- 🔗	N/A	See Details
lint	- 🔗	N/A	See Details
unittest (1.19)	- 🔗	N/A	See Details
unittest (1.18)	- 🔗	N/A	See Details
cross-compile (darwin, amd64)	- 🔗	N/A	See Details
cross-compile (darwin, arm64)	- 🔗	N/A	See Details
cross-compile (linux, 386)	- 🔗	N/A	See Details
cross-compile (linux, amd64)	- 🔗	N/A	See Details
cross-compile (linux, arm)	- 🔗	N/A	See Details
cross-compile (linux, arm64)	- 🔗	N/A	See Details
cross-compile (linux, ppc64le)	- 🔗	N/A	See Details
cross-compile (windows, 386)	- 🔗	N/A	See Details
cross-compile (windows, amd64)	- 🔗	N/A	See Details
build-package (deb)	- 🔗	N/A	See Details
build-package (rpm)	- 🔗	N/A	See Details
windows-msi	- 🔗	N/A	See Details
publish-check	- 🔗	N/A	See Details
publish-stable	- 🔗	N/A	See Details

✅ load-tests workflow has finished in 22 minutes 12 seconds (⚠️ 6 minutes 12 seconds more than `main` branch avg.) and finished at 12th Dec, 2022.

Job	Failed Steps	Tests
loadtest (TestIdleMode)	- 🔗	✅ 1 ❌ 0 ⏭ 0 🔗	See Details
loadtest (TestTraceAttributesProcessor)	- 🔗	✅ 3 ❌ 0 ⏭ 0 🔗	See Details
loadtest (TestMetric10kDPS\|TestMetricsFromFile)	- 🔗	✅ 6 ❌ 0 ⏭ 0 🔗	See Details
loadtest (TestMetricResourceProcessor\|TestTrace10kSPS)	- 🔗	✅ 12 ❌ 0 ⏭ 0 🔗	See Details
loadtest (TestTraceBallast1kSPSWithAttrs\|TestTraceBallast1kSPSAddAttrs)	- 🔗	✅ 10 ❌ 0 ⏭ 0 🔗	See Details
loadtest (TestTraceNoBackend10kSPS\|TestTrace1kSPSWithAttrs)	- 🔗	✅ 8 ❌ 0 ⏭ 0 🔗	See Details
loadtest (TestBallastMemory\|TestLog10kDPS)	- 🔗	✅ 19 ❌ 0 ⏭ 0 🔗	See Details
setup-environment	- 🔗	N/A	See Details

✅ changelog workflow has finished in 17 minutes 31 seconds (⚠️ 12 minutes 11 seconds more than `main` branch avg.) and finished at 12th Dec, 2022.

Job	Failed Steps	Tests
changelog	- 🔗	N/A	See Details

🔎 See details on Foresight

^{*You can configure Foresight comments in your organization settings page.}

…metheus Two upstream changes between the previous baseline (prometheus v0.300.1) and v0.150's pin (prometheus v0.311.2) require adjustments to the fork's programmatic ScrapeConfig construction and the EndToEnd tests that exercise it. 1. Reload() YAML round-trip is stricter. Upstream contrib added a Reload() call inside newPrometheusReceiver() that round-trips the receiver's ScrapeConfig through YAML on startup (Prometheus open-telemetry#16429 raised the prometheus-side strictness). Programmatic ScrapeConfigs that round-trip through YAML must conform to two shape requirements they did not previously need: - kubernetes.SDConfig with zero-value HTTPClientConfig fails validation because FollowRedirects:false / EnableHTTP2:false differ from DefaultHTTPClientConfig, triggering "to use custom HTTP client configuration please provide the api_server URL explicitly". Set HTTPClientConfig: configutil.DefaultHTTPClientConfig on all such structs (DCGM, Neuron, NVMe EBS, NVMe LIS, Kueue). Effective behavior is unchanged: the SDK applied the same defaults internally for non-programmatic configs. - *discovery.StaticConfig (pointer) fails marshaling because the Prometheus discovery registry handles StaticConfig as a value type, not a pointer. The type assertion c.(StaticConfig) fails for *StaticConfig, producing "cannot marshal unregistered Config type: *discovery.StaticConfig". Use discovery.StaticConfig{} (value) instead. StaticConfig is a slice type so the & was unnecessary. Applied to production scrapers and matching test fixtures. 2. Per-series staleness emission is now keyed by appender SeriesRef rather than labels-hash (Prometheus PR open-telemetry#17411 + contrib fixes open-telemetry#43925/open-telemetry#46589 that returned stable refs from Append/AppendHistogram). This shifts the timing of failed-scrape commit such that on the mock prometheus's deliberate 200 -> 404 sequence, the 404 commit now lands inside the test window. Consumers therefore receive a second batch carrying up=0 plus per-series staleness markers (NumberDataPointValueTypeEmpty), which trips per-call value assertions in the EndToEnd tests. This is intended upstream behavior. Production consumers in CWA handle it correctly: prometheusremotewriteexporter converts NoRecordedValue to StaleNaN for AMP; awsemfexporter drops stale datapoints (grouped_metric.go, datapoint.go); the telegraf prometheus input opts out by returning ref=0 from Append. Update mockConsumers to skip failed-or-stale follow-up batches via a content-based check (up=0 or all-stale-markers). Keep the check in a single shared helper IsFailedOrStaleScrape on awscontainerinsightreceiver/internal/prometheusscraper/, with a small duplicate in the awscontainerinsightskueuereceiver test to avoid a cross-module dependency. The dcgm test already used assert.Eventually(consumerCalled, ...) which short-circuits on first success; no change there. Touches receiver/awscontainerinsightreceiver and receiver/awscontainerinsightskueuereceiver.

Added support for exporting 50th, 95th, 99th percentile latencies for…

1bfe490

… query and transactions

architjugran requested review from a team and djaglowski November 22, 2022 13:52

added chlog file

963551c

github-actions Bot assigned evan-bradley Nov 22, 2022

github-actions Bot added the receiver/googlecloudspanner label Nov 22, 2022

added chlog file

f0746f3

varunraiko approved these changes Nov 29, 2022

View reviewed changes

evan-bradley approved these changes Nov 29, 2022

View reviewed changes

Comment thread receiver/googlecloudspannerreceiver/internal/metadataconfig/metadata.yaml Outdated

Comment thread .chloggen/googlecloudspannerreceiver-percentile-latencies-feature.yaml Outdated

Addressed nit comments

6076e3f

dashpole approved these changes Dec 12, 2022

View reviewed changes

dashpole added the ready to merge Code review completed; ready to merge by maintainers label Dec 12, 2022

codeboten approved these changes Dec 13, 2022

View reviewed changes

codeboten merged commit 0af9ff5 into open-telemetry:main Dec 13, 2022

fatsheep9146 mentioned this pull request Feb 1, 2023

Support for query and transaction percentile-latency statistics #16430

Closed

plantfansam mentioned this pull request Jul 21, 2023

disregard #24427

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[GoogleCloudSpannerReceiver] Support for p50, p95 and p99 percentile latencies for queries and transaction statistics#16429

[GoogleCloudSpannerReceiver] Support for p50, p95 and p99 percentile latencies for queries and transaction statistics#16429
codeboten merged 4 commits into
open-telemetry:mainfrom
architjugran:main

architjugran commented Nov 22, 2022 •

edited

Loading

Uh oh!

varunraiko left a comment

Uh oh!

Uh oh!

Uh oh!

runforesight Bot commented Dec 12, 2022 •

edited

Loading

`TestConsumeLogsUnexpectedExporterType` ❌ failed 2 times in 2 runs (100% fail rate).

`TestConsumeTracesUnexpectedExporterType` ❌ failed 2 times in 2 runs (100% fail rate).

`TestLoadInvalidConfig_NoScrapers` ❌ failed 2 times in 2 runs (100% fail rate).

✅ check-links workflow has finished in 2 minutes 8 seconds (41 seconds less than `main` branch avg.) and finished at 12th Dec, 2022.

✅ tracegen workflow has finished in 2 minutes 32 seconds (45 seconds less than `main` branch avg.) and finished at 12th Dec, 2022.

✅ prometheus-compliance-tests workflow has finished in 4 minutes 57 seconds (4 minutes 9 seconds less than `main` branch avg.) and finished at 12th Dec, 2022.

⭕ build-and-test-windows workflow has finished in 4 seconds (39 minutes 49 seconds less than `main` branch avg.) and finished at 12th Dec, 2022.

✅ build-and-test workflow has finished in 50 minutes 46 seconds (11 minutes 6 seconds less than `main` branch avg.) and finished at 12th Dec, 2022. There are 6 test failures.

✅ load-tests workflow has finished in 22 minutes 12 seconds (⚠️ 6 minutes 12 seconds more than `main` branch avg.) and finished at 12th Dec, 2022.

✅ changelog workflow has finished in 17 minutes 31 seconds (⚠️ 12 minutes 11 seconds more than `main` branch avg.) and finished at 12th Dec, 2022.

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

architjugran commented Nov 22, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

varunraiko left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

runforesight Bot commented Dec 12, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Foresight Summary

TestConsumeLogsUnexpectedExporterType ❌ failed 2 times in 2 runs (100% fail rate).

TestConsumeTracesUnexpectedExporterType ❌ failed 2 times in 2 runs (100% fail rate).

TestLoadInvalidConfig_NoScrapers ❌ failed 2 times in 2 runs (100% fail rate).

✅ check-links workflow has finished in 2 minutes 8 seconds (41 seconds less than main branch avg.) and finished at 12th Dec, 2022.

✅ tracegen workflow has finished in 2 minutes 32 seconds (45 seconds less than main branch avg.) and finished at 12th Dec, 2022.

✅ prometheus-compliance-tests workflow has finished in 4 minutes 57 seconds (4 minutes 9 seconds less than main branch avg.) and finished at 12th Dec, 2022.

⭕ build-and-test-windows workflow has finished in 4 seconds (39 minutes 49 seconds less than main branch avg.) and finished at 12th Dec, 2022.

✅ build-and-test workflow has finished in 50 minutes 46 seconds (11 minutes 6 seconds less than main branch avg.) and finished at 12th Dec, 2022. There are 6 test failures.

✅ load-tests workflow has finished in 22 minutes 12 seconds (⚠️ 6 minutes 12 seconds more than main branch avg.) and finished at 12th Dec, 2022.

✅ changelog workflow has finished in 17 minutes 31 seconds (⚠️ 12 minutes 11 seconds more than main branch avg.) and finished at 12th Dec, 2022.

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

architjugran commented Nov 22, 2022 •

edited

Loading

runforesight Bot commented Dec 12, 2022 •

edited

Loading

`TestConsumeLogsUnexpectedExporterType` ❌ failed 2 times in 2 runs (100% fail rate).

`TestConsumeTracesUnexpectedExporterType` ❌ failed 2 times in 2 runs (100% fail rate).

`TestLoadInvalidConfig_NoScrapers` ❌ failed 2 times in 2 runs (100% fail rate).

✅ check-links workflow has finished in 2 minutes 8 seconds (41 seconds less than `main` branch avg.) and finished at 12th Dec, 2022.

✅ tracegen workflow has finished in 2 minutes 32 seconds (45 seconds less than `main` branch avg.) and finished at 12th Dec, 2022.

✅ prometheus-compliance-tests workflow has finished in 4 minutes 57 seconds (4 minutes 9 seconds less than `main` branch avg.) and finished at 12th Dec, 2022.

⭕ build-and-test-windows workflow has finished in 4 seconds (39 minutes 49 seconds less than `main` branch avg.) and finished at 12th Dec, 2022.

✅ build-and-test workflow has finished in 50 minutes 46 seconds (11 minutes 6 seconds less than `main` branch avg.) and finished at 12th Dec, 2022. There are 6 test failures.

✅ load-tests workflow has finished in 22 minutes 12 seconds (⚠️ 6 minutes 12 seconds more than `main` branch avg.) and finished at 12th Dec, 2022.

✅ changelog workflow has finished in 17 minutes 31 seconds (⚠️ 12 minutes 11 seconds more than `main` branch avg.) and finished at 12th Dec, 2022.