Skip to content

[GoogleCloudSpannerReceiver] Support for p50, p95 and p99 percentile latencies for queries and transaction statistics#16429

Merged
codeboten merged 4 commits into
open-telemetry:mainfrom
architjugran:main
Dec 13, 2022
Merged

[GoogleCloudSpannerReceiver] Support for p50, p95 and p99 percentile latencies for queries and transaction statistics#16429
codeboten merged 4 commits into
open-telemetry:mainfrom
architjugran:main

Conversation

@architjugran

@architjugran architjugran commented Nov 22, 2022

Copy link
Copy Markdown
Contributor

Description: Recently Cloud Spanner Stats tables added Latency distributions which made possible to see the percentiles of latencies using the SPANNER_SYS.DISTRIBUTION_PERCENTILE GoogleSQL function.
Our customers have been demanding the ability to view 50th, 95th, 99th percentile latencies on OpenTelemetry receiver too.
Hence this PR adds the 50th, 95th, 99th percentile latencies metrics, for queries and transactions

Tracking issue : #16430

Testing: Manual end-to-end testing done by running the service locally and verifying the presence of, and correct values of all the metrics added.
Also tested transaction stats percentile metrics both when AVG_TOTAL_LATENCY_SECONDS is NULL and NOT NULL

@architjugran architjugran requested review from a team and djaglowski November 22, 2022 13:52

@varunraiko varunraiko left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment thread receiver/googlecloudspannerreceiver/internal/metadataconfig/metadata.yaml Outdated
Comment thread .chloggen/googlecloudspannerreceiver-percentile-latencies-feature.yaml Outdated
@runforesight

runforesight Bot commented Dec 12, 2022

Copy link
Copy Markdown

Foresight Summary

    
Major Impacts

TestConsumeLogsUnexpectedExporterType ❌ failed 2 times in 2 runs (100% fail rate).
TestConsumeTracesUnexpectedExporterType ❌ failed 2 times in 2 runs (100% fail rate).
TestLoadInvalidConfig_NoScrapers ❌ failed 2 times in 2 runs (100% fail rate).
View More Details

✅  check-links workflow has finished in 2 minutes 8 seconds (41 seconds less than main branch avg.) and finished at 12th Dec, 2022.


Job Failed Steps Tests
changed files -     🔗  N/A See Details
check-links -     🔗  N/A See Details

✅  tracegen workflow has finished in 2 minutes 32 seconds (45 seconds less than main branch avg.) and finished at 12th Dec, 2022.


Job Failed Steps Tests
publish-latest -     🔗  N/A See Details
publish-stable -     🔗  N/A See Details
build-dev -     🔗  N/A See Details

✅  prometheus-compliance-tests workflow has finished in 4 minutes 57 seconds (4 minutes 9 seconds less than main branch avg.) and finished at 12th Dec, 2022.


Job Failed Steps Tests
prometheus-compliance-tests -     🔗  ✅ 21  ❌ 0  ⏭ 0    🔗 See Details

⭕  build-and-test-windows workflow has finished in 4 seconds (39 minutes 49 seconds less than main branch avg.) and finished at 12th Dec, 2022.


Job Failed Steps Tests
windows-unittest-matrix -     🔗  N/A See Details
windows-unittest -     🔗  N/A See Details

✅  build-and-test workflow has finished in 50 minutes 46 seconds (11 minutes 6 seconds less than main branch avg.) and finished at 12th Dec, 2022. There are 6 test failures.


Job Failed Steps Tests
unittest-matrix (1.18, internal) -     🔗  ✅ 592  ❌ 0  ⏭ 0    🔗 See Details
unittest-matrix (1.18, extension) -     🔗  ✅ 528  ❌ 0  ⏭ 0    🔗 See Details
unittest-matrix (1.18, processor) -     🔗  ✅ 1465  ❌ 0  ⏭ 0    🔗 See Details
unittest-matrix (1.19, internal) -     🔗  ✅ 592  ❌ 0  ⏭ 0    🔗 See Details
unittest-matrix (1.19, processor) -     🔗  ✅ 1465  ❌ 0  ⏭ 0    🔗 See Details
correctness-metrics -     🔗  ✅ 2  ❌ 0  ⏭ 0    🔗 See Details
unittest-matrix (1.18, receiver-0) -     🔗  ✅ 2532  ❌ 1  ⏭ 0    🔗 See Details
correctness-traces -     🔗  ✅ 17  ❌ 0  ⏭ 0    🔗 See Details
unittest-matrix (1.18, other) -     🔗  ✅ 4363  ❌ 0  ⏭ 0    🔗 See Details
unittest-matrix (1.18, exporter) -     🔗  ✅ 2237  ❌ 2  ⏭ 0    🔗 See Details
unittest-matrix (1.19, receiver-0) -     🔗  ✅ 2532  ❌ 1  ⏭ 0    🔗 See Details
unittest-matrix (1.19, exporter) -     🔗  ✅ 2237  ❌ 2  ⏭ 0    🔗 See Details
unittest-matrix (1.19, extension) -     🔗  ✅ 528  ❌ 0  ⏭ 0    🔗 See Details
unittest-matrix (1.18, receiver-1) -     🔗  ✅ 1854  ❌ 0  ⏭ 0    🔗 See Details
unittest-matrix (1.19, other) -     🔗  ✅ 4363  ❌ 0  ⏭ 0    🔗 See Details
unittest-matrix (1.19, receiver-1) -     🔗  ✅ 1854  ❌ 0  ⏭ 0    🔗 See Details
integration-tests -     🔗  ✅ 53  ❌ 0  ⏭ 0    🔗 See Details
setup-environment -     🔗  N/A See Details
check-collector-module-version -     🔗  N/A See Details
check-codeowners -     🔗  N/A See Details
checks -     🔗  N/A See Details
lint-matrix (receiver-1) -     🔗  N/A See Details
lint-matrix (receiver-0) -     🔗  N/A See Details
lint-matrix (processor) -     🔗  N/A See Details
lint-matrix (exporter) -     🔗  N/A See Details
lint-matrix (extension) -     🔗  N/A See Details
lint-matrix (internal) -     🔗  N/A See Details
lint-matrix (other) -     🔗  N/A See Details
build-examples -     🔗  N/A See Details
lint -     🔗  N/A See Details
unittest (1.19) -     🔗  N/A See Details
unittest (1.18) -     🔗  N/A See Details
cross-compile (darwin, amd64) -     🔗  N/A See Details
cross-compile (darwin, arm64) -     🔗  N/A See Details
cross-compile (linux, 386) -     🔗  N/A See Details
cross-compile (linux, amd64) -     🔗  N/A See Details
cross-compile (linux, arm) -     🔗  N/A See Details
cross-compile (linux, arm64) -     🔗  N/A See Details
cross-compile (linux, ppc64le) -     🔗  N/A See Details
cross-compile (windows, 386) -     🔗  N/A See Details
cross-compile (windows, amd64) -     🔗  N/A See Details
build-package (deb) -     🔗  N/A See Details
build-package (rpm) -     🔗  N/A See Details
windows-msi -     🔗  N/A See Details
publish-check -     🔗  N/A See Details
publish-stable -     🔗  N/A See Details

✅  load-tests workflow has finished in 22 minutes 12 seconds (⚠️ 6 minutes 12 seconds more than main branch avg.) and finished at 12th Dec, 2022.


Job Failed Steps Tests
loadtest (TestIdleMode) -     🔗  ✅ 1  ❌ 0  ⏭ 0    🔗 See Details
loadtest (TestTraceAttributesProcessor) -     🔗  ✅ 3  ❌ 0  ⏭ 0    🔗 See Details
loadtest (TestMetric10kDPS|TestMetricsFromFile) -     🔗  ✅ 6  ❌ 0  ⏭ 0    🔗 See Details
loadtest (TestMetricResourceProcessor|TestTrace10kSPS) -     🔗  ✅ 12  ❌ 0  ⏭ 0    🔗 See Details
loadtest (TestTraceBallast1kSPSWithAttrs|TestTraceBallast1kSPSAddAttrs) -     🔗  ✅ 10  ❌ 0  ⏭ 0    🔗 See Details
loadtest (TestTraceNoBackend10kSPS|TestTrace1kSPSWithAttrs) -     🔗  ✅ 8  ❌ 0  ⏭ 0    🔗 See Details
loadtest (TestBallastMemory|TestLog10kDPS) -     🔗  ✅ 19  ❌ 0  ⏭ 0    🔗 See Details
setup-environment -     🔗  N/A See Details

✅  changelog workflow has finished in 17 minutes 31 seconds (⚠️ 12 minutes 11 seconds more than main branch avg.) and finished at 12th Dec, 2022.


Job Failed Steps Tests
changelog -     🔗  N/A See Details

🔎 See details on Foresight

*You can configure Foresight comments in your organization settings page.

@dashpole dashpole added the ready to merge Code review completed; ready to merge by maintainers label Dec 12, 2022
@codeboten codeboten merged commit 0af9ff5 into open-telemetry:main Dec 13, 2022
@plantfansam plantfansam mentioned this pull request Jul 21, 2023
sky333999 added a commit to amazon-contributing/opentelemetry-collector-contrib that referenced this pull request Jun 7, 2026
…metheus

Two upstream changes between the previous baseline (prometheus
v0.300.1) and v0.150's pin (prometheus v0.311.2) require
adjustments to the fork's programmatic ScrapeConfig construction
and the EndToEnd tests that exercise it.

1. Reload() YAML round-trip is stricter. Upstream contrib added a
   Reload() call inside newPrometheusReceiver() that round-trips
   the receiver's ScrapeConfig through YAML on startup
   (Prometheus open-telemetry#16429 raised the prometheus-side strictness).
   Programmatic ScrapeConfigs that round-trip through YAML must
   conform to two shape requirements they did not previously need:

   - kubernetes.SDConfig with zero-value HTTPClientConfig fails
     validation because FollowRedirects:false / EnableHTTP2:false
     differ from DefaultHTTPClientConfig, triggering "to use custom
     HTTP client configuration please provide the api_server URL
     explicitly". Set HTTPClientConfig: configutil.DefaultHTTPClientConfig
     on all such structs (DCGM, Neuron, NVMe EBS, NVMe LIS, Kueue).
     Effective behavior is unchanged: the SDK applied the same
     defaults internally for non-programmatic configs.

   - *discovery.StaticConfig (pointer) fails marshaling because the
     Prometheus discovery registry handles StaticConfig as a value
     type, not a pointer. The type assertion c.(StaticConfig) fails
     for *StaticConfig, producing "cannot marshal unregistered
     Config type: *discovery.StaticConfig". Use discovery.StaticConfig{}
     (value) instead. StaticConfig is a slice type so the & was
     unnecessary. Applied to production scrapers and matching test
     fixtures.

2. Per-series staleness emission is now keyed by appender SeriesRef
   rather than labels-hash (Prometheus PR open-telemetry#17411 + contrib fixes
   open-telemetry#43925/open-telemetry#46589 that returned stable refs from Append/AppendHistogram).
   This shifts the timing of failed-scrape commit such that on the
   mock prometheus's deliberate 200 -> 404 sequence, the 404 commit
   now lands inside the test window. Consumers therefore receive a
   second batch carrying up=0 plus per-series staleness markers
   (NumberDataPointValueTypeEmpty), which trips per-call value
   assertions in the EndToEnd tests.

   This is intended upstream behavior. Production consumers in CWA
   handle it correctly: prometheusremotewriteexporter converts
   NoRecordedValue to StaleNaN for AMP; awsemfexporter drops stale
   datapoints (grouped_metric.go, datapoint.go); the telegraf
   prometheus input opts out by returning ref=0 from Append.

   Update mockConsumers to skip failed-or-stale follow-up batches
   via a content-based check (up=0 or all-stale-markers). Keep the
   check in a single shared helper IsFailedOrStaleScrape on
   awscontainerinsightreceiver/internal/prometheusscraper/, with a
   small duplicate in the awscontainerinsightskueuereceiver test to
   avoid a cross-module dependency. The dcgm test already used
   assert.Eventually(consumerCalled, ...) which short-circuits on
   first success; no change there.

Touches receiver/awscontainerinsightreceiver and
receiver/awscontainerinsightskueuereceiver.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready to merge Code review completed; ready to merge by maintainers receiver/googlecloudspanner

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants