Drop non-finite samples in Prometheus remote write #144055
Merged
felixbarny merged 4 commits into elastic:main on Mar 12, 2026
Pinging @elastic/es-storage-engine (Team:StorageEngine)
Branch updated from 7116f5e to 50aa5eb
kkrik-es reviewed on Mar 11, 2026
...us/src/javaRestTest/java/org/elasticsearch/xpack/prometheus/PrometheusRemoteWriteRestIT.java (review thread resolved)
kkrik-es approved these changes on Mar 11, 2026
Made-with: Cursor
RemoteWriteResponse has no getStatus()/getMessage() methods; the REST-layer mapping to NO_CONTENT is already verified in PrometheusRemoteWriteRestActionTests. Made-with: Cursor
szybia added a commit to szybia/elasticsearch that referenced this pull request on Mar 12, 2026:
…elocations
* upstream/main: (49 commits)
  CCS logging fixes (elastic#144070)
  Improve CPS cluster exclusion handling (elastic#143488)
  Remove snapshot condition now that node_reduce phase is in non-snapshot builds (elastic#144090)
  Drop deprecation warnings when updating a mapping in the cluster state applier (elastic#143884) (elastic#144040)
  Add ensureGreenAndNoInitializingShards helper (elastic#144044)
  Removed unnecessary applies_to blocks from deprecated query (elastic#144096)
  [CPS] Use single CrossProjectModeDecider instance (elastic#144030)
  Fix ESQL TS requests with LIMIT 0 (elastic#144031)
  ESQL: Remove `create` methods in aggs (elastic#144098)
  ES|QL: Refactor ChangeLimitOperator (elastic#144017)
  Add Paginated Hit Source Tests (elastic#142592)
  Fix test failure not preferred (elastic#144019)
  Remove serialization logic from EIS authorization response (elastic#144021)
  ESQL: CSV schema inference and parsing enhancements (elastic#144050)
  ESQL: Fix incorrectly optimized fork with nullify unmapped_fields (elastic#143030)
  Fix MMR release test using subqueries (elastic#144087)
  Refactoring `UserAgentPlugin` (elastic#140712)
  Drop non-finite samples in Prometheus remote write (elastic#144055)
  [TEST] Wait for internal inference indices to be created in authorization IT (elastic#143885)
  Disable ndjson datasource QA tests in release-tests (elastic#143992)
  ...
michalborek pushed a commit to michalborek/elasticsearch that referenced this pull request on Mar 23, 2026
Drop non-finite sample values (NaN, +Infinity, -Infinity) in the Prometheus remote write handler before they reach the bulk API.
Prometheus produces non-finite `double` values in several cases: stale markers (a special NaN signalling that a series has ended), normal NaN (from `histogram_quantile` with no data, summary quantiles with zero observations, etc.), and +/-Infinity. Elasticsearch's `double` field mapper rejects these with `"[double] supports only finite values"`, so the documents silently land in the failure store. This degrades the data quality score without giving the user any actionable fix.

This PR adds a `Double.isFinite()` check in `PrometheusRemoteWriteTransportAction` to skip non-finite samples before building index requests. When all samples in a request are non-finite, the handler returns 204 (not 400), since these are valid Prometheus values, not client errors.

In the future, we'll need to revisit this to properly store non-finite values. Prometheus defines distinct NaN bit patterns for stale markers (`0x7ff0000000000002`) and normal NaN (`0x7ff8000000000001`), and PromQL queries depend on these semantics for correct staleness detection and NaN-infectious arithmetic. With native PromQL support in ES|QL, silently dropping these values won't be sufficient long-term. Storing them will require adjusting the `double` field type to accept non-finite values, and carefully thinking through the implications of allowing NaN and Infinity in contexts outside of PromQL, such as standard aggregations, sorting, and range queries.

Made with Cursor
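The filtering step described above can be sketched as follows. This is a minimal illustration, not the code merged in this PR: `NonFiniteSamples`, `Sample`, and `keepFinite` are hypothetical names, and building index requests and choosing the response status are left out. It applies the `Double.isFinite()` check the description names, and uses the two Prometheus NaN bit patterns quoted above to show that the check drops stale markers and normal NaN alike.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch, not the merged implementation: NonFiniteSamples, Sample and
// keepFinite are hypothetical names chosen for illustration.
final class NonFiniteSamples {

    // Bit pattern Prometheus uses for stale markers.
    static final long STALE_NAN_BITS = 0x7ff0000000000002L;
    // Bit pattern Prometheus uses for "normal" NaN.
    static final long NORMAL_NAN_BITS = 0x7ff8000000000001L;

    // Hypothetical stand-in for a decoded remote write sample.
    record Sample(long timestampMillis, double value) {}

    // Returns only the samples whose value is finite. Double.isFinite is false for
    // every NaN bit pattern and for +/-Infinity, so stale markers and normal NaN
    // are dropped alike and the staleness signal is not preserved.
    static List<Sample> keepFinite(List<Sample> samples) {
        List<Sample> finite = new ArrayList<>(samples.size());
        for (Sample sample : samples) {
            if (Double.isFinite(sample.value())) {
                finite.add(sample);
            }
        }
        return finite;
    }

    public static void main(String[] args) {
        List<Sample> batch = List.of(
            new Sample(1_000L, 42.0),
            new Sample(2_000L, Double.longBitsToDouble(STALE_NAN_BITS)),
            new Sample(3_000L, Double.longBitsToDouble(NORMAL_NAN_BITS)),
            new Sample(4_000L, Double.POSITIVE_INFINITY),
            new Sample(5_000L, Double.NEGATIVE_INFINITY),
            new Sample(6_000L, -0.5)
        );
        List<Sample> finite = keepFinite(batch);
        // Only the finite samples would go on to become index requests.
        System.out.println(finite.size() + " of " + batch.size() + " samples kept");
    }
}
```

Running `main` prints `2 of 6 samples kept`: only `42.0` and `-0.5` survive. Because `Double.isFinite()` returns false for every NaN bit pattern, a filter like this cannot keep stale markers apart from normal NaN, which is the limitation the last paragraph of the description calls out.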