Drop non-finite samples in Prometheus remote write #144055

Merged
felixbarny merged 4 commits into elastic:main from felixbarny:remote-write-nan on Mar 12, 2026

@felixbarny
Member

Drop non-finite sample values (NaN, +Infinity, -Infinity) in the Prometheus remote write handler before they reach the bulk API.

Prometheus produces non-finite double values in several cases: stale markers (a special NaN signalling that a series has ended), normal NaN (from histogram_quantile with no data, summary quantiles with zero observations, etc.), and +/-Infinity. Elasticsearch's double field mapper rejects these with "[double] supports only finite values", causing the documents to silently land in the failure store. This degrades the data quality score without any user-actionable fix.

This PR adds a Double.isFinite() check in PrometheusRemoteWriteTransportAction to skip non-finite samples before building index requests. When all samples in a request are non-finite, the handler returns 204 (not 400) since these are valid Prometheus values, not client errors.
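
As a rough illustration of the check described above, the sketch below filters a list of samples with `Double.isFinite()`. The `NonFiniteSampleFilter` class, the `Sample` record, and the `keepFinite` method are hypothetical names made up for this example; they are not the actual code in `PrometheusRemoteWriteTransportAction`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not the code merged in this PR.
final class NonFiniteSampleFilter {

    // Hypothetical stand-in for a remote write sample: a timestamp and a double value.
    record Sample(long timestampMillis, double value) {}

    // Returns only the samples whose value is finite.
    // Double.isFinite is false for NaN (any bit pattern), +Infinity and -Infinity.
    static List<Sample> keepFinite(List<Sample> samples) {
        List<Sample> kept = new ArrayList<>(samples.size());
        for (Sample sample : samples) {
            if (Double.isFinite(sample.value())) {
                kept.add(sample);
            }
        }
        return kept;
    }
}
```

Under this sketch, a request whose samples are all non-finite yields an empty list, which is the case where the handler responds with 204 rather than 400.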

In the future, we'll need to revisit this to properly store non-finite values. Prometheus defines distinct NaN bit patterns for stale markers (0x7ff0000000000002) vs normal NaN (0x7ff8000000000001), and PromQL queries depend on these semantics for correct staleness detection and NaN-infectious arithmetic. With native PromQL support in ES|QL, silently dropping these values won't be sufficient long-term. This will require adjusting the double field type to accept non-finite values, and carefully thinking through the implications of allowing NaN and Infinity in contexts outside of PromQL, such as standard aggregations, sorting, and range queries.
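
To make the distinction between the two NaN bit patterns concrete, here is a minimal sketch that classifies a double by its raw bits. The `PrometheusNaN` class and its method names are hypothetical and exist only for this example; the two constants are the bit patterns quoted above. It uses `Double.doubleToRawLongBits` because `Double.doubleToLongBits` collapses every NaN to a single canonical value. Note that the Javadoc for `Double.longBitsToDouble` warns that some platforms may not preserve signaling NaN bit patterns exactly, so treat this as a sketch rather than a portable guarantee.

```java
// Hypothetical helper for illustration only; not part of Elasticsearch.
final class PrometheusNaN {

    // Bit patterns quoted in the PR description.
    static final long STALE_NAN_BITS = 0x7ff0000000000002L;
    static final long NORMAL_NAN_BITS = 0x7ff8000000000001L;

    // True only for the exact stale-marker bit pattern.
    static boolean isStaleMarker(double value) {
        return Double.doubleToRawLongBits(value) == STALE_NAN_BITS;
    }

    // Coarse classification; every category except "finite" fails Double.isFinite.
    static String classify(double value) {
        if (isStaleMarker(value)) {
            return "stale";
        }
        if (Double.isNaN(value)) {
            return "nan";
        }
        if (Double.isInfinite(value)) {
            return "infinite";
        }
        return "finite";
    }
}
```

Both patterns are NaN as far as `Double.isNaN` and `Double.isFinite` are concerned, which is why the current check drops stale markers and normal NaN alike; telling them apart requires comparing raw bits as above.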

Made with Cursor

@elasticsearchmachine added the external-contributor, v9.4.0, and Team:StorageEngine labels on Mar 11, 2026
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@felixbarny felixbarny self-assigned this Mar 11, 2026
felixbarny and others added 3 commits March 12, 2026 08:41
RemoteWriteResponse has no getStatus()/getMessage() methods;
the REST-layer mapping to NO_CONTENT is already verified in
PrometheusRemoteWriteRestActionTests.

Made-with: Cursor
@felixbarny felixbarny merged commit 0f956c1 into elastic:main Mar 12, 2026
36 checks passed
@felixbarny felixbarny deleted the remote-write-nan branch March 12, 2026 11:33
szybia added a commit to szybia/elasticsearch that referenced this pull request Mar 12, 2026
michalborek pushed a commit to michalborek/elasticsearch that referenced this pull request Mar 23, 2026
Labels: external-contributor, >non-issue, :StorageEngine/TSDB, Team:StorageEngine, v9.4.0
