[ESQL] Push stats to external source via metadata #143940

Merged
costin merged 2 commits into elastic:main from costin:esql/push-stats-external-source
Mar 11, 2026

@costin costin commented Mar 10, 2026

Add PushStatsToExternalSource optimizer rule that answers
ungrouped COUNT(*), COUNT(field), MIN(field), MAX(field)
from file metadata statistics without scanning data.
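
As a rough illustration of why these four ungrouped aggregates can be answered without a scan, the sketch below combines per-file statistics directly. It is not the PushStatsToExternalSource rule itself: `MetadataAggSketch`, `FileStats`, and `ColumnStats` are hypothetical names invented for this example, and values are restricted to `long` for brevity. Whenever a needed statistic is missing, the methods return an empty result, which a caller would treat as "fall back to reading the data".

```java
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;

final class MetadataAggSketch {
    /** Hypothetical per-column statistics; a null component means "not recorded in the file". */
    record ColumnStats(Long nullCount, Long min, Long max) {}

    /** Hypothetical per-file statistics: total rows plus statistics keyed by column name. */
    record FileStats(long rowCount, Map<String, ColumnStats> columns) {}

    /** COUNT(*): the sum of the row counts of all files. */
    static long countStar(List<FileStats> files) {
        long total = 0;
        for (FileStats f : files) {
            total += f.rowCount();
        }
        return total;
    }

    /** COUNT(field): rows minus nulls per file; empty if any file lacks a null count. */
    static OptionalLong countField(List<FileStats> files, String field) {
        long total = 0;
        for (FileStats f : files) {
            ColumnStats c = f.columns().get(field);
            if (c == null || c.nullCount() == null) {
                return OptionalLong.empty();
            }
            total += f.rowCount() - c.nullCount();
        }
        return OptionalLong.of(total);
    }

    /** MIN(field): the smallest per-file minimum; empty if any file lacks one. */
    static OptionalLong min(List<FileStats> files, String field) {
        if (files.isEmpty()) {
            return OptionalLong.empty();
        }
        long best = Long.MAX_VALUE;
        for (FileStats f : files) {
            ColumnStats c = f.columns().get(field);
            if (c == null || c.min() == null) {
                return OptionalLong.empty();
            }
            best = Math.min(best, c.min());
        }
        return OptionalLong.of(best);
    }

    /** MAX(field): the largest per-file maximum; empty if any file lacks one. */
    static OptionalLong max(List<FileStats> files, String field) {
        if (files.isEmpty()) {
            return OptionalLong.empty();
        }
        long best = Long.MIN_VALUE;
        for (FileStats f : files) {
            ColumnStats c = f.columns().get(field);
            if (c == null || c.max() == null) {
                return OptionalLong.empty();
            }
            best = Math.max(best, c.max());
        }
        return OptionalLong.of(best);
    }
}
```

Grouped aggregates are left out of the sketch because file-level statistics carry no per-group breakdown, which matches the "ungrouped" restriction in the description.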

Parquet and ORC format readers now extract row counts,
null counts, and column min/max from file metadata.
Statistics flow through the sourceMetadata map using
SourceStatisticsSerializer, avoiding serialization changes.
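
One way to picture statistics travelling inside an existing metadata map is the sketch below, which flattens them into string-keyed entries and reads them back. This is only an assumption about the general approach: the class `StatsInMetadataSketch`, the `_stats.` key prefix, and the method names are hypothetical, and the actual key layout used by SourceStatisticsSerializer is not shown on this page.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

final class StatsInMetadataSketch {
    // Hypothetical key layout, chosen for this example only.
    static final String ROW_COUNT = "_stats.row_count";

    /** Builds the (hypothetical) key for one statistic of one column. */
    static String columnKey(String column, String stat) {
        return "_stats.columns." + column + "." + stat;
    }

    /** Returns a copy of the metadata map with statistics added as plain entries. */
    static Map<String, Object> embed(
        Map<String, Object> sourceMetadata,
        long rowCount,
        String column,
        long nullCount,
        Object min,
        Object max
    ) {
        Map<String, Object> out = new HashMap<>(sourceMetadata);
        out.put(ROW_COUNT, rowCount);
        out.put(columnKey(column, "null_count"), nullCount);
        // Absent min/max values are simply not written, so readers see them as missing.
        if (min != null) {
            out.put(columnKey(column, "min"), min);
        }
        if (max != null) {
            out.put(columnKey(column, "max"), max);
        }
        return out;
    }

    /** Reads a numeric statistic back; empty if the key is missing or not a number. */
    static OptionalLong readLong(Map<String, Object> metadata, String key) {
        Object v = metadata.get(key);
        return v instanceof Number n ? OptionalLong.of(n.longValue()) : OptionalLong.empty();
    }
}
```

Because the statistics are ordinary map entries, anything that already carries the map would carry them too, which is presumably what lets the change avoid touching serialization code.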

Developed using AI-assisted tooling.

@costin costin added the >enhancement, :Analytics/ES|QL (AKA ESQL), and ES|QL|DS (ES|QL datasources) labels Mar 10, 2026
@costin costin requested a review from bpintea March 10, 2026 12:22
@elasticsearchmachine elasticsearchmachine added the Team:Analytics (Meta label for analytical engine team, ESQL/Aggs/Geo) and v9.4.0 labels Mar 10, 2026
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Collaborator

Hi @costin, I've created a changelog YAML for you.

@costin costin enabled auto-merge (squash) March 10, 2026 12:32
Contributor

@bpintea bpintea left a comment

🤖-assisted review.

Comment on lines +146 to +160
if (value instanceof Long l) {
    blocks[i] = blockFactory.newConstantLongBlockWith(l, 1);
} else if (value instanceof Integer n) {
    blocks[i] = blockFactory.newConstantIntBlockWith(n, 1);
} else if (value instanceof Double d) {
    blocks[i] = blockFactory.newConstantDoubleBlockWith(d, 1);
} else if (value instanceof Boolean b) {
    blocks[i] = blockFactory.newConstantBooleanBlockWith(b, 1);
} else if (value instanceof String s) {
    blocks[i] = blockFactory.newConstantBytesRefBlockWith(new org.apache.lucene.util.BytesRef(s), 1);
} else if (value instanceof Number n) {
    blocks[i] = blockFactory.newConstantLongBlockWith(n.longValue(), 1);
} else {
    blocks[i] = blockFactory.newConstantNullBlock(1);
}

nit: switch?
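
For reference, the shape the nit seems to point at is a pattern-matching `switch` (Java 21+). The sketch below is a standalone approximation, not the PR's code: `ConstantBlockSwitchSketch` and `describe` are hypothetical, and the returned strings merely stand in for the `blockFactory.newConstant...BlockWith` calls in the reviewed snippet so the example runs on its own.

```java
final class ConstantBlockSwitchSketch {
    /** Hypothetical stand-in: returns a label naming the block type the original chain would build. */
    static String describe(Object value) {
        return switch (value) {
            case Long l -> "long:" + l;
            case Integer i -> "int:" + i;
            case Double d -> "double:" + d;
            case Boolean b -> "boolean:" + b;
            case String s -> "bytes:" + s;
            // Must follow the Long/Integer/Double cases: a supertype pattern
            // placed first would dominate them and fail to compile.
            case Number n -> "long:" + n.longValue();
            // The if/else chain sends null to its final else branch; a switch
            // needs an explicit null case, otherwise it throws NullPointerException.
            case null, default -> "null";
        };
    }
}
```

Two details differ from the `instanceof` chain and are worth keeping in mind when converting: the catch-all `Number` case has to stay below the more specific numeric cases, and `null` has to be handled explicitly to preserve the null-block fallback.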

@costin costin merged commit 130b1e5 into elastic:main Mar 11, 2026
36 checks passed
@costin costin deleted the esql/push-stats-external-source branch March 11, 2026 17:52
szybia added a commit to szybia/elasticsearch that referenced this pull request Mar 11, 2026
…elocations

* upstream/main: (54 commits)
  [ES|QL|DS] Wire parallel parsing into production for text formats (elastic#143997)
  ESQL: Allow EXTERNAL commands be run part of the CsvTests suite (elastic#143970)
  [ESQL] Push stats to external source via metadata (elastic#143940)
  Mute org.elasticsearch.xpack.esql.CsvIT test {csv-spec:approximation.Approximate stats with stats where} elastic#144051
  Refactored SortedNumericDocValuesSyntheticFieldLoader into a Layer (elastic#143912)
  Enable extended doc_values params feature flag in RandomizedRollingUpgradeIT (elastic#143918)
  Mute org.elasticsearch.xpack.esql.qa.multi_node.EsqlSpecIT test {csv-spec:approximation.Approximate stats with sample} elastic#144022
  Ensure we use float values for rolling upgrade float vectors (elastic#144032)
  Remove sensitive info from reindex task description (elastic#143635)
  Fix HistogramUnionState.equals (elastic#143990)
  Use dedicated IndexRouting API in ShardSplittingQuery (elastic#143776)
  Engine/Store DistributedArchitectureGuide doc (elastic#143818)
  Mute org.elasticsearch.snapshots.ConcurrentSnapshotsIT testDeletesAreBatched elastic#144034
  Avoid serializing exceptions as JSON in remote write endpoint (elastic#143987)
  allow testLoadDocSequenceReturnsCorrectResultsText to circuit break, it happens in serverless occasionally (elastic#144023)
  [ESQL] Adds memory accounting to GroupedLimitOperator (elastic#143941)
  Adjust ESIntegTestCase.getLiveDocs method to account for pruned sequence numbers (elastic#143999)
  Support target bucket count in `TBUCKET` with explicit from/to date range (elastic#142747)
  TSDBDocValuesFormatSingleNodeTests with and without synthetic id (elastic#144002)
  Fix circuit breaker leak in BreakingTDigestHolder (elastic#143873)
  ...
jdconrad pushed a commit to jdconrad/elasticsearch that referenced this pull request Mar 11, 2026
michalborek pushed a commit to michalborek/elasticsearch that referenced this pull request Mar 23, 2026