Skip to content

ESQL: Add name IDs to golden tests and fix synthetic names#143450

Merged
GalLalouche merged 3 commits intoelastic:mainfrom
GalLalouche:tests/golden_name_IDs
Mar 3, 2026
Merged

ESQL: Add name IDs to golden tests and fix synthetic names#143450
GalLalouche merged 3 commits intoelastic:mainfrom
GalLalouche:tests/golden_name_IDs

Conversation

@GalLalouche
Copy link
Copy Markdown
Contributor

@GalLalouche GalLalouche commented Mar 3, 2026

PR Summary

This PR adds support for two golden tests features in ESQL:

  1. Name IDs are no longer removed, but are instead normalized, so golden tests can be used to track name IDs.
  2. Fixed a bug in synthetic names not being normalized, which could lead to false positives.

Existing discrepancies

Note: There are discrepancies in some of name IDs (Summary by Cursor). @alex-spies, I believe you encountered this in your fixes to #142489!

What the inconsistency is

  • In several SubstituteRoundToGoldenTests local physical plans, the same intermediate aggregate fields appear with different NameIds across an ExchangeExec boundary.
  • Typical pattern:
    • ExchangeExec output uses IDs like #2/#3
    • direct child (often EsStatsQueryExec) exposes the same logical fields as #4/#5
  • AggregateExec FINAL then follows the ExchangeExec side IDs, so the mismatch looks like an Exchange boundary drift.

Why this happens (production path)

  • This is not a golden-print normalization issue; it is produced by plan construction.
  • The split-agg path creates intermediate attributes more than once in different planning phases:
    1. Mapper builds split aggregate intermediates and stores them on ExchangeExec output.
    2. During local planning (PlannerUtils.localPlan), fragment replanning (LocalMapper) rebuilds intermediate attrs, allocating fresh NameIds.
  • ExchangeExec.replaceChild(...) keeps its existing output list, so parent-side IDs and child-side IDs can diverge.

Why it still exists

  • The plan remains executable, so this drift can survive verification and only shows up as consistency noise in golden plan structure.
  • There is also an explicit TODO in PushStatsToSource noting recreated attributes can get wrong NameIds, which indicates this class of issue is known in planning code.

Why it is not a trivial fix

  • A naive fix (for example, always forcing ExchangeExec output to child output IDs) can break split-aggregation runtime semantics.
  • We already saw this kind of direct wiring change cause spec IT failures (including TopIpsGrouping) and assertion failures.
  • A safe fix needs careful ID propagation/reconciliation across planner phases (Mapper/local mapper/exchange boundaries), not just a 1-2 line tweak.
  • Any robust change likely touches planner identity flow and can cause broad golden churn, so it needs targeted validation.

@GalLalouche GalLalouche requested a review from alex-spies March 3, 2026 10:02
@GalLalouche GalLalouche added >test Issues or PRs that are addressing/adding tests Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL labels Mar 3, 2026
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@GalLalouche GalLalouche force-pushed the tests/golden_name_IDs branch from 2912e16 to 256fb47 Compare March 3, 2026 10:10
Copy link
Copy Markdown
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks Gal!

@GalLalouche GalLalouche enabled auto-merge (squash) March 3, 2026 15:42
@GalLalouche GalLalouche merged commit c1c3b19 into elastic:main Mar 3, 2026
35 checks passed
szybia added a commit to szybia/elasticsearch that referenced this pull request Mar 3, 2026
…locations

* upstream/main: (51 commits)
  ESQL: Remaining serialization tests (elastic#143470)
  Eagerly release resources in `TransportAwaitClusterStateVersionAppliedAction` (elastic#143477)
  Stop and relocate sliced reindex on shutdown (elastic#143183)
  Documentation for query_vector base64 parameter (elastic#142675)
  ES|QL: Fix LIMIT after all columns are dropped (elastic#143463)
  Update docs-build.yml (elastic#142958)
  Fix KnnIndexTester to work with byte vectors (elastic#143493)
  Fix IndexInputUtils.withSlice to produce native-safe MemorySegments on Java 21 (elastic#143479)
  CPS fix: include only relevant projects in the search response metadata (elastic#143367)
  apm-data: explicit map of timestamp.us to long (elastic#143173)
  [Inference API] Add custom headers for Azure OpenAI Service (elastic#142969)
  ESQL: Add name IDs to golden tests and fix synthetic names (elastic#143450)
  Add getUnavailableShards to BaseBroadcastResponse (elastic#143406)
  Add description to reindex API without sensitive info (elastic#143112)
  SQL: fix CLI tests (elastic#143451)
  ES|QL: Add note of future removal of FORK implicit LIMIT (elastic#143457)
  [Test] Randomly disable doc values skippers in time-series indices (elastic#143389)
  Improve pattern text downgrade license test (elastic#143102)
  [Transform] Stop transforms at the end of tests (elastic#139783)
  Mute org.elasticsearch.compute.lucene.read.ValueSourceReaderTypeConversionTests testLoadAll elastic#143471
  ...
shmuelhanoch pushed a commit to shmuelhanoch/elasticsearch that referenced this pull request Mar 4, 2026
…43450)

This PR adds support for two golden tests features in ESQL:

Name IDs are no longer removed, but are instead normalized, so golden tests can be used to track name IDs.
Fixed a bug in synthetic names not being normalized, which could lead to false positives.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) >test Issues or PRs that are addressing/adding tests v9.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants