Skip to content

Fix UnresolvedException on PromQL by(step) grouping#145307

Merged
felixbarny merged 1 commit intoelastic:mainfrom
felixbarny:promql-unresolved-step
Mar 31, 2026
Merged

Fix UnresolvedException on PromQL by(step) grouping#145307
felixbarny merged 1 commit intoelastic:mainfrom
felixbarny:promql-unresolved-step

Conversation

@felixbarny
Copy link
Copy Markdown
Member

@felixbarny felixbarny commented Mar 31, 2026

Fixes an UnresolvedException: Invalid call to dataType on an unresolved object ?step seen on serverless when a PromQL query groups by a label named step, which collides with the built-in step time bucket output column.

  • Guard AcrossSeriesAggregate.output() against calling dataType() on unresolved attributes
  • Reject by(step) groupings in postAnalysisVerification with a clear error message, since the label name collides with the built-in step column
  • Make STEP_COLUMN_NAME private and use stepColumnName() accessor throughout

Guard AcrossSeriesAggregate.output() against calling dataType() on
unresolved attributes, which threw UnresolvedException during
optimization.

Reject PromQL by(step) groupings in postAnalysisVerification because
the label name "step" collides with the built-in step time bucket
output column. A future improvement could add an option to rename the
built-in step column to support this.

Also make STEP_COLUMN_NAME private and use stepColumnName() accessor
throughout.
@felixbarny felixbarny self-assigned this Mar 31, 2026
@elasticsearchmachine elasticsearchmachine added Team:StorageEngine external-contributor Pull request authored by a developer outside the Elasticsearch team labels Mar 31, 2026
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@felixbarny felixbarny requested review from kkrik-es and sidosera March 31, 2026 10:38
"PROMQL index=k8s step=1m result=(sum by (step) (network.eth0.rx))",
"PROMQL index=k8s step=1m result=(sum by (step, pod) (network.eth0.rx))"
)) {
var e = expectThrows(VerificationException.class, () -> planPromql(query));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this converted to 4xx or 5xx? Can't recall where the mapping is defined.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validation errors are always client errors (4xx).

failures.add(
fail(
agg,
"label [{}] collides with the built-in [{}] output column [{}]",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine as a stopgap, but I wonder if we can differentiate between the two. Maybe we can implicitly add an alias to the label, to avoid the conflict?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can implicitly add an alias to the label, to avoid the conflict?

What do you have in mind there? That'll still change the name of the output column, so I'm not sure if that'll work.

@felixbarny felixbarny enabled auto-merge (squash) March 31, 2026 11:13
@felixbarny felixbarny merged commit cfb9e82 into elastic:main Mar 31, 2026
39 checks passed
szybia added a commit to szybia/elasticsearch that referenced this pull request Mar 31, 2026
…rics

* upstream/main: (428 commits)
  ESQL: DS: Add inference/RERANK tests (elastic#145229)
  Unmute MMR logical plan test (elastic#145311)
  Do not attempt marking store as corrupted if the check is rejected due to shutdown (elastic#145209)
  feat(tsdb): add pipeline runtime and rename stage interfaces (elastic#145175)
  Fix UnresolvedException on PromQL by(step) grouping (elastic#145307)
  ES|QL: Optimize MMR by reducing cache size and lookup (elastic#145014)
  Prometheus labels/series APIs: support multiple match[] selectors (elastic#145298)
  Move ClientScrollablePaginatedHitSource into Reindex Module (elastic#144100)
  mute test class for elastic#145277
  CPS mode for ViewResolver (elastic#145219)
  [ESQL] Disables GroupedTopNBenchmark temporarily (elastic#145124)
  Make exponential_histogram the default histogram type for HTTP OTLP endpoint (elastic#145065)
  More tests requiring an explicit confidence interval (elastic#145232)
  ES|QL: Adding `USER_AGENT` command (elastic#144384)
  ESQL: enable Generative IT after more fixes (elastic#145112)
  Rework FieldMapper parameter tests to not use merge builders (elastic#145213)
  [ESQL] Fix ORC type support gaps (elastic#145074)
  [Test] Unmute FollowingEngineTests.testProcessOnceOnPrimary (elastic#145192)
  Add PrometheusSeriesRestAction for /_prometheus/api/v1/series endpoint (elastic#144494)
  Prometheus labels API: add rest action (elastic#144952)
  ...
@felixbarny felixbarny deleted the promql-unresolved-step branch March 31, 2026 13:37
ncordon pushed a commit to ncordon/elasticsearch that referenced this pull request Apr 1, 2026
Guard AcrossSeriesAggregate.output() against calling dataType() on
unresolved attributes, which threw UnresolvedException during
optimization.

Reject PromQL by(step) groupings in postAnalysisVerification because
the label name "step" collides with the built-in step time bucket
output column. A future improvement could add an option to rename the
built-in step column to support this.

Also make STEP_COLUMN_NAME private and use stepColumnName() accessor
throughout.
return List.of(timeseriesAttribute);
}
return groupings.stream().filter(not(a -> a.dataType() == DataType.NULL)).toList();
return groupings.stream().filter(a -> a.resolved() == false || a.dataType() != DataType.NULL).toList();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@felixbarny , I know this is already merged, but this doesn't look right. Unresolved attributes must not be present when output is called.

The javadoc for the output method states:

     * Must be called only on resolved plans, otherwise may throw an exception or return wrong results.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wasn't the primary fix but a defensive check. Maybe too defensive in this case...
Maybe better to fail early or add something like an assert resolved() here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've created a PR to add the assert: #145525

felixbarny added a commit to felixbarny/elasticsearch that referenced this pull request Apr 2, 2026
…gregate

Encodes the contract from QueryPlan#output() with an assert rather than
silently accepting unresolved attributes in the output stream.

Relates: elastic#145307
felixbarny added a commit that referenced this pull request Apr 2, 2026
…gregate (#145525)

Encodes the contract from QueryPlan#output() with an assert rather than
silently accepting unresolved attributes in the output stream.

Relates: #145307
mromaios pushed a commit to mromaios/elasticsearch that referenced this pull request Apr 9, 2026
…gregate (elastic#145525)

Encodes the contract from QueryPlan#output() with an assert rather than
silently accepting unresolved attributes in the output stream.

Relates: elastic#145307
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

external-contributor Pull request authored by a developer outside the Elasticsearch team >non-issue :StorageEngine/PromQL PromQL support for Elastic Team:StorageEngine v9.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants