[ESQL] Fix ORC type support gaps by costin · Pull Request #145074 · elastic/elasticsearch

costin · 2026-03-27T09:29:17Z

Fix three ORC type mapping issues in the format reader:

DECIMAL columns (all precisions) crashed with
QlIllegalArgumentException because DecimalColumnVector
was not handled in the double block path. Add support
for both DecimalColumnVector and Decimal64ColumnVector.
TIMESTAMP/TIMESTAMP_INSTANT remain mapped to DATETIME
(millisecond precision). Mapping to DATE_NANOS was
investigated but DATE_NANOS cannot represent pre-epoch
dates (assert nanos >= 0 in DateUtils), which crashes
on ORC files with historical timestamps. Nanosecond
precision is deferred until the DATE_NANOS range
limitation is addressed.
BINARY columns now map to UNSUPPORTED instead of KEYWORD
to avoid exposing raw bytes as garbled text.

Also adds a shared csv-spec regression test that filters on
a pre-epoch birth date, protecting against re-introducing the
DATE_NANOS mapping without fixing the range limitation.

Developed with AI-assisted tooling

Fix three ORC type mapping issues in the format reader: - DECIMAL columns (all precisions) crashed with QlIllegalArgumentException because DecimalColumnVector was not handled in the double block path. Add support for both DecimalColumnVector and Decimal64ColumnVector. - TIMESTAMP/TIMESTAMP_INSTANT now map to DATE_NANOS to preserve full nanosecond precision. The sub-millisecond nanos[] array was previously ignored, silently dropping up to 999,999 nanoseconds per timestamp. - BINARY columns now map to UNSUPPORTED instead of KEYWORD to avoid exposing raw bytes as garbled text.

elasticsearchmachine · 2026-03-27T09:29:41Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine · 2026-03-27T09:29:41Z

Hi @costin, I've created a changelog YAML for you.

Accept DATE_NANOS as compatible with DATETIME in test type assertions, matching the existing INTEGER/LONG coercion pattern. External source formats like ORC that preserve nanosecond precision map timestamps to DATE_NANOS, but the shared csv-spec files declare columns as :date (DATETIME).

bpintea

Left some questions.

bpintea · 2026-03-27T14:33:31Z

x-pack/plugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/CsvAssert.java

            if (actualType == Type.INTEGER && expectedType == Type.LONG) {
                actualType = Type.LONG;
            }
+            if (actualType == Type.DATE_NANOS && expectedType == Type.DATETIME) {


Above we differentiate between millis and nanos, but not in the CSV tests. Should we?

Good question. The csv-spec files are shared across all external source formats (CSV, NdJSON, ORC, Parquet) and declare timestamp columns as :date (DATETIME). Since ORC preserves nanos natively, its reader returns DATE_NANOS — a strictly higher-precision form of DATETIME.

The coercion follows the existing INTEGER→LONG pattern (line above): it only accepts DATE_NANOS where DATETIME is expected, never the reverse. A test that explicitly asserts :date_nanos in its header still works.

The alternative would be duplicating the entire spec file set per format, which doesn't seem worth it. Added a comment explaining the rationale.

I feel like if a storage returns DATE_NANOS, we should assert that, not update the type. Otherwise this still has the potential to mask bugs.

Agreed. Removed the coercion entirely — ORC now returns DATETIME like the other formats, so there's nothing to mask.

bpintea · 2026-03-27T14:42:55Z

...atasource-orc/src/main/java/org/elasticsearch/xpack/esql/datasource/orc/OrcFormatReader.java

+                            long nanos;
+                            if (child instanceof TimestampColumnVector ts) {
+                                if (ts.noNulls == false && ts.isNull[idx]) {
+                                    nanos = 0L;


Is this right? We add a 0L for a null in the source? And same in the 2x else branches.
Should we add some tests for these?

Good catch. Two things here:

The null→0L pattern: this is inherited from all other list block methods (createListIntBlock appends 0, createListLongBlock appends 0L, createListDatetimeBlock appends 0L). Fixing it properly requires a design decision about whether multi-value positions can contain a mix of null and non-null elements — that's a broader change I'd rather do separately across all list methods.

The catch-all else branch: you're right, this was dead code silently producing epoch-zero garbage. Changed both createListDateNanosBlock and createListDatetimeBlock to throw QlIllegalArgumentException instead, consistent with createDoubleBlock.

bpintea · 2026-03-27T14:43:17Z

...atasource-orc/src/main/java/org/elasticsearch/xpack/esql/datasource/orc/OrcFormatReader.java

+                                    nanos = lv.vector[idx] * NANOS_PER_DAY;
+                                }
+                            } else {
+                                nanos = 0L;


Also here, what other type could it be?

For DATE_NANOS, only TimestampColumnVector should appear (TIMESTAMP and TIMESTAMP_INSTANT). The LongColumnVector branch is defensive (DATE uses LongColumnVector but maps to DATETIME, not DATE_NANOS, so it shouldn't reach this method in practice).

Changed the else to throw QlIllegalArgumentException instead of silently producing 0L — reaching it would indicate a bug. Same change applied to createListDatetimeBlock for consistency.

- Add explanatory comment for DATE_NANOS/DATETIME coercion in CsvAssert, clarifying the rationale. - Replace silent else catch-alls with QlIllegalArgumentException in createListDateNanosBlock and createListDatetimeBlock, consistent with createDoubleBlock.

bpintea · 2026-03-30T13:06:46Z

...atasource-orc/src/main/java/org/elasticsearch/xpack/esql/datasource/orc/OrcFormatReader.java

            case VARCHAR, CHAR -> DataType.KEYWORD;
-            case BINARY -> DataType.KEYWORD;
-            case TIMESTAMP, TIMESTAMP_INSTANT, DATE -> DataType.DATETIME;
+            case TIMESTAMP, TIMESTAMP_INSTANT -> DataType.DATE_NANOS;


Going back to this, not positive on the right solution here. Our DATE_NANOS have limited range (Epoch to MAX_NANOSECOND_INSTANT). Which means we won't be able to decode dates outside of that, should they appear in files.

We should probably add unit tests with such dates/timestamps.

I've extended external-basic.csv-spec with a test that passes in, for instance Parquet, but not ORC after applying the PR:

readAllEmployees2 required_capability: external_command EXTERNAL "{{employees}}" | KEEP emp_no, first_name, last_name, birth_date, gender, hire_date, languages, height, salary, still_hired | WHERE birth_date == TO_DATETIME("1953-09-02T00:00:00.000Z") | SORT emp_no | LIMIT 5; emp_no:integer | first_name:keyword | last_name:keyword | birth_date:date | gender:keyword | hire_date:date | languages:integer | height:double | salary:integer | still_hired:boolean 10001 | "Georgi" | "Facello" | 1953-09-02T00:00:00.000Z | "M" | 1986-06-26T00:00:00.000Z | 2 | 2.03 | 57305 | true ;

with

[ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [test-cluster-0] fatal error in thread [elasticsearch[test-cluster-0][esql_worker][T#3]], exiting java.lang.AssertionError at org.elasticsearch.server@9.4.0-SNAPSHOT/org.elasticsearch.common.time.DateUtils.compareNanosToMillis(DateUtils.java:339) at org.elasticsearch.xpack.esql.expression.predicate.operator.comparison.Equals.processNanosMillis(Equals.java:222) at org.elasticsearch.xpack.esql.expression.predicate.operator.comparison.EqualsNanosMillisEvaluator.eval(EqualsNanosMillisEvaluator.java:100) at org.elasticsearch.xpack.esql.expression.predicate.operator.comparison.EqualsNanosMillisEvaluator.eval(EqualsNanosMillisEvaluator.java:54) at org.elasticsearch.compute.operator.FilterOperator.process(FilterOperator.java:43) at org.elasticsearch.compute.operator.AbstractPageMappingOperator.getOutput(AbstractPageMappingOperator.java:89) at org.elasticsearch.compute.operator.Driver.runSingleLoopIteration(Driver.java:323) at org.elasticsearch.compute.operator.Driver.run(Driver.java:194) at org.elasticsearch.compute.operator.Driver$1.doRun(Driver.java:506) at org.elasticsearch.server@9.4.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) at org.elasticsearch.compute.operator.DriverScheduler$1.doRun(DriverScheduler.java:57) at org.elasticsearch.server@9.4.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) at org.elasticsearch.server@9.4.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:35) at org.elasticsearch.server@9.4.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1114) at org.elasticsearch.server@9.4.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) at java.base/java.lang.Thread.run(Thread.java:1575) [2026-03-30T23:39:36,619][INFO ][o.e.x.e.q.o.OrcFormatSpecIT][test] [csv-spec:external-basic.readAllEmployees2 [LOCAL]] after test

Reverted TIMESTAMP/TIMESTAMP_INSTANT back to DATETIME. DATE_NANOS can't represent pre-epoch dates so it's not viable here until that limitation is addressed separately.

bpintea · 2026-03-30T13:07:50Z

x-pack/plugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/CsvAssert.java

            if (actualType == Type.INTEGER && expectedType == Type.LONG) {
                actualType = Type.LONG;
            }
+            if (actualType == Type.DATE_NANOS && expectedType == Type.DATETIME) {


I feel like if a storage returns DATE_NANOS, we should assert that, not update the type. Otherwise this still has the potential to mask bugs.

DATE_NANOS cannot represent pre-epoch dates (assert nanos >= 0 in DateUtils.compareNanosToMillis), which crashes on ORC files with historical timestamps. Revert to DATETIME (millis precision) until DATE_NANOS range is extended.

bpintea

LBetterTM.
The PR description should be updated too -- "silently dropping up to 999,999 nanoseconds per timestamp" :))
Also, it'd be good, though not a showstopper to add the CSV spec shared, that'll protect in the future.

Adds filterByPreEpochBirthDate to the shared csv-spec that filters on a 1953 birth date, protecting against re-introducing DATE_NANOS without fixing its range.

costin · 2026-03-30T20:31:00Z

Updated the PR description and added the filterByPreEpochBirthDate test to the shared csv-spec — filters on a 1953 birth date so any future DATE_NANOS reintroduction without fixing the range limitation will fail.

github-actions · 2026-03-30T20:31:36Z

🔍 Preview links for changed docs

⏳ Building and deploying preview... View progress

This comment will be updated with preview links when the build is complete.

github-actions · 2026-03-30T20:33:04Z

ℹ️ Important: Docs version tagging

👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version.

We use applies_to tags to mark version-specific features and changes.

Expand for a quick overview

When to use applies_to tags:

✅ At the page level to indicate which products/deployments the content applies to (mandatory)
✅ When features change state (e.g. preview, ga) in a specific version
✅ When availability differs across deployments and environments

What NOT to do:

❌ Don't remove or replace information that applies to an older version
❌ Don't add new information that applies to a specific version without an applies_to tag
❌ Don't forget that applies_to tags can be used at the page, section, and inline level

🤔 Need help?

Check out the cumulative docs guidelines
Reach out in the #docs Slack channel

testPreEpochTimestamp: verifies 1953 date reads as negative millis, protecting the DATETIME path against DATE_NANOS regression at the reader level. testReadListTimestampColumn: exercises createListDatetimeBlock which had zero test coverage; includes a pre-epoch element.

Fix three ORC type mapping issues in the format reader: - DECIMAL columns (all precisions) crashed with QlIllegalArgumentException because DecimalColumnVector was not handled in the double block path. Add support for both DecimalColumnVector and Decimal64ColumnVector. - TIMESTAMP/TIMESTAMP_INSTANT remain mapped to DATETIME (millisecond precision). Mapping to DATE_NANOS was investigated but DATE_NANOS cannot represent pre-epoch dates (assert nanos >= 0 in DateUtils), which crashes on ORC files with historical timestamps. Nanosecond precision is deferred until the DATE_NANOS range limitation is addressed. - BINARY columns now map to UNSUPPORTED instead of KEYWORD to avoid exposing raw bytes as garbled text. Also adds a shared csv-spec regression test that filters on a pre-epoch birth date, protecting against re-introducing the DATE_NANOS mapping without fixing the range limitation. Developed with AI-assisted tooling

…rics * upstream/main: (428 commits) ESQL: DS: Add inference/RERANK tests (elastic#145229) Unmute MMR logical plan test (elastic#145311) Do not attempt marking store as corrupted if the check is rejected due to shutdown (elastic#145209) feat(tsdb): add pipeline runtime and rename stage interfaces (elastic#145175) Fix UnresolvedException on PromQL by(step) grouping (elastic#145307) ES|QL: Optimize MMR by reducing cache size and lookup (elastic#145014) Prometheus labels/series APIs: support multiple match[] selectors (elastic#145298) Move ClientScrollablePaginatedHitSource into Reindex Module (elastic#144100) mute test class for elastic#145277 CPS mode for ViewResolver (elastic#145219) [ESQL] Disables GroupedTopNBenchmark temporarily (elastic#145124) Make exponential_histogram the default histogram type for HTTP OTLP endpoint (elastic#145065) More tests requiring an explicit confidence interval (elastic#145232) ES|QL: Adding `USER_AGENT` command (elastic#144384) ESQL: enable Generative IT after more fixes (elastic#145112) Rework FieldMapper parameter tests to not use merge builders (elastic#145213) [ESQL] Fix ORC type support gaps (elastic#145074) [Test] Unmute FollowingEngineTests.testProcessOnceOnPrimary (elastic#145192) Add PrometheusSeriesRestAction for /_prometheus/api/v1/series endpoint (elastic#144494) Prometheus labels API: add rest action (elastic#144952) ...

Fix three ORC type mapping issues in the format reader: - DECIMAL columns (all precisions) crashed with QlIllegalArgumentException because DecimalColumnVector was not handled in the double block path. Add support for both DecimalColumnVector and Decimal64ColumnVector. - TIMESTAMP/TIMESTAMP_INSTANT remain mapped to DATETIME (millisecond precision). Mapping to DATE_NANOS was investigated but DATE_NANOS cannot represent pre-epoch dates (assert nanos >= 0 in DateUtils), which crashes on ORC files with historical timestamps. Nanosecond precision is deferred until the DATE_NANOS range limitation is addressed. - BINARY columns now map to UNSUPPORTED instead of KEYWORD to avoid exposing raw bytes as garbled text. Also adds a shared csv-spec regression test that filters on a pre-epoch birth date, protecting against re-introducing the DATE_NANOS mapping without fixing the range limitation. Developed with AI-assisted tooling

costin added >enhancement :Analytics/ES|QL AKA ESQL v9.4.0 ES|QL|DS ES|QL datasources labels Mar 27, 2026

costin requested a review from bpintea March 27, 2026 09:29

costin enabled auto-merge (squash) March 27, 2026 09:29

elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Mar 27, 2026

Update docs/changelog/145074.yaml

846c0fc

costin added 2 commits March 27, 2026 12:47

Merge branch 'main' into esql/orc-decimal-support

3658411

bpintea disabled auto-merge March 27, 2026 14:32

bpintea reviewed Mar 27, 2026

View reviewed changes

Address review feedback

1d8568f

- Add explanatory comment for DATE_NANOS/DATETIME coercion in CsvAssert, clarifying the rationale. - Replace silent else catch-alls with QlIllegalArgumentException in createListDateNanosBlock and createListDatetimeBlock, consistent with createDoubleBlock.

costin enabled auto-merge (squash) March 27, 2026 15:53

bpintea reviewed Mar 30, 2026

View reviewed changes

Revert TIMESTAMP to DATETIME, remove DATE_NANOS path

c188932

DATE_NANOS cannot represent pre-epoch dates (assert nanos >= 0 in DateUtils.compareNanosToMillis), which crashes on ORC files with historical timestamps. Revert to DATETIME (millis precision) until DATE_NANOS range is extended.

costin force-pushed the esql/orc-decimal-support branch from c3cb593 to c188932 Compare March 30, 2026 13:56

bpintea approved these changes Mar 30, 2026

View reviewed changes

Add pre-epoch date filter regression test

daee10e

Adds filterByPreEpochBirthDate to the shared csv-spec that filters on a 1953 birth date, protecting against re-introducing DATE_NANOS without fixing its range.

costin disabled auto-merge March 31, 2026 08:32

costin merged commit b41b08a into elastic:main Mar 31, 2026
34 of 35 checks passed

costin deleted the esql/orc-decimal-support branch March 31, 2026 08:32

Conversation

costin commented Mar 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

elasticsearchmachine commented Mar 27, 2026

Uh oh!

elasticsearchmachine commented Mar 27, 2026

Uh oh!

bpintea left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bpintea left a comment

Choose a reason for hiding this comment

Uh oh!

costin commented Mar 30, 2026

Uh oh!

github-actions bot commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔍 Preview links for changed docs

Uh oh!

github-actions bot commented Mar 30, 2026

ℹ️ Important: Docs version tagging

When to use applies_to tags:

What NOT to do:

🤔 Need help?

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

costin commented Mar 27, 2026 •

edited

Loading

github-actions bot commented Mar 30, 2026 •

edited

Loading