Ensure the order of converters in ValuesFromManyReader by dnhatn · Pull Request #139019 · elastic/elasticsearch

dnhatn · 2025-12-03T23:21:01Z

When loading from multiple shards, we previously stored converters and block builders in a HashMap. At build time, we traverse these entries to apply conversions to rows. However, because HashMap does not guarantee iteration order, the rows and converters could become misaligned, producing incorrect output because a converter is applied to rows from a different shard.

This change replaces HashMap with LinkedHashMap to preserve insertion order.

This bug typically manifests when using more than 16 target shards (exceeding HashMap's default initial capacity) and became visible when the default number of shards in serverless was increased from 3 to 6.
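A minimal sketch of the ordering difference, not the code in this PR. It assumes integer shard ids as map keys; the ids 3 and 17 and the string values are hypothetical, chosen so that one key exceeds the default table size of 16.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IterationOrderDemo {
    /** Inserts the hypothetical shard ids 3 and 17, in that order, and returns the map. */
    public static Map<Integer, String> fill(Map<Integer, String> map) {
        map.put(3, "converter-for-shard-3");
        map.put(17, "converter-for-shard-17");
        return map;
    }

    /** Returns the keys in the order the map iterates them. */
    public static List<Integer> keyOrder(Map<Integer, String> map) {
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // HashMap iterates by bucket, so the order is unspecified and may differ from insertion order.
        System.out.println("HashMap:       " + keyOrder(fill(new HashMap<>())));
        // LinkedHashMap is specified to iterate in insertion order: 3, then 17.
        System.out.println("LinkedHashMap: " + keyOrder(fill(new LinkedHashMap<>())));
    }
}
```

On current OpenJDK builds the HashMap line typically shows 17 before 3, because 17 falls into bucket 17 & 15 = 1 of a 16-slot table while 3 falls into bucket 3. That is an implementation detail rather than a guarantee, which is exactly why the code should not depend on it.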

Ideally, we should replace the Map with a List, as we read shards in monotonically increasing order. I will follow up to remove this map soon.
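One possible shape for that follow-up, sketched under assumptions: the class `PerShardList`, its nested `Entry` type, and the string converters are all hypothetical and do not come from ValuesFromManyReader. The sketch also assumes each shard is registered once, in strictly increasing order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class PerShardList {
    /** Hypothetical pairing of a shard id with the conversion for that shard. */
    private static final class Entry {
        final int shard;
        final UnaryOperator<String> converter;

        Entry(int shard, UnaryOperator<String> converter) {
            this.shard = shard;
            this.converter = converter;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    /** Appends a converter; rejects shards that do not arrive in strictly increasing order. */
    public void add(int shard, UnaryOperator<String> converter) {
        if (!entries.isEmpty() && entries.get(entries.size() - 1).shard >= shard) {
            throw new IllegalArgumentException("shards must be added in increasing order");
        }
        entries.add(new Entry(shard, converter));
    }

    /** Shard ids in the order a build step would visit them. */
    public List<Integer> shardOrder() {
        List<Integer> order = new ArrayList<>();
        for (Entry e : entries) {
            order.add(e.shard);
        }
        return order;
    }

    public static void main(String[] args) {
        PerShardList list = new PerShardList();
        list.add(3, String::trim);
        list.add(17, String::toUpperCase);
        System.out.println(list.shardOrder()); // [3, 17]
    }
}
```

A list makes the visiting order explicit and turns an out-of-order shard into an immediate error instead of a silent misalignment.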

I labelled this as a fix for an unreleased bug in 9.3.0 (see #132757).

elasticsearchmachine · 2025-12-04T00:24:33Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

dnhatn · 2025-12-04T00:25:00Z

...n/esql/compute/src/main/java/org/elasticsearch/compute/lucene/read/ValuesFromManyReader.java

                 */
                fieldTypeBuilders[f] = operator.fields[f].info.type().newBlockBuilder(docs.getPositionCount(), operator.blockFactory);
-                buildersAndLoaders.add(new HashMap<>());
+                buildersAndLoaders.add(new LinkedHashMap<>()); // use LinkedHashMap to preserve insertion order


Ideally, we should replace the Map with a List, as we read shards in monotonically increasing order. I will follow up to remove this map soon.

ncordon · 2025-12-04T09:00:52Z

...ugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/ManyShardsIT.java

+        String q1 = q0 + " | SORT _index";
+        String q2 = q0 + " | SORT single_type";
+        String q3 = q0 + " | SORT single_type, _index";


Is there a reason we need different sort orders? Any one of these would reproduce the problem, right?

dnhatn

We only need one query with SORT to reproduce this issue, but I added more queries to increase coverage.

ncordon

LGTM. I also ran the test without the fix, and it does indeed reproduce the failure.

ncordon · 2025-12-04T09:58:17Z

...ugin/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/ManyShardsIT.java

+
+    public void testMultiTypes() {
+        String q0 = """
+            FROM test-* METADATA _index


Out of curiosity, why do we need several indices to be able to reproduce?

Shouldn't this be reproducible with just one index that is sharded?

The problem arises in buildBlocks in ValuesFromManyReader, right? How does that code depend on having several indices?

    private void buildBlocks() {
        for (int f = 0; f < target.length; f++) {
            // Here entrySet might not be ordered
            for (var entry : buildersAndLoaders.get(f).entrySet()) {

dnhatn

The converter is per index: all shards from the same index should have the same data type and, therefore, the same converter. To reproduce this bug, we need multiple indices with union types, more than 16 target shards, and TopN.
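A toy model of why union types matter here, not the actual buildBlocks logic. It assumes two hypothetical shards (3 and 17) from indices that map the same field as long and as keyword, and it pairs the i-th value read with the i-th converter in the map's iteration order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class ConverterAlignment {
    /** Pairs the i-th raw value with the i-th converter in the map's iteration order. */
    public static List<String> convert(List<Object> rawPerShard, Map<Integer, Function<Object, String>> convertersByShard) {
        List<String> out = new ArrayList<>();
        int i = 0;
        for (Function<Object, String> converter : convertersByShard.values()) {
            out.add(converter.apply(rawPerShard.get(i++)));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, Function<Object, String>> converters = new LinkedHashMap<>();
        converters.put(3, v -> "long:" + v);     // shard 3: hypothetical index mapping the field as long
        converters.put(17, v -> "keyword:" + v); // shard 17: hypothetical index mapping it as keyword
        // Values are listed in read order (shard 3 first, then shard 17).
        System.out.println(convert(List.of(42L, "abc"), converters)); // [long:42, keyword:abc]
    }
}
```

If every shard shared one converter, as with a single index, any iteration order would give the same result. The pairing only goes wrong when the converters differ and the map iterates them in a different order than the values were read.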

cimequinox

I like that the change to the code is minimal.
I've manually tested it in my local environment and it appears to fix the problem.
Excellent work. 👍

craigtaverner

LGTM

craigtaverner · 2025-12-04T14:24:13Z

Once this commit is promoted to serverless, we should unmute all the failing tests.

craigtaverner · 2025-12-04T15:10:30Z

I'm assigning myself to all the issues on elasticsearch-serverless, so I remember to unmute those tests once this fix hits the serverless repo.

craigtaverner · 2025-12-04T15:11:36Z

Ah, I see Nhat already created a PR to unmute. I'll track that one.

dnhatn · 2025-12-04T15:41:12Z

Thanks everyone!

elasticsearchmachine added the v9.3.0 label Dec 3, 2025

dnhatn added the :Analytics/ES|QL and >non-issue labels Dec 3, 2025

Ensure the order of converters in ValuesFromManyReader

a6656a7

dnhatn force-pushed the fix-many-readers branch from 9bd135d to a6656a7 on December 4, 2025 00:15

elasticsearchmachine added the serverless-linked label Dec 4, 2025

dnhatn requested review from GalLalouche, cimequinox and craigtaverner December 4, 2025 00:23

dnhatn marked this pull request as ready for review December 4, 2025 00:24

elasticsearchmachine added the Team:Analytics label Dec 4, 2025

dnhatn commented Dec 4, 2025

ncordon reviewed Dec 4, 2025

ncordon approved these changes Dec 4, 2025

ncordon reviewed Dec 4, 2025

GalLalouche approved these changes Dec 4, 2025

cimequinox approved these changes Dec 4, 2025

craigtaverner approved these changes Dec 4, 2025

craigtaverner merged commit 294eb88 into elastic:main Dec 4, 2025
35 checks passed
