Skip to content

Ensure the order of converters in ValuesFromManyReader#139019

Merged
craigtaverner merged 1 commit intoelastic:mainfrom
dnhatn:fix-many-readers
Dec 4, 2025
Merged

Ensure the order of converters in ValuesFromManyReader#139019
craigtaverner merged 1 commit intoelastic:mainfrom
dnhatn:fix-many-readers

Conversation

@dnhatn
Copy link
Member

@dnhatn dnhatn commented Dec 3, 2025

When loading from multiple shards, we previously stored converters and block builders in a HashMap. At build time, we traverse these to apply conversions to rows. However, because HashMap does not guarantee iteration order, the rows and converters could become misaligned, leading to incorrect output where the wrong converter is applied to the wrong shard.

This change replaces HashMap with LinkedHashMap to preserve insertion order.

This bug typically manifests when using more than 16 target shards (exceeding the default HashMap capacity) and became visible when the default number of shards in serverless was increased from 3 to 6.

Ideally, we should replace the Map with a List, as we read shards in monotonically increasing order. I will follow up to remove this map soon.

I labelled this for an unreleased bug in 9.3.0 (see #132757)

@elasticsearchmachine elasticsearchmachine added the serverless-linked Added by automation, don't add manually label Dec 4, 2025
@dnhatn dnhatn marked this pull request as ready for review December 4, 2025 00:24
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Dec 4, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

*/
fieldTypeBuilders[f] = operator.fields[f].info.type().newBlockBuilder(docs.getPositionCount(), operator.blockFactory);
buildersAndLoaders.add(new HashMap<>());
buildersAndLoaders.add(new LinkedHashMap<>()); // use LinkedHashMap to preserve insertion order
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, we should replace the Map with a List, as we read shards in monotonically increasing order. I will follow up to remove this map soon.

Comment on lines +314 to +316
String q1 = q0 + " | SORT _index";
String q2 = q0 + " | SORT single_type";
String q3 = q0 + " | SORT single_type, _index";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why we need different sortings? Any of these would reproduce the problem right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only need one query with SORT for this issue, but I added more queries to increase coverage.

Copy link
Contributor

@ncordon ncordon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I've also tried the test without the fix and does indeed reproduce the failure


public void testMultiTypes() {
String q0 = """
FROM test-* METADATA _index
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why do we need several indices to be able to reproduce?

Shouldn't this be reproducible with just one index that is sharded?

The problem arises in buildBlocks from ValuesFromManyReader right? How is that code dependant on having several indices.

private void buildBlocks() {
    for (int f = 0; f < target.length; f++) {
        // Here entrySet might not be ordered
        for (var entry : buildersAndLoaders.get(f).entrySet()) {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The converter is per index - all shards from the same index should have the same data type and, therefore, the same converter. To reproduce this bug, we need multiple indices with union types, more than 16 target shards, and TopN.

Copy link
Contributor

@cimequinox cimequinox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that the change to the code is minimal.
I've manually tested it in my local environment and it appears to fix the problem.
Excellent work. 👍

Copy link
Contributor

@craigtaverner craigtaverner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@craigtaverner craigtaverner merged commit 294eb88 into elastic:main Dec 4, 2025
35 checks passed
@craigtaverner
Copy link
Contributor

Once this commit is promoted to serverless, we should unmute all the failing tests

@craigtaverner
Copy link
Contributor

I'm assigning myself to all the issues on elasticsearch-serverless, so I remember to unmute those tests once this fix hits the serverless repo.

@craigtaverner
Copy link
Contributor

Ah, I see Nhat already created a PR to unmute. I'll track that one.

@dnhatn
Copy link
Member Author

dnhatn commented Dec 4, 2025

Thanks everyone!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >non-issue serverless-linked Added by automation, don't add manually Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants