Ensure the order of converters in ValuesFromManyReader #139019
craigtaverner merged 1 commit into elastic:main
Branch updated from 9bd135d to a6656a7.
Pinging @elastic/es-analytical-engine (Team:Analytics)
```diff
 */
 fieldTypeBuilders[f] = operator.fields[f].info.type().newBlockBuilder(docs.getPositionCount(), operator.blockFactory);
-buildersAndLoaders.add(new HashMap<>());
+buildersAndLoaders.add(new LinkedHashMap<>()); // use LinkedHashMap to preserve insertion order
```
Ideally, we should replace the Map with a List, as we read shards in monotonically increasing order. I will follow up to remove this map soon.
```java
String q1 = q0 + " | SORT _index";
String q2 = q0 + " | SORT single_type";
String q3 = q0 + " | SORT single_type, _index";
```
Is there a reason we need different sort orders? Wouldn't any one of these reproduce the problem?
We only need one query with SORT for this issue, but I added more queries to increase coverage.
ncordon left a comment
LGTM. I also ran the test without the fix, and it does reproduce the failure.
```java
public void testMultiTypes() {
    String q0 = """
        FROM test-* METADATA _index
```
Out of curiosity, why do we need several indices to reproduce this? Shouldn't it be reproducible with a single index that has many shards?
The problem arises in buildBlocks in ValuesFromManyReader, right? How does that code depend on having several indices?
```java
private void buildBlocks() {
    for (int f = 0; f < target.length; f++) {
        // Here entrySet might not be ordered
        for (var entry : buildersAndLoaders.get(f).entrySet()) {
```
The converter is per index: all shards of the same index should have the same data type and, therefore, the same converter. To reproduce this bug, we need multiple indices with union types (the same field mapped to different types in different indices), more than 16 target shards, and TopN.
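To illustrate why a single index is not enough, here is a minimal, hypothetical model (not the Elasticsearch code). `PerIndexConverterSketch`, `applyWithOrder`, and the `long:`/`keyword:` converters are illustrative stand-ins, and a fixed permutation simulates a map whose iteration order does not match the order in which rows were read. When every shard shares one converter, the misalignment has no visible effect; once two indices map the field differently, rows receive the wrong converter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical model: converters are stored per shard, but each converter is really a
// property of the shard's index.
public class PerIndexConverterSketch {

    // Applies converters.get(order[i]) to rows.get(i), simulating an out-of-order iteration.
    static List<String> applyWithOrder(List<Function<String, String>> converters, List<String> rows, int[] order) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            out.add(converters.get(order[i]).apply(rows.get(i)));
        }
        return out;
    }

    public static void main(String[] args) {
        Function<String, String> asLong = v -> "long:" + v;
        Function<String, String> asKeyword = v -> "keyword:" + v;
        List<String> rows = List.of("a", "b", "c"); // one row per shard, in read order
        int[] scrambled = {2, 0, 1};                // a misaligned iteration order

        // One index: every shard shares the same converter, so the misalignment is invisible.
        List<Function<String, String>> oneIndex = List.of(asLong, asLong, asLong);
        System.out.println(applyWithOrder(oneIndex, rows, scrambled)); // [long:a, long:b, long:c]

        // Two indices with different mapped types: the wrong converter reaches some rows.
        List<Function<String, String>> twoIndices = List.of(asLong, asLong, asKeyword);
        System.out.println(applyWithOrder(twoIndices, rows, scrambled)); // [keyword:a, long:b, long:c]
    }
}
```

The correct result for the two-index case would be `[long:a, long:b, keyword:c]`, so only the multi-index setup exposes the bug.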
cimequinox left a comment
I like that the change to the code is minimal.
I've manually tested it in my local environment and it appears to fix the problem.
Excellent work. 👍
Once this commit is promoted to serverless, we should unmute all the failing tests.

I'm assigning myself to all the issues on elasticsearch-serverless, so I remember to unmute those tests once this fix hits the serverless repo.

Ah, I see Nhat already created a PR to unmute. I'll track that one.

Thanks everyone!
When loading from multiple shards, we previously stored converters and block builders in a HashMap. At build time, we iterate over the map entries to apply each converter to its rows. However, because HashMap does not guarantee iteration order, the rows and converters could become misaligned, producing incorrect output in which a converter is applied to rows from a different shard.
This change replaces HashMap with LinkedHashMap to preserve insertion order.
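A minimal sketch of the failure mode, under stated assumptions: `ConverterOrderSketch`, `ShardKey`, and `readAndBuild` are hypothetical names, and the real keys and builders in `ValuesFromManyReader` may differ. Converters are registered per shard in read order, rows are kept in read order, and the build step pairs the i-th map entry with the i-th row. That pairing is only correct when `entrySet()` iterates in insertion order, which `LinkedHashMap` guarantees and `HashMap` does not.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class ConverterOrderSketch {

    // Hypothetical per-shard key; it keeps Object's identity hashCode, so HashMap order is unspecified.
    static final class ShardKey {
        final int shard;

        ShardKey(int shard) {
            this.shard = shard;
        }
    }

    // Reads shards 0..shardCount-1 in order. Even shards come from a "long" index, odd shards
    // from a "keyword" index. At build time the i-th map entry is paired with the i-th row.
    static List<String> readAndBuild(Map<ShardKey, Function<String, String>> converters, int shardCount) {
        List<String> rowsInReadOrder = new ArrayList<>();
        for (int shard = 0; shard < shardCount; shard++) {
            Function<String, String> converter;
            if (shard % 2 == 0) {
                converter = v -> "long:" + v;
            } else {
                converter = v -> "keyword:" + v;
            }
            converters.put(new ShardKey(shard), converter);
            rowsInReadOrder.add("shard" + shard);
        }
        List<String> out = new ArrayList<>();
        int i = 0;
        // Correct only if entrySet() iterates in insertion order.
        for (Map.Entry<ShardKey, Function<String, String>> entry : converters.entrySet()) {
            out.add(entry.getValue().apply(rowsInReadOrder.get(i++)));
        }
        return out;
    }

    public static void main(String[] args) {
        // Every row gets its own shard's converter.
        System.out.println(readAndBuild(new LinkedHashMap<>(), 20));
        // Pairing depends on hash order, so rows may get another shard's converter.
        System.out.println(readAndBuild(new HashMap<>(), 20));
    }
}
```

Swapping only the map implementation, as this change does, is enough to restore the pairing in the sketch.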
This bug typically manifests when using more than 16 target shards (exceeding the default HashMap capacity) and became visible when the default number of shards in serverless was increased from 3 to 6.
Ideally, we should replace the Map with a List, as we read shards in monotonically increasing order. I will follow up to remove this map soon.
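One possible shape for that follow-up, sketched under assumptions: `ShardSegmentsSketch`, `Segment`, `startShard`, `add`, and `build` are hypothetical names, not the planned implementation. Because shards arrive in monotonically increasing order, per-shard state can be appended to a List, whose order is the read order by construction; the sketch also rejects an out-of-order shard instead of silently misaligning.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class ShardSegmentsSketch {

    // One shard's converter plus the raw values read from that shard.
    static final class Segment {
        final int shard;
        final Function<String, String> converter;
        final List<String> values = new ArrayList<>();

        Segment(int shard, Function<String, String> converter) {
            this.shard = shard;
            this.converter = converter;
        }
    }

    private final List<Segment> segments = new ArrayList<>();

    // Starts a new shard; shards must arrive in strictly increasing order.
    void startShard(int shard, Function<String, String> converter) {
        if (!segments.isEmpty() && segments.get(segments.size() - 1).shard >= shard) {
            throw new IllegalStateException("shards must be read in increasing order, got " + shard);
        }
        segments.add(new Segment(shard, converter));
    }

    // Appends a raw value to the current shard (startShard must have been called first).
    void add(String value) {
        segments.get(segments.size() - 1).values.add(value);
    }

    // Applies each shard's converter to its own values, in read order.
    List<String> build() {
        List<String> out = new ArrayList<>();
        for (Segment segment : segments) {
            for (String value : segment.values) {
                out.add(segment.converter.apply(value));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ShardSegmentsSketch sketch = new ShardSegmentsSketch();
        sketch.startShard(0, v -> "long:" + v);
        sketch.add("a");
        sketch.startShard(1, v -> "keyword:" + v);
        sketch.add("b");
        sketch.add("c");
        System.out.println(sketch.build()); // [long:a, keyword:b, keyword:c]
    }
}
```

Keeping the converter next to the values it converts removes the need to line up two separately ordered structures at build time.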
I labelled this as a fix for an unreleased bug in 9.3.0 (see #132757).