ES|QL: Add MV_UNION Function#139664
Conversation
|
Hi @mridula-s109, I've created a changelog YAML for you. |
ℹ️ Important: Docs version tagging👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version. We use applies_to tags to mark version-specific features and changes. Expand for a quick overviewWhen to use applies_to tags:✅ At the page level to indicate which products/deployments the content applies to (mandatory) What NOT to do:❌ Don't remove or replace information that applies to an older version 🤔 Need help?
|
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
ioanatia
left a comment
There was a problem hiding this comment.
this looks great! I had one question for the behaviour we want when dealing with nulls.
One aspect we can consider as a follow up is to reduce the duplication with mv_intersection.
We could have a base MvSetOperationFunction base class or something similar that both MvUnion and MvIntersect inherit from.
This would be also helpful if we want to implement another set operation like set difference: mv_difference (we would have to think of the right name).
But I think for now what we have is good enough and we don't need to go through this refactor.
@markjhoy might also want to take a look since he recently did the mv_union function
...rc/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/multivalue/MvUnion.java
Outdated
Show resolved
Hide resolved
I think it's a good idea to always try and de-duplicate code... with the generators in ESQL, and the <T> void processFieldSets(
Block.Builder builder,
int position,
Block field1,
Block field2,
BiFunction<Integer, Block, T> getValueFunction,
BiFunction<Set<T>, Set<T>, Set<T>> combinationFunction,
Consumer<T> addValueFunction
) {
int firstValueCount = field1.getValueCount(position);
int secondValueCount = field2.getValueCount(position);
// If either field has no values (is null), return null
// this behaviour would change from union to intersection, etc. so we could
// just remove this short-circuit block for the "generic" version
if (firstValueCount == 0 || secondValueCount == 0) { // <- this would change behaviour between union and intersection
builder.appendNull();
return;
}
int firstValueIndex = field1.getFirstValueIndex(position);
int secondValueIndex = field2.getFirstValueIndex(position);
// Use LinkedHashSet to maintain insertion order
Set<T> firstSet = new LinkedHashSet<>();
// Add all values from first field
for (int i = 0; i < firstValueCount; i++) {
firstSet.add(getValueFunction.apply(firstValueIndex + i, field1));
}
Set<T> secondSet = new LinkedHashSet<>();
// Add all values from second field (duplicates automatically ignored by Set)
for (int i = 0; i < secondValueCount; i++) {
secondSet.add(getValueFunction.apply(secondValueIndex + i, field2));
}
Set<T> combinedSet = combinationFunction.apply(firstSet, secondSet);
if (combinedSet.isEmpty()) {
builder.appendNull();
return;
}
// Build result
builder.beginPositionEntry();
for (T value : values) {
addValueFunction.accept(value);
}
builder.endPositionEntry();
}Where the This would make it easier to extend to other set operations in the future perhaps (e.g. MV_DIFFERENCE, MV_COMPLIMENT, etc.) UPDATE:(edit - just realized... the behaviour for when both fields are |
x-pack/plugin/esql/qa/testFixtures/src/main/resources/mv_union.csv-spec
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/qa/testFixtures/src/main/resources/mv_union.csv-spec
Outdated
Show resolved
Hide resolved
...rc/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/multivalue/MvUnion.java
Outdated
Show resolved
Hide resolved
removing the blocker based on Craig's input
…expression/function/scalar/multivalue/MvUnion.java Co-authored-by: Liam Thompson <leemthompo@gmail.com>
…expression/function/scalar/multivalue/MvUnion.java Co-authored-by: Liam Thompson <leemthompo@gmail.com>
@ioanatia, @markjhoy Great suggestion, i agree the code deduplication would be valuable for maintainability and future set operations like MV_DIFFERENCE. Having said that, for this PR i will limit the scope and create a follow-up issue to refactor both MV_UNION and MV_INTERSECTION to share a common helper, which would also make adding new set operations easier. @craigtaverner Also i have updated the MV_UNION function to treat null as an empty set if only one of the input value is null. |
related: #139298
Description: