ES|QL: Add MV_UNION Function by mridula-s109 · Pull Request #139664 · elastic/elasticsearch

mridula-s109 · 2025-12-17T00:40:57Z

related: #139298
Description:

  Adds MV_UNION function to ES|QL.
  
  Returns all unique values from both input multi-valued fields (set union).
  
  Example:
  Given set A = [1, 2, 3] and set B = [2, 3, 4]
  MV_UNION(A, B) returns [1, 2, 3, 4]
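To make the described semantics concrete, here is a minimal sketch that models the example above on plain Java lists. It is illustrative only: the class and method names are hypothetical. It is not the ES|QL implementation, which works on compute Blocks. The assumption that the output keeps first-seen order is this sketch's own choice.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of MV_UNION semantics on plain lists (not Elasticsearch code).
final class MvUnionSemantics {
    // Returns every distinct value from a and b. LinkedHashSet drops duplicates
    // and keeps first-seen order (the real function's ordering is an assumption here).
    static <T> List<T> union(List<T> a, List<T> b) {
        Set<T> seen = new LinkedHashSet<>(a);
        seen.addAll(b);
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Set A = [1, 2, 3], set B = [2, 3, 4]
        System.out.println(union(List.of(1, 2, 3), List.of(2, 3, 4))); // prints [1, 2, 3, 4]
    }
}
```

Duplicates inside a single input are also collapsed, so `union([1, 1, 2], [2])` yields `[1, 2]`.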

elasticsearchmachine · 2025-12-17T00:41:56Z

Hi @mridula-s109, I've created a changelog YAML for you.


github-actions · 2025-12-17T19:06:51Z

🔍 Preview links for changed docs

github-actions · 2025-12-17T19:06:53Z

ℹ️ Important: Docs version tagging

👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version.

We use applies_to tags to mark version-specific features and changes.


When to use applies_to tags:

✅ At the page level to indicate which products/deployments the content applies to (mandatory)
✅ When features change state (e.g. preview, ga) in a specific version
✅ When availability differs across deployments and environments

What NOT to do:

❌ Don't remove or replace information that applies to an older version
❌ Don't add new information that applies to a specific version without an applies_to tag
❌ Don't forget that applies_to tags can be used at the page, section, and inline level

🤔 Need help?

Check out the cumulative docs guidelines
Reach out in the #docs Slack channel

elasticsearchmachine · 2025-12-18T09:45:05Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

ioanatia

this looks great! I had one question for the behaviour we want when dealing with nulls.
One aspect we can consider as a follow up is to reduce the duplication with mv_intersection.
We could have an MvSetOperationFunction base class, or something similar, that both MvUnion and MvIntersect inherit from.
This would also be helpful if we want to implement another set operation such as set difference: mv_difference (we would have to think of the right name).
But I think what we have for now is good enough, and we don't need to go through this refactor.

@markjhoy might also want to take a look, since he recently did the mv_intersection function
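One way to picture the base-class idea is a small template-method hierarchy. This is a hypothetical sketch on plain `java.util.Set`s, not Elasticsearch code: `MvSetOperation` and its subclasses are invented names, and a real base class would also have to cope with ES|QL's generated, static evaluator methods.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical base class: subclasses supply only the set combination step.
abstract class MvSetOperation {
    // Template method: copy the left input so callers' sets are never mutated,
    // then let the subclass combine the right input into the copy.
    final <T> Set<T> apply(Set<T> left, Set<T> right) {
        Set<T> result = new LinkedHashSet<>(left);
        combineInto(result, right);
        return result;
    }

    protected abstract <T> void combineInto(Set<T> result, Set<T> right);
}

final class UnionOp extends MvSetOperation {
    @Override
    protected <T> void combineInto(Set<T> result, Set<T> right) {
        result.addAll(right); // keep everything from both sides
    }
}

final class IntersectionOp extends MvSetOperation {
    @Override
    protected <T> void combineInto(Set<T> result, Set<T> right) {
        result.retainAll(right); // keep only values present on both sides
    }
}

// A set difference (the "mv_difference" idea) is just one more small subclass.
final class DifferenceOp extends MvSetOperation {
    @Override
    protected <T> void combineInto(Set<T> result, Set<T> right) {
        result.removeAll(right); // keep left values absent from the right side
    }
}
```

The design choice here is that the shared parts (copying, ordering, and in a real version, reading Blocks and handling nulls) live in one place, and each new operation adds only a few lines.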

...rc/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/multivalue/MvUnion.java

markjhoy · 2025-12-18T15:39:17Z

I think it's a good idea to always try to de-duplicate code, but with the generators in ESQL and the process methods being static, it gets a bit tricky. One thing we could do to get at least some commonality is in the processUnionSet method (in MV_UNION) or processIntersectionSet (in MV_INTERSECTION): much of the code for getting the values and indices is common to both, so maybe a helper function something like this (it might look messy, but just as an idea):

import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.BiFunction;
import java.util.function.Consumer;

import org.elasticsearch.compute.data.Block;

// Shared helper: reads both fields at `position`, dedupes each into a set,
// combines them with a caller-supplied set operation, and appends the result.
static <T> void processFieldSets(
    Block.Builder builder,
    int position,
    Block field1,
    Block field2,
    BiFunction<Integer, Block, T> getValueFunction,          // reads the value at an index of a block
    BiFunction<Set<T>, Set<T>, Set<T>> combinationFunction,  // e.g. union or intersection
    Consumer<T> addValueFunction                              // appends one value to the builder
) {
    int firstValueCount = field1.getValueCount(position);
    int secondValueCount = field2.getValueCount(position);

    // If either field has no values (is null), return null.
    // This behaviour differs between union and intersection, so the
    // "generic" version could drop this short-circuit block.
    if (firstValueCount == 0 || secondValueCount == 0) {
        builder.appendNull();
        return;
    }

    int firstValueIndex = field1.getFirstValueIndex(position);
    int secondValueIndex = field2.getFirstValueIndex(position);

    // Use LinkedHashSet to maintain insertion order
    Set<T> firstSet = new LinkedHashSet<>();
    // Add all values from the first field (duplicates are ignored by the Set)
    for (int i = 0; i < firstValueCount; i++) {
        firstSet.add(getValueFunction.apply(firstValueIndex + i, field1));
    }

    Set<T> secondSet = new LinkedHashSet<>();
    // Add all values from the second field (duplicates are ignored by the Set)
    for (int i = 0; i < secondValueCount; i++) {
        secondSet.add(getValueFunction.apply(secondValueIndex + i, field2));
    }

    Set<T> combinedSet = combinationFunction.apply(firstSet, secondSet);

    // An empty result (e.g. a disjoint intersection) is emitted as null
    if (combinedSet.isEmpty()) {
        builder.appendNull();
        return;
    }

    // Build the multi-valued result from the combined set
    builder.beginPositionEntry();
    for (T value : combinedSet) {
        addValueFunction.accept(value);
    }
    builder.endPositionEntry();
}

Here, the combinationFunction would return the union of the two sets (and, in MV_INTERSECTION, the intersection).

This might also make it easier to add other set operations in the future (e.g. MV_DIFFERENCE, MV_COMPLEMENT, etc.).
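The control flow of the proposed helper can be exercised without Elasticsearch by swapping Blocks for plain lists. In this hypothetical sketch, `SetHelperSketch`, `processPosition`, `union()` and `intersection()` are invented names, and returning `null` stands in for `builder.appendNull()`. It mirrors the dedupe, combine, and empty-check steps above. It omits the null short-circuit, as suggested for a generic version.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BinaryOperator;

// Hypothetical, Block-free sketch of the processFieldSets flow (not Elasticsearch code).
final class SetHelperSketch {
    // Combination function for a union: copy the left set, add the right.
    static <T> BinaryOperator<Set<T>> union() {
        return (a, b) -> { Set<T> r = new LinkedHashSet<>(a); r.addAll(b); return r; };
    }

    // Combination function for an intersection: copy the left set, keep only shared values.
    static <T> BinaryOperator<Set<T>> intersection() {
        return (a, b) -> { Set<T> r = new LinkedHashSet<>(a); r.retainAll(b); return r; };
    }

    // Dedupe both inputs, combine them, and return null (standing in for
    // builder.appendNull()) when the combined set is empty.
    static <T> List<T> processPosition(List<T> field1, List<T> field2, BinaryOperator<Set<T>> combination) {
        Set<T> first = new LinkedHashSet<>(field1);
        Set<T> second = new LinkedHashSet<>(field2);
        Set<T> combined = combination.apply(first, second);
        return combined.isEmpty() ? null : new ArrayList<>(combined);
    }
}
```

Adding a new operation then means writing only another small `BinaryOperator<Set<T>>`, while the reading and emitting logic stays shared.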

UPDATE:

(Edit: I just realized that when both fields are null, I would expect the result to be null. However, if one field is null and the other has values, the union should return those values.)
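The null rule described in this update can be sketched as follows. The names are hypothetical, and a Java `null` list stands in for a null (missing) multi-valued field. Only the two rules stated above are modeled: both null gives null, and a single null side is treated as an empty set.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of null-aware union semantics (not Elasticsearch code).
final class NullAwareUnion {
    static <T> List<T> union(List<T> a, List<T> b) {
        if (a == null && b == null) {
            return null; // both inputs null: the result is null
        }
        Set<T> result = new LinkedHashSet<>();
        if (a != null) {
            result.addAll(a); // a null side contributes nothing (empty set)
        }
        if (b != null) {
            result.addAll(b);
        }
        return new ArrayList<>(result);
    }
}
```

Note that this differs from an intersection, where a null side would naturally produce an empty (null) result. That is why the short-circuit in the shared helper cannot be identical for both operations.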

x-pack/plugin/esql/qa/testFixtures/src/main/resources/mv_union.csv-spec

...rc/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/multivalue/MvUnion.java

removing the blocker based on Craig's input


mridula-s109 · 2025-12-19T18:51:59Z

@ioanatia, @markjhoy Great suggestion. I agree that deduplicating the code would be valuable for maintainability and for future set operations like MV_DIFFERENCE. That said, for this PR I will limit the scope and create a follow-up issue to refactor both MV_UNION and MV_INTERSECTION to share a common helper, which would also make adding new set operations easier.

@craigtaverner I have also updated the MV_UNION function to treat null as an empty set when only one of the input values is null.

...rc/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/multivalue/MvUnion.java

markjhoy

👍 LGTM

WIP MV_UNION function

bb1f129

mridula-s109 self-assigned this Dec 17, 2025

mridula-s109 added the >enhancement, :Analytics/ES|QL, and Team:Search Relevance labels Dec 17, 2025

elasticsearchmachine added the v9.3.0 label Dec 17, 2025

mridula-s109 added 4 commits December 17, 2025 00:41

Update docs/changelog/139664.yaml

6b8ea5b

Merge branch 'main' into mridula-s109/add_MV_UNION_function_esql

a258c06

Add MV_UNION function for ES|QL

12dee78

tests

Merge branch 'main' into mridula-s109/add_MV_UNION_function_esql

ace87f1

elasticsearchmachine and others added 6 commits December 17, 2025 19:13

[CI] Auto commit changes from spotless

e395855

import error

629fcb5

Merge branch 'main' into mridula-s109/add_MV_UNION_function_esql

162fd10

updated csv tests

b6ff5e0

Merge branch 'main' into mridula-s109/add_MV_UNION_function_esql

5783867

Merge branch 'main' into mridula-s109/add_MV_UNION_function_esql

d23497f

elasticsearchmachine added v9.4.0 and removed v9.3.0 labels Dec 17, 2025

mridula-s109 requested review from a team and ioanatia December 18, 2025 09:41

mridula-s109 added the Team:SearchOrg and Team:Analytics labels and removed the Team:SearchOrg label Dec 18, 2025

mridula-s109 requested a review from leemthompo December 18, 2025 09:43

mridula-s109 marked this pull request as ready for review December 18, 2025 09:44

elasticsearchmachine removed the Team:Search Relevance label Dec 18, 2025

ioanatia approved these changes Dec 18, 2025


...rc/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/multivalue/MvUnion.java (outdated, resolved)

markjhoy reviewed Dec 18, 2025


x-pack/plugin/esql/qa/testFixtures/src/main/resources/mv_union.csv-spec (outdated, resolved)

markjhoy reviewed Dec 18, 2025


x-pack/plugin/esql/qa/testFixtures/src/main/resources/mv_union.csv-spec (outdated, resolved)

markjhoy previously requested changes Dec 18, 2025


...rc/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/multivalue/MvUnion.java (outdated, resolved)

markjhoy self-requested a review December 18, 2025 16:27

alex-spies mentioned this pull request Dec 19, 2025

ESQL: option for MV_CONTAINS to not consider null as an empty set #139435

Open

mridula-s109 and others added 5 commits December 19, 2025 15:42

Update x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/…

2b1ae1b

…expression/function/scalar/multivalue/MvUnion.java Co-authored-by: Liam Thompson <leemthompo@gmail.com>

Update x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/…

23c94ac

…expression/function/scalar/multivalue/MvUnion.java Co-authored-by: Liam Thompson <leemthompo@gmail.com>

Made markdown changes

8b3c105

Update MV_UNION to treat null as empty set

725dc81

updated MV_UNION for null acceptance

18d6232

Merge branch 'main' into mridula-s109/add_MV_UNION_function_esql

d767a62

mridula-s109 requested a review from ioanatia December 19, 2025 18:55

mridula-s109 and others added 4 commits December 19, 2025 18:55

Merge branch 'main' into mridula-s109/add_MV_UNION_function_esql

18b1d38

[CI] Auto commit changes from spotless

87aeeab

Line break introduced

ec29f3e

[CI] Auto commit changes from spotless

8253dd0

markjhoy reviewed Dec 19, 2025


...rc/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/multivalue/MvUnion.java (resolved)

markjhoy approved these changes Dec 19, 2025


Merge branch 'main' into mridula-s109/add_MV_UNION_function_esql

1c553a7

mridula-s109 merged commit 90cfe52 into elastic:main Dec 22, 2025
35 checks passed

This was referenced Dec 22, 2025

Refactor MV_UNION and MV_INTERSECTION to share common base class #139916

Closed

[ES|QL] Refactor MV_UNION and MV_INTERSECTION to use shared set operation helper #139982

Merged

This was referenced Dec 30, 2025

[CLEAN] Synthetic Benchmark PR #139664 - ES|QL: Add MV_UNION Function qodo-benchmark/elasticsearch#41

Open

[CORRUPTED] Synthetic Benchmark PR #139664 - ES|QL: Add MV_UNION Function qodo-benchmark/elasticsearch#42

Open

mridula-s109 mentioned this pull request Jan 14, 2026

ESQL: Functions! #98545

Open

ioanatia mentioned this pull request Feb 3, 2026

ES|QL: Add MV_DIFFERENCE function #141699

Open

6 participants