Skip to content

Comments

[ES|QL] Refactor MV_UNION and MV_INTERSECTION to use shared set operation helper#139982

Merged
mridula-s109 merged 6 commits intoelastic:mainfrom
mridula-s109:mridula-s109/refactor-mv-set-operations
Jan 7, 2026
Merged

[ES|QL] Refactor MV_UNION and MV_INTERSECTION to use shared set operation helper#139982
mridula-s109 merged 6 commits intoelastic:mainfrom
mridula-s109:mridula-s109/refactor-mv-set-operations

Conversation

@mridula-s109
Copy link
Contributor

Follow-up to #139664 (MV_UNION implementation).

This PR extracts common set operation logic into a shared MvSetOperationHelper utility class to reduce code duplication between MV_UNION and MV_INTERSECTION.

Changes

  • Created MvSetOperationHelper with union() and intersection() static methods
  • Refactored MvUnion.processUnionSet() andMvIntersection.processIntersectionSet() to use the helper

@mridula-s109 mridula-s109 self-assigned this Dec 24, 2025
@mridula-s109 mridula-s109 added >refactoring Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL v9.4.0 labels Dec 24, 2025
@mridula-s109
Copy link
Contributor Author

@markjhoy and @ioanatia I have verified by running all the tests locally as well. I haven't added unit tests for the helper explicitly as we have sufficient tests for union and intersection functionality. Please do have a look and let me know!

@mridula-s109 mridula-s109 marked this pull request as ready for review December 24, 2025 00:08
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@markjhoy
Copy link
Contributor

markjhoy commented Dec 29, 2025

Thanks for this @mridula-s109 -- this looks clean. The only thing I think we can optimize here is to not have to initialize additional sets in the helper class... I'm wondering if something like creating a Set<E> class that extends the LinkedHashSet, and assigning the first set of values to it would be better.... I'm thinking something like:

    public class OperationalSet<E> extends LinkedHashSet<E> {
        public Set<E> union(Set<E> set) {
            this.addAll(set);
            return this;
        }
        
        public Set<E> intersect(Set<E> set) {
            this.retainAll(set);
            return this;
        }
    }

Set<T> firstSet = new OperationalSet<>();
for (int i = 0; i < firstValueCount; i++) {
    values.add(getValueFunction.apply(firstValueIndex + i, field1));
}

Set<T> secondSet = new LinikedHashSet<>();
int secondValueIndex = field2.getFirstValueIndex(position);
for (int i = 0; i < secondValueCount; i++) {
    secondSet.add(getValueFunction.apply(secondValueIndex + i, field2));
}

Set<T> result = firstSet.union(secondSet);
(or - Set<T> result = firstSet.intersection(secondSet); )

@mridula-s109 mridula-s109 force-pushed the mridula-s109/refactor-mv-set-operations branch from 1ca3d21 to 7e83ce5 Compare January 6, 2026 18:20
@mridula-s109
Copy link
Contributor Author

public class OperationalSet extends LinkedHashSet {
public Set union(Set set) {
this.addAll(set);
return this;
}

    public Set<E> intersect(Set<E> set) {
        this.retainAll(set);
        return this;
    }
}

Set firstSet = new OperationalSet<>();
for (int i = 0; i < firstValueCount; i++) {
values.add(getValueFunction.apply(firstValueIndex + i, field1));
}

Set secondSet = new LinikedHashSet<>();
int secondValueIndex = field2.getFirstValueIndex(position);
for (int i = 0; i < secondValueCount; i++) {
secondSet.add(getValueFunction.apply(secondValueIndex + i, field2));
}

Set result = firstSet.union(secondSet);
(or - Set result = firstSet.intersection(secondSet); )

thanks @markjhoy for the suggestion. i have addressed the code optimisation.

Copy link
Contributor

@markjhoy markjhoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. In the words of Marie Kondo - "This sparks joy" 😂

@mridula-s109 mridula-s109 enabled auto-merge (squash) January 7, 2026 09:31
@mridula-s109 mridula-s109 merged commit a1f681e into elastic:main Jan 7, 2026
35 checks passed
szybia added a commit to szybia/elasticsearch that referenced this pull request Jan 7, 2026
* upstream/main: (191 commits)
  Overall Decision for Deciders prioritizes THROTTLE (elastic#140237)
  Apply group by all logic not only to top-level aggregates (elastic#140248)
  [ES|QL] Refactor MV_UNION and MV_INTERSECTION to use shared set operation helper (elastic#139982)
  Avoid reading entire bloom filter file on reader open (elastic#139374)
  Mark bloom filter files for random access (elastic#139375)
  Ensure that the buffer used for ES93BloomFilterStoredFieldsFormat is zeroed (elastic#139034)
  Add busy assertion to avoid race condition for testStalledShardMigrationProperlyDetected (elastic#140230)
  Remove line number check for testTransitiveFindsDeepCallChain (elastic#140228)
  Allow a slight difference in rescored docs (elastic#139931)
  Mute org.elasticsearch.xpack.inference.integration.AuthorizationTaskExecutorIT testCreatesEisChatCompletion_DoesNotRemoveEndpointWhenNoLongerAuthorized elastic#138480
  Start exchange sink fetchers concurrently (elastic#140196)
  Allow allocation to replacement target node on vacate completion (elastic#140150)
  Ignore JNA cleaner threads in SecureHdfsRepositoryAnalysisRestIT (elastic#139925)
  DeterministicQueue refactor and enhancement (elastic#140151)
  Always error out if CCS expression shows up when CCS is not supported (elastic#139009)
  Use IllegalArgumentException over RepositoryException for readonly-repository checks (elastic#140200)
  Guard promql capabilities in AnalyzerTests (elastic#140232)
  [Inference API] Fix flaky AuthorizationTaskExecutorIT tests (elastic#139978)
  Cleaning up exitable vector value impls (elastic#140190)
  [Inference API] Fix auth exception listener not called bug (elastic#139966)
  ...
@mridula-s109 mridula-s109 removed the request for review from ioanatia January 7, 2026 18:02
sidosera pushed a commit to sidosera/elasticsearch that referenced this pull request Jan 7, 2026
…tion helper (elastic#139982)

* ES|QL: Refactor MV set operations to use shared helper

* cleaned up the PR

* updated code to use operational set
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >refactoring Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants