Streaming Aggregation #18874
Merged: rishabhmaurya merged 81 commits into opensearch-project:main from bowenlan-amzn:stream-agg on Aug 6, 2025 (+4,164 −107)

Commits (81)
f3d35d1 rishabhmaurya: vectorized version of StreamInput and StreamOutput
80aa54c rishabhmaurya: Fix for the fetch phase optimization
0c17227 rishabhmaurya: Fix issues at flight transport layer; Add middleware for header manag…
eff27e9 rishabhmaurya: Fix race condition with header in flight transport
f66c735 rishabhmaurya: Refactor; gradle check fixes
4211828 rishabhmaurya: Add stats API
764b8ab rishabhmaurya: Stats API refactor; Cancellation of stream through StreamTransportRes…
09b994d rishabhmaurya: Added base test class for stream transport and tests for FlightClient…
1c500b2 rishabhmaurya: Fix tests due to null stream transport passed to StubbableTransport
3258924 rishabhmaurya: Fix the failing tests due to connection profile missing STREAM type
46e6992 rishabhmaurya: cancellation and timeout fixes; fixes for resource cleanup; more test…
14c3646 rishabhmaurya: Increase latch await time for early cancellation test to fix flakiness
74b8a49 rishabhmaurya: improve javadocs; code refactor
97c76aa rishabhmaurya: fix issues in flight client channel; added docs on usage; standardize…
04d1437 rishabhmaurya: pass along request Id from OutboundHandler to TcpChannel; refactor Fl…
138d35f rishabhmaurya: code coverage
263ec94 rishabhmaurya: API changes for stream transport
8a40862 rishabhmaurya: update docs
c18ba77 rishabhmaurya: Standardize error handling
815c09a rishabhmaurya: stream transport metrics and integration
bccccb3 rishabhmaurya: unit tests for metrics
d1738dd rishabhmaurya: Fixes related to security and FGAC
2bd02df rishabhmaurya: Chaos IT and fixes on resource leaks like reader context cleanup afte…
ac5512a rishabhmaurya: register stream default timeout setting
f2acdc9 rishabhmaurya: test stability and latch timeout settings
558ddcf rishabhmaurya: pr comment: nitpick
18db622 rishabhmaurya: aggregation ser/de changes not required anymore
ff125ff rishabhmaurya: Add changelog
5a7b90a rishabhmaurya: Allow flight server to bind to multiple addresses
04dbe86 rishabhmaurya: example plugin to demonstrate defining stream based transport action
fa0cf52 rishabhmaurya: support for slow logs, remove unnecessary thread switch to flight client
3f6ed28 rishabhmaurya: Make FlightServerChannel threadsafe
906f94f rishabhmaurya: Allocator related tuning
bd5097f rishabhmaurya: Attempt to fix flaky metric test
9e79215 rishabhmaurya: Improve test coverage
642c34f rishabhmaurya: fix documentation
73a33af rishabhmaurya: Add @ExperimentalAPI annotation
b816830 rishabhmaurya: Share TaskManager and remoteClientService between stream and regular …
8c4c34a rishabhmaurya: fix tests
02ad376 rishabhmaurya: address pr comment
ecda165 rishabhmaurya: fix test
666a503 rishabhmaurya: Update documentation
275ad4d rishabhmaurya: Fix synchronization with multiple batches written concurrently at server
9b1414e rishabhmaurya: Merge branch 'main' into search-stream-transport
a5c559d rishabhmaurya: Add changelog
072cba9 bowenlan-amzn: Comment out some tests
2c0ad39 bowenlan-amzn: Revert "Comment out some tests"
b2badbe bowenlan-amzn: Streaming Aggregation
9e7ff13 bowenlan-amzn: Add mock stream transport for testing
0119116 bowenlan-amzn: innerOnResponse delegate to innerOnCompleteResponse for compatibility
9702dd9 bowenlan-amzn: Refactor the streaming interface for streaming search
0214603 bowenlan-amzn: address comments
e5d7a54 bowenlan-amzn: better feature flag
eeeb978 bowenlan-amzn: Revert stream flag from search source builder because we don't need i…
8c4d24a bowenlan-amzn: Update log level to debug
1bba032 bowenlan-amzn: remove size=0
520c938 bowenlan-amzn: revert a small change
07556bf harshavamsi: Separating out stream from regular
3495b80 harshavamsi: Fix aggregator and split sendBatch
6817dfd bowenlan-amzn: refactor and fix some bugs
b85f73b bowenlan-amzn: buildAggBatch return list of internal aggregations
c6081b1 bowenlan-amzn: batch reduce size for stream search
9da61bd bowenlan-amzn: Remove stream execution hint
3a661bf bowenlan-amzn: Clean up InternalTerms
fc2ccea bowenlan-amzn: Clean up
450808b bowenlan-amzn: Refactor duplication in search service
82afcd6 bowenlan-amzn: Merge branch 'main' into stream-agg
68626b6 bowenlan-amzn: Update change log
a759dc5 bowenlan-amzn: clean up
66ddce7 harshavamsi: Add tests for StreamingStringTermsAggregator and SendBatch
43bce78 bowenlan-amzn: Clean up and address comments
6052ff5 bowenlan-amzn: spotless
044385f bowenlan-amzn: Merge branch 'main' into stream-agg
2846554 bowenlan-amzn: add comment
75bb6d2 harshavamsi: Refactor StreamStringTermsAggregator
0a738ee bowenlan-amzn: Unblock prepareStreamSearch in NodeClient
a991418 bowenlan-amzn: clean up
8663330 bowenlan-amzn: experimental api annotation
07467b1 bowenlan-amzn: change sendBatch to package private
3a35ee1 bowenlan-amzn: add type
49ebf96 bowenlan-amzn: Merge branch 'main' into stream-agg
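The shape of the new API is easiest to see in the integration test added below. As a minimal sketch, assuming, as in the test, a `client` handle like the test's `client()` helper, the `FlightStreamPlugin` installed, and the `STREAM_TRANSPORT` feature flag enabled (the aggregation names here are illustrative, not part of the PR):

```java
// Minimal sketch mirroring SubAggregationIT below; `client` stands in for the
// test's client() handle, so this is usage under stated assumptions rather
// than a standalone API reference.
TermsAggregationBuilder agg = AggregationBuilders.terms("by_field1")
    .field("field1")
    .subAggregation(AggregationBuilders.max("max_field2").field("field2"));

SearchResponse resp = client.prepareStreamSearch("index") // streaming entry point added in this PR
    .addAggregation(agg)
    .setSize(0)               // aggregations only; no hits requested
    .setRequestCache(false)
    .execute()
    .actionGet();
```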
241 changes: 241 additions & 0 deletions
...c/src/internalClusterTest/java/org/opensearch/streaming/aggregation/SubAggregationIT.java
```java
/*
 * SPDX-License-Identifier: Apache-2.0
 *
 * The OpenSearch Contributors require contributions made to
 * this file be licensed under the Apache-2.0 license or a
 * compatible open source license.
 */

package org.opensearch.streaming.aggregation;

import com.carrotsearch.randomizedtesting.annotations.ParametersFactory;

import org.opensearch.action.admin.indices.create.CreateIndexRequest;
import org.opensearch.action.admin.indices.create.CreateIndexResponse;
import org.opensearch.action.admin.indices.flush.FlushRequest;
import org.opensearch.action.admin.indices.refresh.RefreshRequest;
import org.opensearch.action.admin.indices.segments.IndicesSegmentResponse;
import org.opensearch.action.admin.indices.segments.IndicesSegmentsRequest;
import org.opensearch.action.bulk.BulkRequest;
import org.opensearch.action.bulk.BulkResponse;
import org.opensearch.action.index.IndexRequest;
import org.opensearch.action.search.SearchResponse;
import org.opensearch.arrow.flight.transport.FlightStreamPlugin;
import org.opensearch.common.action.ActionFuture;
import org.opensearch.common.settings.Settings;
import org.opensearch.common.unit.TimeValue;
import org.opensearch.common.xcontent.XContentType;
import org.opensearch.plugins.Plugin;
import org.opensearch.search.SearchHit;
import org.opensearch.search.aggregations.AggregationBuilders;
import org.opensearch.search.aggregations.bucket.terms.StringTerms;
import org.opensearch.search.aggregations.bucket.terms.TermsAggregationBuilder;
import org.opensearch.search.aggregations.metrics.Max;
import org.opensearch.test.OpenSearchIntegTestCase;
import org.opensearch.test.ParameterizedDynamicSettingsOpenSearchIntegTestCase;

import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

import static org.opensearch.common.util.FeatureFlags.STREAM_TRANSPORT;
import static org.opensearch.search.SearchService.CLUSTER_CONCURRENT_SEGMENT_SEARCH_SETTING;
import static org.opensearch.search.aggregations.AggregationBuilders.terms;

@OpenSearchIntegTestCase.ClusterScope(scope = OpenSearchIntegTestCase.Scope.SUITE, minNumDataNodes = 3, maxNumDataNodes = 3)
public class SubAggregationIT extends ParameterizedDynamicSettingsOpenSearchIntegTestCase {

    public SubAggregationIT(Settings dynamicSettings) {
        super(dynamicSettings);
    }

    @ParametersFactory
    public static Collection<Object[]> parameters() {
        return Arrays.asList(
            new Object[] { Settings.builder().put(CLUSTER_CONCURRENT_SEGMENT_SEARCH_SETTING.getKey(), false).build() },
            new Object[] { Settings.builder().put(CLUSTER_CONCURRENT_SEGMENT_SEARCH_SETTING.getKey(), true).build() }
        );
    }

    static final int NUM_SHARDS = 3;
    static final int MIN_SEGMENTS_PER_SHARD = 3;

    @Override
    protected Collection<Class<? extends Plugin>> nodePlugins() {
        return Collections.singleton(FlightStreamPlugin.class);
    }

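    // setUp builds a deterministic layout for the streaming assertions: 3 primary
    // shards with no replicas, and at least 3 flushed segments per shard, with
    // merging tuned down so segments are not collapsed before the tests run.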
    @Override
    public void setUp() throws Exception {
        super.setUp();
        internalCluster().ensureAtLeastNumDataNodes(3);

        Settings indexSettings = Settings.builder()
            .put("index.number_of_shards", NUM_SHARDS) // Number of primary shards
            .put("index.number_of_replicas", 0) // Number of replica shards
            .put("index.search.concurrent_segment_search.mode", "none")
            // Disable segment merging to keep individual segments
            .put("index.merge.policy.max_merged_segment", "1kb") // Keep segments small
            .put("index.merge.policy.segments_per_tier", "20") // Allow many segments per tier
            .put("index.merge.scheduler.max_thread_count", "1") // Limit merge threads
            .build();

        CreateIndexRequest createIndexRequest = new CreateIndexRequest("index").settings(indexSettings);
        createIndexRequest.mapping(
            "{\n"
                + "  \"properties\": {\n"
                + "    \"field1\": { \"type\": \"keyword\" },\n"
                + "    \"field2\": { \"type\": \"integer\" }\n"
                + "  }\n"
                + "}",
            XContentType.JSON
        );
        CreateIndexResponse createIndexResponse = client().admin().indices().create(createIndexRequest).actionGet();
        assertTrue(createIndexResponse.isAcknowledged());
        client().admin().cluster().prepareHealth("index").setWaitForGreenStatus().setTimeout(TimeValue.timeValueSeconds(30)).get();
        BulkRequest bulkRequest = new BulkRequest();

        // We'll create 3 segments per shard by indexing docs into each segment and forcing a flush
        // Segment 1 - we'll add docs with field2 values in 1-3 range
        for (int i = 0; i < 10; i++) {
            bulkRequest.add(new IndexRequest("index").source(XContentType.JSON, "field1", "value1", "field2", 1));
            bulkRequest.add(new IndexRequest("index").source(XContentType.JSON, "field1", "value2", "field2", 2));
            bulkRequest.add(new IndexRequest("index").source(XContentType.JSON, "field1", "value3", "field2", 3));
        }
        BulkResponse bulkResponse = client().bulk(bulkRequest).actionGet();
        assertFalse(bulkResponse.hasFailures()); // Verify ingestion was successful
        client().admin().indices().flush(new FlushRequest("index").force(true)).actionGet();
        client().admin().indices().refresh(new RefreshRequest("index")).actionGet();

        // Segment 2 - we'll add docs with field2 values in 11-13 range
        bulkRequest = new BulkRequest();
        for (int i = 0; i < 10; i++) {
            bulkRequest.add(new IndexRequest("index").source(XContentType.JSON, "field1", "value1", "field2", 11));
            bulkRequest.add(new IndexRequest("index").source(XContentType.JSON, "field1", "value2", "field2", 12));
            bulkRequest.add(new IndexRequest("index").source(XContentType.JSON, "field1", "value3", "field2", 13));
        }
        bulkResponse = client().bulk(bulkRequest).actionGet();
        assertFalse(bulkResponse.hasFailures());
        client().admin().indices().flush(new FlushRequest("index").force(true)).actionGet();
        client().admin().indices().refresh(new RefreshRequest("index")).actionGet();

        // Segment 3 - we'll add docs with field2 values in 21-23 range
        bulkRequest = new BulkRequest();
        for (int i = 0; i < 10; i++) {
            bulkRequest.add(new IndexRequest("index").source(XContentType.JSON, "field1", "value1", "field2", 21));
            bulkRequest.add(new IndexRequest("index").source(XContentType.JSON, "field1", "value2", "field2", 22));
            bulkRequest.add(new IndexRequest("index").source(XContentType.JSON, "field1", "value3", "field2", 23));
        }
        bulkResponse = client().bulk(bulkRequest).actionGet();
        assertFalse(bulkResponse.hasFailures());
        client().admin().indices().flush(new FlushRequest("index").force(true)).actionGet();
        client().admin().indices().refresh(new RefreshRequest("index")).actionGet();

        client().admin().indices().refresh(new RefreshRequest("index")).actionGet();
        ensureSearchable("index");

        // Verify that we have the expected number of shards and segments
        IndicesSegmentResponse segmentResponse = client().admin().indices().segments(new IndicesSegmentsRequest("index")).actionGet();
        assertEquals(NUM_SHARDS, segmentResponse.getIndices().get("index").getShards().size());

        // Verify each shard has at least MIN_SEGMENTS_PER_SHARD segments
        segmentResponse.getIndices().get("index").getShards().values().forEach(indexShardSegments -> {
            assertTrue(
                "Expected at least "
                    + MIN_SEGMENTS_PER_SHARD
                    + " segments but found "
                    + indexShardSegments.getShards()[0].getSegments().size(),
                indexShardSegments.getShards()[0].getSegments().size() >= MIN_SEGMENTS_PER_SHARD
            );
        });
    }

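    // Data math for the assertions below: 3 bulk rounds x 30 docs = 90 docs in
    // total; each of value1/value2/value3 receives 10 docs per round, i.e. 30
    // docs per bucket, and max(field2) per value is 21, 22, and 23 respectively.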
    @LockFeatureFlag(STREAM_TRANSPORT)
    public void testStreamingAggregation() throws Exception {
        // This test validates streaming aggregation with 3 shards, each with at least 3 segments
        TermsAggregationBuilder agg = terms("agg1").field("field1").subAggregation(AggregationBuilders.max("agg2").field("field2"));
        ActionFuture<SearchResponse> future = client().prepareStreamSearch("index")
            .addAggregation(agg)
            .setSize(0)
            .setRequestCache(false)
            .execute();
        SearchResponse resp = future.actionGet();
        assertNotNull(resp);
        assertEquals(NUM_SHARDS, resp.getTotalShards());
        assertEquals(90, resp.getHits().getTotalHits().value());
        StringTerms agg1 = (StringTerms) resp.getAggregations().asMap().get("agg1");
        List<StringTerms.Bucket> buckets = agg1.getBuckets();
        assertEquals(3, buckets.size());

        // Validate all buckets - each should have 30 documents
        for (StringTerms.Bucket bucket : buckets) {
            assertEquals(30, bucket.getDocCount());
            assertNotNull(bucket.getAggregations().get("agg2"));
        }
        buckets.sort(Comparator.comparing(StringTerms.Bucket::getKeyAsString));

        StringTerms.Bucket bucket1 = buckets.get(0);
        assertEquals("value1", bucket1.getKeyAsString());
        assertEquals(30, bucket1.getDocCount());
        Max maxAgg1 = (Max) bucket1.getAggregations().get("agg2");
        assertEquals(21.0, maxAgg1.getValue(), 0.001);

        StringTerms.Bucket bucket2 = buckets.get(1);
        assertEquals("value2", bucket2.getKeyAsString());
        assertEquals(30, bucket2.getDocCount());
        Max maxAgg2 = (Max) bucket2.getAggregations().get("agg2");
        assertEquals(22.0, maxAgg2.getValue(), 0.001);

        StringTerms.Bucket bucket3 = buckets.get(2);
        assertEquals("value3", bucket3.getKeyAsString());
        assertEquals(30, bucket3.getDocCount());
        Max maxAgg3 = (Max) bucket3.getAggregations().get("agg2");
        assertEquals(23.0, maxAgg3.getValue(), 0.001);

        for (SearchHit hit : resp.getHits().getHits()) {
            assertNotNull(hit.getSourceAsString());
        }
    }

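    // Terms-only variant of the test above: same dataset and bucket expectations,
    // but no metric sub-aggregation under the terms bucket.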
    @LockFeatureFlag(STREAM_TRANSPORT)
    public void testStreamingAggregationTerm() throws Exception {
        // This test validates streaming aggregation with 3 shards, each with at least 3 segments
        TermsAggregationBuilder agg = terms("agg1").field("field1");
        ActionFuture<SearchResponse> future = client().prepareStreamSearch("index")
            .addAggregation(agg)
            .setSize(0)
            .setRequestCache(false)
            .execute();
        SearchResponse resp = future.actionGet();
        assertNotNull(resp);
        assertEquals(NUM_SHARDS, resp.getTotalShards());
        assertEquals(90, resp.getHits().getTotalHits().value());
        StringTerms agg1 = (StringTerms) resp.getAggregations().asMap().get("agg1");
        List<StringTerms.Bucket> buckets = agg1.getBuckets();
        assertEquals(3, buckets.size());

        // Validate all buckets - each should have 30 documents
        for (StringTerms.Bucket bucket : buckets) {
            assertEquals(30, bucket.getDocCount());
        }
        buckets.sort(Comparator.comparing(StringTerms.Bucket::getKeyAsString));

        StringTerms.Bucket bucket1 = buckets.get(0);
        assertEquals("value1", bucket1.getKeyAsString());
        assertEquals(30, bucket1.getDocCount());

        StringTerms.Bucket bucket2 = buckets.get(1);
        assertEquals("value2", bucket2.getKeyAsString());
        assertEquals(30, bucket2.getDocCount());

        StringTerms.Bucket bucket3 = buckets.get(2);
        assertEquals("value3", bucket3.getKeyAsString());
        assertEquals(30, bucket3.getDocCount());

        for (SearchHit hit : resp.getHits().getHits()) {
            assertNotNull(hit.getSourceAsString());
        }
    }
}
```
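Since both tests run with `setSize(0)`, no hits are returned and the final loop over `resp.getHits().getHits()` never executes; the assertions exercise only the streamed aggregation results. The `prepareStreamSearch` entry point mirrors `prepareSearch` but issues the request over the streaming transport, which is why both tests are gated behind the `STREAM_TRANSPORT` feature flag and load `FlightStreamPlugin` as the node plugin.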