Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Backport] [2.x] [Concurrent Segment Search]: Add support for query profiler with concurrent aggregation (#9248) #9835

Closed

Conversation

ticheng-aws
Copy link
Contributor

(cherry picked from commit a737446)

Description

At present, the query profiler is the total timing obtained by summing all segments. However, this does not accurately represent the detailed timing of the query, including queue time and gap time, during concurrent execution. This PR will address the problem.

Related Issues

Resolves #8330 #7354

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@ticheng-aws ticheng-aws changed the title Add support for query profiler with concurrent aggregation (#9248) [Backport] [2.x] [Concurrent Segment Search]: Add support for query profiler with concurrent aggregation (#9248) Sep 6, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Sep 6, 2023

Compatibility status:

Checks if related components are compatible with change a2552db

Incompatible components

Incompatible components: [https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/cross-cluster-replication.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/performance-analyzer-rca.git]

@github-actions
Copy link
Contributor

github-actions bot commented Sep 6, 2023

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Sep 6, 2023

Compatibility status:

Checks if related components are compatible with change 42bd603

Incompatible components

Incompatible components: [https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/neural-search.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/performance-analyzer-rca.git]

@github-actions
Copy link
Contributor

github-actions bot commented Sep 6, 2023

Gradle Check (Jenkins) Run Completed with:

Copy link
Collaborator

@reta reta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sohami @ticheng-aws holding the merge, we have NPE here #9815 and another NPE variation in this pull request, we cannot let it sneak into the release:

java.lang.NullPointerException: Cannot invoke "org.opensearch.action.search.SearchResponse.getFailedShards()" because "profileResponse" is null
	at __randomizedtesting.SeedInfo.seed([E68DEFD659967F87:B55838D408DE9DFB]:0)
	at org.opensearch.search.profile.query.QueryProfilerIT.testProfileMatchesRegular(QueryProfilerIT.java:195)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)

@sohami
Copy link
Collaborator

sohami commented Sep 6, 2023

@sohami @ticheng-aws holding the merge, we have NPE here #9815 and another NPE variation in this pull request, we cannot let it sneak into the release:

java.lang.NullPointerException: Cannot invoke "org.opensearch.action.search.SearchResponse.getFailedShards()" because "profileResponse" is null
	at __randomizedtesting.SeedInfo.seed([E68DEFD659967F87:B55838D408DE9DFB]:0)
	at org.opensearch.search.profile.query.QueryProfilerIT.testProfileMatchesRegular(QueryProfilerIT.java:195)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)

@reta We looked into this bug. The NPE is happening because of an issue observed in concurrent search path of profile where rewrite is happening during execution phase as well (this use case was unknown). This issue existed even before these new changes were done but it is showing up now since we enabled the QueryProfilerIT with concurrent search flag enabled as part of these changes. So the plan is to merge these new profile changes and disable the failing test for concurrent segment search for now. The fix will be worked on as part of the issue you created and we will enable the test with that fix. Given only some specific queries (with profile option enabled) in concurrent segment search path is impacted and is behind experimental flag, we should still be fine back porting this PR for experimental release in 2.10.0. Wdyt ?

@ticheng-aws ticheng-aws requested a review from reta September 6, 2023 22:20
…h-project#9248)

* Add support for query profiler with concurrent aggregation (opensearch-project#9248)

Signed-off-by: Ticheng Lin <[email protected]>

* Refactor and work on the PR comments

Signed-off-by: Ticheng Lin <[email protected]>

* Update collectorToLeaves mapping for children breakdowns post profile metric collection and before creating the results

Signed-off-by: Sorabh Hamirwasia <[email protected]>

* Refactor logic to compute the slice level breakdown stats and query level breakdown stats for concurrent search case

Signed-off-by: Sorabh Hamirwasia <[email protected]>

* Fix QueryProfilePhaseTests and QueryProfileTests, and parameterize QueryProfilerIT with concurrent search enabled

Signed-off-by: Ticheng Lin <[email protected]>

* Handle the case when there are no leaf context to compute the profile stats to return default stats
for all breakdown type along with min/max/avg values. Replace queryStart and queryEnd time with queryNodeTime

Signed-off-by: Sorabh Hamirwasia <[email protected]>

* Add UTs for ConcurrentQueryProfileBreakdown

Signed-off-by: Sorabh Hamirwasia <[email protected]>

* Add concurrent search stats test into the QueryProfilerIT

Signed-off-by: Ticheng Lin <[email protected]>

* Address review comments

Signed-off-by: Sorabh Hamirwasia <[email protected]>

---------

Signed-off-by: Ticheng Lin <[email protected]>
Signed-off-by: Sorabh Hamirwasia <[email protected]>
Co-authored-by: Sorabh Hamirwasia <[email protected]>
Signed-off-by: Ticheng Lin <[email protected]>
@sohami
Copy link
Collaborator

sohami commented Sep 6, 2023

@sohami @ticheng-aws holding the merge, we have NPE here #9815 and another NPE variation in this pull request, we cannot let it sneak into the release:

java.lang.NullPointerException: Cannot invoke "org.opensearch.action.search.SearchResponse.getFailedShards()" because "profileResponse" is null
	at __randomizedtesting.SeedInfo.seed([E68DEFD659967F87:B55838D408DE9DFB]:0)
	at org.opensearch.search.profile.query.QueryProfilerIT.testProfileMatchesRegular(QueryProfilerIT.java:195)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)

@reta We looked into this bug. The NPE is happening because of an issue observed in concurrent search path of profile where rewrite is happening during execution phase as well (this use case was unknown). This issue existed even before these new changes were done but it is showing up now since we enabled the QueryProfilerIT with concurrent search flag enabled as part of these changes. So the plan is to merge these new profile changes and disable the failing test for concurrent segment search for now. The fix will be worked on as part of the issue you created and we will enable the test with that fix. Given only some specific queries (with profile option enabled) in concurrent segment search path is impacted and is behind experimental flag, we should still be fine back porting this PR for experimental release in 2.10.0. Wdyt ?

Just realized this is a separate issue and is happening in the new code path. We will not back-port the new changes to 2.10 in that case.

@github-actions
Copy link
Contributor

github-actions bot commented Sep 6, 2023

Compatibility status:

Checks if related components are compatible with change 5f1c7c4

Incompatible components

Incompatible components: [https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/neural-search.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/security-analytics.git]

@github-actions
Copy link
Contributor

github-actions bot commented Sep 6, 2023

Gradle Check (Jenkins) Run Completed with:

@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label Oct 7, 2023
@ticheng-aws
Copy link
Contributor Author

ticheng-aws commented Jan 6, 2024

Closed this PR as it's backported from #10898.

@ticheng-aws ticheng-aws closed this Jan 6, 2024
@ticheng-aws ticheng-aws added the enhancement Enhancement or improvement to existing feature or request label Jan 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement or improvement to existing feature or request stalled Issues that have stalled
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants