Skip to content

Conversation

@karenyrx
Copy link
Contributor

Description

Implements proposed changes #19277 to optimize the GRPC server for higher throughput.

Example

Before (Blocking Event Loops)

16-core system with 100 concurrent gRPC requests:
- Event loop threads: 16 (shared for connection + I/O + processing)
- Bottleneck: Protobuf serialization blocks I/O threads
- Result: Reduced throughput, connection timeouts

After (Dedicated Thread Pools)

16-core system with 100 concurrent gRPC requests:
- Boss threads: 1 (connection acceptance)
- Worker threads: 16 (network I/O only)
- Executor threads: 32 (request processing + serialization)
- Result: Non-blocking I/O, improved throughput

Related Issues

Resolves #19277

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Karen Xu <[email protected]>
@github-actions github-actions bot added enhancement Enhancement or improvement to existing feature or request Search:Performance labels Sep 12, 2025
@karenyrx karenyrx changed the title Optimize GRPC server [GRPC] Optimize gRPC transport thread management for improved throughput Sep 12, 2025
Signed-off-by: Karen Xu <[email protected]>
@github-actions
Copy link
Contributor

❌ Gradle check result for a3053c9: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

❌ Gradle check result for 83ca58b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Karen Xu <[email protected]>
@github-actions
Copy link
Contributor

❌ Gradle check result for b0742bc: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Karen Xu <[email protected]>
@karenyrx karenyrx marked this pull request as ready for review September 12, 2025 22:55
@karenyrx karenyrx requested a review from a team as a code owner September 12, 2025 22:55
@github-actions
Copy link
Contributor

❕ Gradle check result for aa9d758: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@codecov
Copy link

codecov bot commented Sep 12, 2025

Codecov Report

❌ Patch coverage is 85.29412% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.01%. Comparing base (8df81fd) to head (5048d9c).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...arch/transport/grpc/Netty4GrpcServerTransport.java 89.65% 1 Missing and 2 partials ⚠️
...java/org/opensearch/transport/grpc/GrpcPlugin.java 50.00% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #19278      +/-   ##
============================================
+ Coverage     72.86%   73.01%   +0.15%     
- Complexity    69951    70089     +138     
============================================
  Files          5681     5681              
  Lines        321436   321457      +21     
  Branches      46484    46486       +2     
============================================
+ Hits         234204   234715     +511     
+ Misses        68268    67811     -457     
+ Partials      18964    18931      -33     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Signed-off-by: karenx <[email protected]>
@github-actions
Copy link
Contributor

❕ Gradle check result for e876aae: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Copy link
Contributor

@msfroh msfroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this makes sense.

I don't 100% understand the implication of adding the additional executors, but I can see how splitting worker assignment, doing the work, and writing the gRPC response across different threadpools helps.

@github-actions
Copy link
Contributor

❕ Gradle check result for 19c1e09: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@karenyrx
Copy link
Contributor Author

I don't 100% understand the implication of adding the additional executors

There isn't a strong basis for setting to default 2 * cpu_cores as the default value, but I thought it would make sense for it to be greater than the number of workers (which is sent to the num cpu_cores), since it probably takes longer for gRPC to parse the request than it does to accept one (as worker threads do). Since it is configurable, I suppose users can set this to fit their query patterns more if needed.

@github-actions
Copy link
Contributor

❌ Gradle check result for c41ad7f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

❌ Gradle check result for 68aa440: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

❌ Gradle check result for 1077157: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

❕ Gradle check result for 5048d9c: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@mch2 mch2 merged commit cef8d98 into opensearch-project:main Sep 29, 2025
33 checks passed
peteralfonsi pushed a commit to peteralfonsi/OpenSearch that referenced this pull request Oct 15, 2025
…put (opensearch-project#19278)

* server optimization

Signed-off-by: Karen Xu <[email protected]>

* readme

Signed-off-by: Karen Xu <[email protected]>

* fix tests

Signed-off-by: Karen Xu <[email protected]>

* spotless

Signed-off-by: Karen Xu <[email protected]>

* code cov

Signed-off-by: karenx <[email protected]>

* use FixedExecutorBuilder

Signed-off-by: karenx <[email protected]>

* fix typo

Signed-off-by: karenx <[email protected]>

* add javadocs and remove duplicate tests

Signed-off-by: karenx <[email protected]>

---------

Signed-off-by: Karen Xu <[email protected]>
Signed-off-by: Karen X <[email protected]>
Signed-off-by: karenx <[email protected]>
Co-authored-by: karenx <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Enhancement or improvement to existing feature or request Search:Performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature Request] GRPC Server Thread Pool Optimizations

5 participants