[Pull-based Ingestion] Prevent shard initialization failures due to transient consumer errors #18877
db4a5ad to 4a73eab
4a73eab to f58323d
❌ Gradle check result for f58323d: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
f58323d to af82f9b
❌ Gradle check result for af82f9b: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Codecov Report
@@             Coverage Diff              @@
##               main   #18877      +/-   ##
============================================
+ Coverage     72.77%   72.82%   +0.04%
+ Complexity    68690    68673      -17
============================================
  Files          5582     5582
  Lines        315456   315508      +52
  Branches      45778    45779       +1
============================================
+ Hits         229568   229760     +192
+ Misses        67290    67099     -191
- Partials      18598    18649      +51
server/src/main/java/org/opensearch/index/engine/IngestionEngine.java
server/src/main/java/org/opensearch/indices/pollingingest/DefaultStreamPoller.java
❌ Gradle check result for 04ace5d: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
04ace5d to 21c0daf
❌ Gradle check result for 21c0daf: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
❌ Gradle check result for 032764e: null. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
…ransient consumer errors (opensearch-project#18877)
* Move consumer initialization to the poller to prevent engine failure
Signed-off-by: Varun Bharadwaj <[email protected]>
* Rename log messages and update exception
Signed-off-by: Varun Bharadwaj <[email protected]>
* update default poller to use private constructor
Signed-off-by: Varun Bharadwaj <[email protected]>
---------
Signed-off-by: Varun Bharadwaj <[email protected]>
Signed-off-by: sunqijun.jun <[email protected]>
Description
Transient consumer errors (such as Kafka connection issues) currently fail shard/engine initialization. This can lead to cascading failures in which all primary and replica shards go down. This PR moves the consumer initialization logic into the poller, which retries initialization when it hits a transient error.
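To illustrate the idea, here is a minimal sketch of a poller that owns consumer creation and retries it, so that a failed attempt delays ingestion rather than failing the engine. This is not the code from this PR: the class `RetryingPollerSketch`, its fields, the bounded `maxAttempts`, and the fixed backoff are all hypothetical names and simplifications chosen for the example, and the real `DefaultStreamPoller` may differ in every one of them.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/**
 * Hypothetical sketch: the poller, not the engine, creates the consumer.
 * A failed attempt is retried after a backoff instead of propagating
 * to shard/engine initialization.
 */
final class RetryingPollerSketch<C> implements Runnable {
    private final Supplier<C> consumerFactory; // may throw on transient errors
    private final int maxAttempts;             // bounded only so the sketch terminates
    private final long backoffMillis;          // fixed backoff, an assumption of this sketch
    private final AtomicInteger attempts = new AtomicInteger();
    private volatile C consumer;               // null until initialization succeeds

    RetryingPollerSketch(Supplier<C> consumerFactory, int maxAttempts, long backoffMillis) {
        this.consumerFactory = consumerFactory;
        this.maxAttempts = maxAttempts;
        this.backoffMillis = backoffMillis;
    }

    @Override
    public void run() {
        // Keep trying to create the consumer on the poller thread.
        while (consumer == null && attempts.get() < maxAttempts) {
            attempts.incrementAndGet();
            try {
                consumer = consumerFactory.get();
            } catch (RuntimeException e) {
                // Treated as transient in this sketch: back off, then retry.
                try {
                    Thread.sleep(backoffMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return; // stop retrying if the poller is being shut down
                }
            }
        }
        // A real poller would begin its poll loop here once consumer != null.
    }

    C consumer() {
        return consumer;
    }

    int attempts() {
        return attempts.get();
    }
}
```

In this sketch the engine would construct the poller with a factory and start it without touching the stream source itself, which is why an unreachable broker at startup no longer has to surface as an initialization failure.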
Related Issues
Follow up for #16929
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.