[Backport 2.x] Add a guardrail to limit maximum number of shard on the cluster #6190

Merged
dreamer-89 merged 1 commit into 2.x from backport/backport-6143-to-2.x on Feb 6, 2023

@opensearch-trigger-bot
Contributor

Backport e42b76f from #6143.
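
For context, the title describes a guardrail that caps the total number of shards on the cluster. The following is only a minimal sketch of that idea, not the code from #6143: the class name, method name, parameters, the `-1` "no cap" convention, and the choice to apply the stricter of the per-node-derived limit and the cluster-wide cap are all assumptions made for illustration.

```java
import java.util.Optional;

/** Hypothetical sketch of a cluster-wide shard guardrail; not the actual ShardLimitValidator code. */
public final class ShardLimitSketch {

    private ShardLimitSketch() {}

    /**
     * Returns an error message if adding newShards would exceed the effective limit,
     * or Optional.empty() if the operation is allowed.
     *
     * @param newShards           shards the requested operation would add
     * @param currentOpenShards   shards currently open on the cluster
     * @param dataNodeCount       number of data nodes
     * @param maxShardsPerNode    per-node limit (hypothetical parameter)
     * @param maxShardsPerCluster cluster-wide cap; -1 means "no cap" in this sketch
     */
    public static Optional<String> checkShardLimit(
            int newShards,
            int currentOpenShards,
            int dataNodeCount,
            int maxShardsPerNode,
            int maxShardsPerCluster) {
        // Limit derived from the per-node value, computed as long to avoid int overflow.
        long limit = (long) maxShardsPerNode * dataNodeCount;
        // Assumption: when a cluster-wide cap is set, the stricter of the two limits applies.
        if (maxShardsPerCluster >= 0) {
            limit = Math.min(limit, maxShardsPerCluster);
        }
        long requested = (long) currentOpenShards + newShards;
        if (requested > limit) {
            return Optional.of("this action would add [" + newShards + "] shards, but the cluster has ["
                + currentOpenShards + "] open and a limit of [" + limit + "]");
        }
        return Optional.empty();
    }
}
```

With 3 data nodes and a per-node limit of 1000, a cluster-wide cap of 12 would reject a request that takes the cluster from 10 to 15 shards, while the same request passes when the cap is disabled.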

@github-actions
Contributor

github-actions bot commented Feb 5, 2023

Gradle Check (Jenkins) Run Completed with:

@codecov-commenter

codecov-commenter commented Feb 5, 2023

Codecov Report

Merging #6190 (c516009) into 2.x (8aee277) will increase coverage by 0.02%.
The diff coverage is 87.09%.

@@             Coverage Diff              @@
##                2.x    #6190      +/-   ##
============================================
+ Coverage     70.50%   70.53%   +0.02%     
- Complexity    59064    59091      +27     
============================================
  Files          4774     4774              
  Lines        282934   282961      +27     
  Branches      41215    41216       +1     
============================================
+ Hits         199489   199578      +89     
+ Misses        66759    66714      -45     
+ Partials      16686    16669      -17     
Impacted Files                                         Coverage Δ
...rg/opensearch/common/settings/ClusterSettings.java  91.89% <ø> (ø)
...va/org/opensearch/indices/ShardLimitValidator.java  92.68% <87.09%> (-3.69%) ⬇️
...g/opensearch/index/analysis/CharFilterFactory.java  0.00% <0.00%> (-100.00%) ⬇️
...java/org/opensearch/client/indices/DataStream.java  0.00% <0.00%> (-76.09%) ⬇️
.../opensearch/client/indices/CloseIndexResponse.java  17.50% <0.00%> (-60.00%) ⬇️
.../org/opensearch/client/indices/AnalyzeRequest.java  31.00% <0.00%> (-42.00%) ⬇️
...h/action/ingest/SimulateDocumentVerboseResult.java  60.71% <0.00%> (-39.29%) ⬇️
...java/org/opensearch/threadpool/ThreadPoolInfo.java  56.25% <0.00%> (-37.50%) ⬇️
...pensearch/action/ingest/DeletePipelineRequest.java  31.25% <0.00%> (-37.50%) ⬇️
...earch/client/indices/GetIndexTemplatesRequest.java  50.00% <0.00%> (-34.62%) ⬇️
... and 499 more

@github-actions
Contributor

github-actions bot commented Feb 5, 2023

Gradle Check (Jenkins) Run Completed with:

@dreamer-89
Member

dreamer-89 commented Feb 6, 2023

Gradle Check (Jenkins) Run Completed with:

The RemoteStoreIT.testRemoteTranslogRestore test failed. @ashking94 @sachinpkale: this looks like a legitimate failure?

REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.remotestore.RemoteStoreIT.testRemoteTranslogRestore" -Dtests.seed=753FCEB97C57104F -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=sr-Latn-RS -Dtests.timezone=PLT -Druntime.java=17

org.opensearch.remotestore.RemoteStoreIT > testRemoteTranslogRestore FAILED
    java.lang.AssertionError: timed out waiting for yellow state
        at __randomizedtesting.SeedInfo.seed([753FCEB97C57104F:6E80D68FECF3808A]:0)
        at org.junit.Assert.fail(Assert.java:89)
        at org.opensearch.test.OpenSearchIntegTestCase.ensureColor(OpenSearchIntegTestCase.java:1007)
        at org.opensearch.test.OpenSearchIntegTestCase.ensureYellowAndNoInitializingShards(OpenSearchIntegTestCase.java:960)
        at org.opensearch.remotestore.RemoteStoreIT.verifyRestoredData(RemoteStoreIT.java:143)
        at org.opensearch.remotestore.RemoteStoreIT.testRemoteTranslogRestore(RemoteStoreIT.java:181)
...


  1> org.opensearch.indices.recovery.RecoveryFailedException: [remote-store-test-idx-1][0]: Recovery failed on {node_t1}{KEF1_6AQQK-myaLjZKHXnQ}{qAHS3XseR-OJmnbR2pW8PQ}{127.0.0.1}{127.0.0.1:35027}{dimr}{shard_indexing_pressure_enabled=true}
  1> 	at org.opensearch.index.shard.IndexShard.lambda$executeRecovery$32(IndexShard.java:3341) [main/:?]
  1> 	at org.opensearch.action.ActionListener$1.onFailure(ActionListener.java:88) [main/:?]
  1> 	at org.opensearch.index.shard.StoreRecovery.lambda$recoveryListener$7(StoreRecovery.java:429) [main/:?]
  1> 	at org.opensearch.action.ActionListener$1.onFailure(ActionListener.java:88) [main/:?]
  1> 	at org.opensearch.action.ActionListener.completeWith(ActionListener.java:345) [main/:?]
  1> 	at org.opensearch.index.shard.StoreRecovery.recoverFromRemoteStore(StoreRecovery.java:126) [main/:?]
  1> 	at org.opensearch.index.shard.IndexShard.restoreFromRemoteStore(IndexShard.java:2448) [main/:?]
  1> 	at org.opensearch.index.shard.IndexShard.lambda$startRecovery$27(IndexShard.java:3245) [main/:?]
  1> 	at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:88) [main/:?]
  1> 	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:806) [main/:?]
  1> 	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [main/:?]
  1> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
  1> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
  1> 	at java.lang.Thread.run(Thread.java:833) [?:?]
  1> Caused by: org.opensearch.index.shard.IndexShardRecoveryException: Exception while recovering from remote store
  1> 	at org.opensearch.index.shard.StoreRecovery.recoverFromRemoteStore(StoreRecovery.java:474) ~[main/:?]
  1> 	at org.opensearch.index.shard.StoreRecovery.lambda$recoverFromRemoteStore$1(StoreRecovery.java:128) ~[main/:?]
  1> 	at org.opensearch.action.ActionListener.completeWith(ActionListener.java:342) ~[main/:?]
  1> 	... 9 more
  1> Caused by: java.nio.file.NoSuchFileException: /var/jenkins/workspace/gradle-check/search/server/build/testrun/internalClusterTest/temp/org.opensearch.remotestore.RemoteStoreIT_753FCEB97C57104F-001/tempDir-002/repos/RjnvZTynyr/J_O8NbUaTNWMz5J76I_YOw/0/1/translog-152.ckp
  1> 	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:92) ~[?:?]
  1> 	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) ~[?:?]
  1> 	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
  1> 	at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:218) ~[?:?]
  1> 	at java.nio.file.Files.newByteChannel(Files.java:380) ~[?:?]
  1> 	at java.nio.file.Files.newByteChannel(Files.java:432) ~[?:?]
  1> 	at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:422) ~[?:?]
  1> 	at org.apache.lucene.tests.mockfile.FilterFileSystemProvider.newInputStream(FilterFileSystemProvider.java:193) ~[lucene-test-framework-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
  1> 	at org.apache.lucene.tests.mockfile.FilterFileSystemProvider.newInputStream(FilterFileSystemProvider.java:193) ~[lucene-test-framework-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
  1> 	at org.apache.lucene.tests.mockfile.FilterFileSystemProvider.newInputStream(FilterFileSystemProvider.java:193) ~[lucene-test-framework-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
  1> 	at org.apache.lucene.tests.mockfile.HandleTrackingFS.newInputStream(HandleTrackingFS.java:94) ~[lucene-test-framework-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
  1> 	at org.apache.lucene.tests.mockfile.FilterFileSystemProvider.newInputStream(FilterFileSystemProvider.java:193) ~[lucene-test-framework-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
  1> 	at org.apache.lucene.tests.mockfile.HandleTrackingFS.newInputStream(HandleTrackingFS.java:94) ~[lucene-test-framework-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
  1> 	at java.nio.file.Files.newInputStream(Files.java:160) ~[?:?]
  1> 	at org.opensearch.common.blobstore.fs.FsBlobContainer.readBlob(FsBlobContainer.java:170) ~[main/:?]
  1> 	at org.opensearch.index.translog.transfer.BlobStoreTransferService.downloadBlob(BlobStoreTransferService.java:76) ~[main/:?]
  1> 	at org.opensearch.index.translog.transfer.TranslogTransferManager.downloadToFS(TranslogTransferManager.java:155) ~[main/:?]
  1> 	at org.opensearch.index.translog.transfer.TranslogTransferManager.downloadTranslog(TranslogTransferManager.java:141) ~[main/:?]
  1> 	at org.opensearch.index.translog.RemoteFsTranslog.download(RemoteFsTranslog.java:119) ~[main/:?]
  1> 	at org.opensearch.index.shard.StoreRecovery.syncTranslogFilesFromRemoteTranslog(StoreRecovery.java:491) ~[main/:?]
  1> 	at org.opensearch.index.shard.StoreRecovery.recoverFromRemoteStore(StoreRecovery.java:462) ~[main/:?]
  1> 	at org.opensearch.index.shard.StoreRecovery.lambda$recoverFromRemoteStore$1(StoreRecovery.java:128) ~[main/:?]
  1> 	at org.opensearch.action.ActionListener.completeWith(ActionListener.java:342) ~[main/:?]
  1> 	... 9 more

@ashking94
Member

ashking94 commented Feb 6, 2023

@dreamer-89 We can rebase this branch to pick up the latest changes. The fix for this issue has already been backported to 2.x in #6170.

@dreamer-89
Member

dreamer-89 commented Feb 6, 2023

@dependabot rebase

@kotwanikunal
Member

> @dependabot rebase

The backport and dependabot workflows are independent. You will have to manually rebase this one by pushing into the backport branch. 🙂

Signed-off-by: Rishav Sagar <rissag@amazon.com>
(cherry picked from commit e42b76f)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
kotwanikunal force-pushed the backport/backport-6143-to-2.x branch from 610e69c to c516009 on February 6, 2023 18:19
@kotwanikunal
Member

@dreamer-89 Rebased it ✅

@github-actions
Contributor

github-actions bot commented Feb 6, 2023

Gradle Check (Jenkins) Run Completed with:

dreamer-89 merged commit 905b2ad into 2.x on Feb 6, 2023
github-actions bot deleted the backport/backport-6143-to-2.x branch on February 6, 2023 19:18
5 participants