HDDS-8782. Improve Volume Scanner Health checks. #4867
Merged
48 commits
All commits are by errose28:
34f6e5a Refactor volume level tmp dir so it can be used for disk test files
1b3375d Consolidate volume shutdown
0d15e8a More refactoring of volume tmp dir
8572b34 Change tmpdir to File
7dced52 Another compilation fix after cherry-pick
442dde9 Update TestContainerPersistence and fix directory create bug
04384c1 Refactor working dir and subdirs on startup
72c0d31 Rename dir delete_container_service -> deleted-containers
b12bcd2 Move tmp dir cleanup inside HddsVolume
44dbd47 Update existing tests
15725ae Initial implementation of improved volume checks
57bddac Add counter for consecutive volume IO failures
23ab0a6 Add config for size of file used to assess disk health
1d45ece Update disk check log messages
9951e04 Use tmp dir for disk check files
0e770c4 Create tmp disk check directory on startup and clear it
908ee85 Add startup/shutdown tests to TestHddsVolume
e38ec9c Checkstyle
c27ed0e Merge branch 'tmp-dir-refactor' into improve-volume-scanner
4ceff35 Fix assignment to tmp dir name fields
87847d1 Merge branch 'tmp-dir-refactor' into improve-volume-scanner
bfa064b Add test for clearing tmp disk check files on startup/shutdown
ab9cd26 Add tests for DiskCheckUtil and support injecting
0013ba9 Add unit tests for StorageVolume#check
fa6b5b5 Checkstyle
ccded0a Merge branch 'master' into tmp-dir-refactor
b581bc9 Merge branch 'tmp-dir-refactor' into improve-volume-scanner
d516b19 Checkstyle
5c63737 Merge branch 'tmp-dir-refactor' into improve-volume-scanner
6ddb500 Add check that per-volume RocksDB is present on volume scan
fada782 Synchronize volume checks
0369054 Restore endpoint test
1b20307 Separate DB store init from tmp dir creation
d579586 Fix SCM HA finalization compat test
e502a07 Merge branch 'master' into tmp-dir-refactor
d023a89 Initial improvement of volume check configurations
fa4207c Merge branch 'tmp-dir-refactor' into improve-volume-scanner
d27093b Merge branch 'master' into improve-volume-scanner
71e277b Fix test cleanup regression in TestHddsVolume
1f5ff5a Fix new failures in TestStorageVolume
ca67685 Ignore interrupt during volume scan
397a7b6 Checkstyle
7ad00f8 Reduce disk check gap default
a84e1cc Rat and findbugs
6ad84d3 Bypass IO check config validation if disabled
259c269 Update config and variable names
d5d92f8 Switch to sliding window based IO checks
b9bb49e Increase disk check min gap to 10 minutes
...r-service/src/main/java/org/apache/hadoop/ozone/container/common/utils/DiskCheckUtil.java (new file: 199 additions, 0 deletions)
```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hadoop.ozone.container.common.utils;

import com.google.common.annotations.VisibleForTesting;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.SyncFailedException;
import java.util.Arrays;
import java.util.Random;
import java.util.UUID;

/**
 * Utility class that supports checking disk health when provided a directory
 * where the disk is mounted.
 */
public final class DiskCheckUtil {
  private DiskCheckUtil() { }

  // For testing purposes, an alternate check implementation can be provided
  // to inject failures.
  private static DiskChecks impl = new DiskChecksImpl();

  @VisibleForTesting
  public static void setTestImpl(DiskChecks diskChecks) {
    impl = diskChecks;
  }

  @VisibleForTesting
  public static void clearTestImpl() {
    impl = new DiskChecksImpl();
  }

  public static boolean checkExistence(File storageDir) {
    return impl.checkExistence(storageDir);
  }

  public static boolean checkPermissions(File storageDir) {
    return impl.checkPermissions(storageDir);
  }

  public static boolean checkReadWrite(File storageDir, File testFileDir,
      int numBytesToWrite) {
    return impl.checkReadWrite(storageDir, testFileDir, numBytesToWrite);
  }

  /**
   * Defines operations that must be implemented by a class injecting
   * failures into this class. Default implementations return true so that
   * tests only need to override methods for the failures they want to test.
   */
  public interface DiskChecks {
    default boolean checkExistence(File storageDir) {
      return true;
    }
    default boolean checkPermissions(File storageDir) {
      return true;
    }
    default boolean checkReadWrite(File storageDir, File testFileDir,
        int numBytesToWrite) {
      return true;
    }
  }

  /**
   * The default implementation of DiskChecks that production code will use
   * for disk checking.
   */
  private static class DiskChecksImpl implements DiskChecks {

    private static final Logger LOG =
        LoggerFactory.getLogger(DiskCheckUtil.class);

    @Override
    public boolean checkExistence(File storageDir) {
      if (!storageDir.exists()) {
        logError(storageDir, "Directory does not exist.");
        return false;
      }
      return true;
    }

    @Override
    public boolean checkPermissions(File storageDir) {
      // Check all permissions on the volume. If there are multiple permission
      // errors, count it as one failure so the admin can fix them all at once.
      boolean permissionsCorrect = true;
      if (!storageDir.canRead()) {
        logError(storageDir,
            "Datanode does not have read permission on volume.");
        permissionsCorrect = false;
      }
      if (!storageDir.canWrite()) {
        logError(storageDir,
            "Datanode does not have write permission on volume.");
        permissionsCorrect = false;
      }
      if (!storageDir.canExecute()) {
        logError(storageDir, "Datanode does not have execute " +
            "permission on volume.");
        permissionsCorrect = false;
      }

      return permissionsCorrect;
    }

    @Override
    public boolean checkReadWrite(File storageDir,
        File testFileDir, int numBytesToWrite) {
      File testFile = new File(testFileDir, "disk-check-" + UUID.randomUUID());
      byte[] writtenBytes = new byte[numBytesToWrite];
      new Random().nextBytes(writtenBytes);
      try (FileOutputStream fos = new FileOutputStream(testFile)) {
        fos.write(writtenBytes);
        // Force the data to the device so the read below exercises the disk.
        fos.getFD().sync();
      } catch (FileNotFoundException notFoundEx) {
        logError(storageDir, String.format("Could not find file %s for " +
            "volume check.", testFile), notFoundEx);
        return false;
      } catch (SyncFailedException syncEx) {
        logError(storageDir, String.format("Could not sync file %s to disk.",
            testFile), syncEx);
        return false;
      } catch (IOException ioEx) {
        logError(storageDir, String.format("Could not write file %s " +
            "for volume check.", testFile), ioEx);
        return false;
      }

      // Read data back from the test file. A single read() call may return
      // fewer bytes than requested, so keep reading until the buffer is
      // full or the end of the file is reached.
      byte[] readBytes = new byte[numBytesToWrite];
      try (FileInputStream fis = new FileInputStream(testFile)) {
        int numBytesRead = 0;
        while (numBytesRead < numBytesToWrite) {
          int count = fis.read(readBytes, numBytesRead,
              numBytesToWrite - numBytesRead);
          if (count < 0) {
            break;
          }
          numBytesRead += count;
        }
        if (numBytesRead != numBytesToWrite) {
          logError(storageDir, String.format("%d bytes written to file %s " +
              "but %d bytes were read back.", numBytesToWrite, testFile,
              numBytesRead));
          return false;
        }
      } catch (FileNotFoundException notFoundEx) {
        logError(storageDir, String.format("Could not find file %s " +
            "for volume check.", testFile), notFoundEx);
        return false;
      } catch (IOException ioEx) {
        logError(storageDir, String.format("Could not read file %s " +
            "for volume check.", testFile), ioEx);
        return false;
      }

      // Check that test file has the expected content.
      if (!Arrays.equals(writtenBytes, readBytes)) {
        logError(storageDir, String.format("%d bytes read from file " +
            "%s do not match the %d bytes that were written.",
            readBytes.length, testFile, writtenBytes.length));
        return false;
      }

      // Delete the file.
      if (!testFile.delete()) {
        logError(storageDir, String.format("Could not delete file %s " +
            "for volume check.", testFile));
        return false;
      }

      // If all checks passed, the volume is healthy.
      return true;
    }

    private void logError(File storageDir, String message) {
      LOG.error("Volume {} failed health check. {}", storageDir, message);
    }

    private void logError(File storageDir, String message, Exception ex) {
      LOG.error("Volume {} failed health check. {}", storageDir, message, ex);
    }
  }
}
```
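For comparison, the write, sync, read, compare and delete sequence in `checkReadWrite` can also be expressed with `java.nio.file`. The sketch below is a hypothetical standalone class (`ReadWriteProbeSketch` and `probe` are illustrative names, not part of Ozone), assuming a plain `Path` for the test-file directory and no logging. `StandardOpenOption.DSYNC` stands in, roughly, for the explicit `getFD().sync()` call, and `Files.readAllBytes` reads until end of file, so a short read cannot be mistaken for a content mismatch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.Random;
import java.util.UUID;

/** Hypothetical standalone sketch of a write/sync/read/compare/delete probe. */
public final class ReadWriteProbeSketch {
  private ReadWriteProbeSketch() { }

  /**
   * Returns true only if numBytes random bytes can be written to a new file
   * in testFileDir, read back unchanged, and the file then deleted.
   */
  public static boolean probe(Path testFileDir, int numBytes) {
    Path testFile = testFileDir.resolve("disk-check-" + UUID.randomUUID());
    byte[] written = new byte[numBytes];
    new Random().nextBytes(written);

    boolean contentOk;
    try {
      // CREATE_NEW fails rather than overwrite an existing file; DSYNC asks
      // that each write of file content reach the storage device before the
      // call returns.
      Files.write(testFile, written, StandardOpenOption.CREATE_NEW,
          StandardOpenOption.WRITE, StandardOpenOption.DSYNC);
      // readAllBytes keeps reading until end of file.
      contentOk = Arrays.equals(written, Files.readAllBytes(testFile));
    } catch (IOException e) {
      contentOk = false;
    }

    // Always attempt cleanup; an exception while deleting fails the probe.
    try {
      Files.deleteIfExists(testFile);
    } catch (IOException e) {
      return false;
    }
    return contentOk;
  }

  public static void main(String[] args) {
    Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
    System.out.println("probe healthy dir: " + probe(tmp, 4096));
    System.out.println("probe missing dir: "
        + probe(tmp.resolve("no-such-dir-" + UUID.randomUUID()), 16));
  }
}
```

Unlike `checkReadWrite`, this sketch tries to delete the test file even when the write or comparison fails. The PR's commit history instead mentions clearing the tmp disk check directory on startup and shutdown, which this sketch does not model.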
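The `DiskChecks` interface gives every method a default that returns `true`, so a test can override only the check it wants to fail. The following is a minimal, self-contained mirror of that pattern (`InjectionPatternSketch` and its simplified production checks are hypothetical, not Ozone code) showing how a test might inject a permissions failure and then restore normal behavior. Because the interface has no abstract methods, it is not a functional interface, so the override has to be an anonymous class rather than a lambda.

```java
import java.io.File;

/** Hypothetical mirror of DiskCheckUtil's swappable-delegate test hook. */
public final class InjectionPatternSketch {
  private InjectionPatternSketch() { }

  /** Same shape as DiskCheckUtil.DiskChecks: every check passes by default. */
  public interface DiskChecks {
    default boolean checkExistence(File dir) { return true; }
    default boolean checkPermissions(File dir) { return true; }
  }

  // Simplified "production" checks that query the real file system.
  private static final DiskChecks DEFAULT = new DiskChecks() {
    @Override
    public boolean checkExistence(File dir) {
      return dir.exists();
    }

    @Override
    public boolean checkPermissions(File dir) {
      return dir.canRead() && dir.canWrite() && dir.canExecute();
    }
  };

  // volatile so a swap made on a test thread is seen by other threads.
  private static volatile DiskChecks impl = DEFAULT;

  public static void setTestImpl(DiskChecks checks) { impl = checks; }
  public static void clearTestImpl() { impl = DEFAULT; }

  public static boolean checkExistence(File dir) {
    return impl.checkExistence(dir);
  }

  public static boolean checkPermissions(File dir) {
    return impl.checkPermissions(dir);
  }

  public static void main(String[] args) {
    File dir = new File(System.getProperty("java.io.tmpdir"));
    // Inject only a permissions failure; existence keeps its default (true).
    setTestImpl(new DiskChecks() {
      @Override
      public boolean checkPermissions(File d) {
        return false;
      }
    });
    System.out.println(checkExistence(dir) + " " + checkPermissions(dir));
    clearTestImpl();
    System.out.println(checkExistence(dir));
  }
}
```

The sketch marks the delegate `volatile`; the field in `DiskCheckUtil` is a plain `static`, which may be enough when tests swap and check on the same thread.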