HDDS-3082. Refactor recon missing containers task to detect under, over and mis-replicated containers. #994
vivekratnavel left a comment
+1 LGTM.
Posted a few minor suggestions inline.
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/fsck/ContainerHealthTask.java
containers.forEach(container ->
    processContainer(container, currentTime));
recordSingleRunCompletion();
LOG.info("Missing Container task Thread took {} milliseconds for" +
- LOG.info("Missing Container task Thread took {} milliseconds for" +
+ LOG.info("Container Health task thread took {} milliseconds for" +
Well spotted. I have fixed this.
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/fsck/ContainerHealthTask.java
 * already set to. We only need to run a DB update statement if the record
 * has really changed. The methods below ensure we do not update the Jooq
 * record unless the values have changed and hence save a DB execution
 * when
Please fix the dangling statement
Fixed.
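The Javadoc under review describes a "change-only" update: the record's fields are assigned only when a value really differs, so an unchanged record never triggers a DB statement. The sketch below illustrates that idea in plain Java. It is not the PR's code and does not use jOOQ; the row class, its fields, `ChangeOnlyUpdater.refresh` and the `dbUpdate` callback are all hypothetical stand-ins.

```java
import java.util.Objects;

// Hypothetical stand-in for a row of the unhealthy containers table.
// The real task uses a generated jOOQ record; none of its API is used here.
class UnhealthyContainerRow {
    private int actualReplicaCount;
    private int expectedReplicaCount;
    private String reason;
    private boolean changed;

    UnhealthyContainerRow(int actual, int expected, String reason) {
        this.actualReplicaCount = actual;
        this.expectedReplicaCount = expected;
        this.reason = reason;
    }

    // Each setter assigns the field only when the new value differs,
    // so a row whose values are unchanged never becomes "dirty".
    void setActualReplicaCount(int value) {
        if (actualReplicaCount != value) {
            actualReplicaCount = value;
            changed = true;
        }
    }

    void setExpectedReplicaCount(int value) {
        if (expectedReplicaCount != value) {
            expectedReplicaCount = value;
            changed = true;
        }
    }

    void setReason(String value) {
        if (!Objects.equals(reason, value)) {
            reason = value;
            changed = true;
        }
    }

    boolean isChanged() {
        return changed;
    }

    void clearChanged() {
        changed = false;
    }
}

class ChangeOnlyUpdater {
    // Applies the latest values and runs the (hypothetical) DB update only
    // if at least one field actually changed. Returns the number of update
    // statements executed (0 or 1).
    static int refresh(UnhealthyContainerRow row, int actual, int expected,
                       String reason, Runnable dbUpdate) {
        row.setActualReplicaCount(actual);
        row.setExpectedReplicaCount(expected);
        row.setReason(reason);
        if (!row.isChanged()) {
            return 0; // nothing changed: skip the DB round trip
        }
        dbUpdate.run();
        row.clearChanged(); // the row is now in sync with the DB
        return 1;
    }
}
```

Across a large cluster most unhealthy containers keep the same counts from one run to the next, so skipping unchanged rows avoids most of the write traffic.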
...-ozone/recon/src/test/java/org/apache/hadoop/ozone/recon/fsck/TestContainerHealthStatus.java
...op-ozone/recon/src/test/java/org/apache/hadoop/ozone/recon/fsck/TestContainerHealthTask.java
Thank you @sodonnel. LGTM +1
…er and mis-replicated containers. (apache#994)
What changes were proposed in this pull request?
The current Recon "Missing Containers Task" only highlights missing containers in the cluster.
It should also detect under-replicated, over-replicated and mis-replicated containers.
In order to do that, the existing database table MISSING_CONTAINERS has been renamed to UNHEALTHY_CONTAINERS, with the definition:
The container state can be MISSING, UNDER_REPLICATED, OVER_REPLICATED or MIS_REPLICATED.
By design, a container that is MISSING is not reported in any of the other states.
However, a container can be both under- and mis-replicated, or in theory both over- and mis-replicated, at the same time. In that case the database holds two rows for the same container.
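A minimal sketch of these state rules follows, assuming that a container counts as MISSING when it has zero replicas and that mis-replication is decided by a placement-policy check passed in as a boolean. The enum, class and method names are hypothetical and are not taken from the PR. Each element of the returned set would correspond to one row in UNHEALTHY_CONTAINERS.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical enum mirroring the states named in the PR description.
enum UnhealthyState { MISSING, UNDER_REPLICATED, OVER_REPLICATED, MIS_REPLICATED }

class HealthClassifier {
    // Assumed inputs: number of replicas found, expected replication
    // factor, and whether the placement policy is satisfied.
    static Set<UnhealthyState> classify(int replicaCount, int expectedReplicas,
                                        boolean placementSatisfied) {
        Set<UnhealthyState> states = EnumSet.noneOf(UnhealthyState.class);
        if (replicaCount == 0) {
            // MISSING is exclusive: no other state is reported alongside it.
            states.add(UnhealthyState.MISSING);
            return states;
        }
        // Under- and over-replication are mutually exclusive with each other...
        if (replicaCount < expectedReplicas) {
            states.add(UnhealthyState.UNDER_REPLICATED);
        } else if (replicaCount > expectedReplicas) {
            states.add(UnhealthyState.OVER_REPLICATED);
        }
        // ...but either one can coexist with mis-replication.
        if (!placementSatisfied) {
            states.add(UnhealthyState.MIS_REPLICATED);
        }
        return states; // empty set means the container is healthy
    }
}
```

For example, `classify(1, 3, false)` yields UNDER_REPLICATED and MIS_REPLICATED, which would be two database rows for a single container.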
Each time the "Container Health task" runs, it scans all the existing records, updates any counts and removes any records that are no longer valid.
Then it processes all remaining containers, that is, those with no record in the UNHEALTHY_CONTAINERS table.
The job is split into two parts to avoid querying the database for every single container on each run.
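The two-phase run can be sketched as follows. This is an illustration only: an in-memory map (container ID to recorded states) stands in for the UNHEALTHY_CONTAINERS table, the per-state counts are omitted, and the `health` function is a hypothetical placeholder for whatever computes a container's current states. Phase 1 revisits only containers that already have records. Phase 2 checks everything else without a per-container lookup, because phase 1 has already established which containers have records.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

class TwoPhaseHealthScan {
    // allContainers: IDs of all containers currently known to the cluster
    // health:        computes a container's current unhealthy states (empty = healthy)
    // table:         in-memory stand-in for UNHEALTHY_CONTAINERS (id -> recorded states)
    static void run(Set<Long> allContainers,
                    Function<Long, Set<String>> health,
                    Map<Long, Set<String>> table) {
        Set<Long> processed = new HashSet<>();

        // Phase 1: walk the existing records, update them or drop stale ones.
        Iterator<Map.Entry<Long, Set<String>>> it = table.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Set<String>> entry = it.next();
            Long id = entry.getKey();
            processed.add(id);
            // A container that no longer exists is treated as having no states.
            Set<String> current = allContainers.contains(id)
                    ? health.apply(id)
                    : Collections.<String>emptySet();
            if (current.isEmpty()) {
                it.remove(); // record is no longer valid
            } else if (!current.equals(entry.getValue())) {
                entry.setValue(new HashSet<>(current)); // states changed
            }
        }

        // Phase 2: every container not seen in phase 1 is known to have no
        // record, so no per-container database query is needed here.
        for (Long id : allContainers) {
            if (processed.contains(id)) {
                continue;
            }
            Set<String> states = health.apply(id);
            if (!states.isEmpty()) {
                table.put(id, new HashSet<>(states));
            }
        }
    }
}
```

The existing records are read once in bulk, rather than looked up one container at a time, which is the cost the split is meant to avoid.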
This change only adjusts the job and the backend storage. A follow-up change is needed to update the REST endpoints so that they expose the new container states to users and the UI.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-3082
How was this patch tested?
New and existing unit tests