-
Notifications
You must be signed in to change notification settings - Fork 588
HDDS-9819. Recon - Potential memory overflow in Container Health Task. #5841
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@sumitagrawl @adoroszlai pls review. |
adoroszlai
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @devmadhuu for the patch.
| Map<String, Long>> | ||
| unhealthyContainerStateStatsMap) { | ||
| unhealthyContainerStateStatsMap) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Please avoid unnecessary space changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
| if (container.isUnderReplicated() | ||
| && !recordForStateExists.contains( | ||
| UnHealthyContainerStates.UNDER_REPLICATED.toString())) { | ||
| UnHealthyContainerStates.UNDER_REPLICATED.toString())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Please avoid unnecessary space changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
|
|
||
| private void checkAndProcessContainers( | ||
| Map<UnHealthyContainerStates, Map<String, Long>> | ||
| unhealthyContainerStateStatsMap, long start, long currentTime) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
start should not be a parameter, it should be set at the start of each iteration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I made the changes. Thanks.
|
@devmadhuu please check if test failure in |
Thanks @adoroszlai for review. I have resolved the test case failure. |
|
Thanks @devmadhuu for updating the patch. @ArafatKhan2198 please review |
ArafatKhan2198
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the work on this @devmadhuu I have left a few comments.
| unhealthyContainerStateStatsMap, long currentTime) { | ||
| ContainerID startID = ContainerID.valueOf(1); | ||
| List<ContainerInfo> containers = containerManager.getContainers(startID, | ||
| Integer.parseInt(DEFAULT_FETCH_COUNT)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we replace the usage of Integer.parseInt(DEFAULT_FETCH_COUNT) with a named constant as DEFAULT_FETCH_COUNT is a constant.
private static final int FETCH_COUNT = Integer.parseInt(DEFAULT_FETCH_COUNT);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
| containers = containerManager.getContainers(startID, | ||
| Integer.parseInt(DEFAULT_FETCH_COUNT)); | ||
| } | ||
| containers.clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The containers.clear(); statement inside the if block is meant to clear the list before fetching the next batch. However, the last containers.clear(); statement just outside the if block seems redundant since the loop condition (while (!containers.isEmpty())) ensures that it's only executed when containers is empty.?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ArafatKhan2198 I couldn't understand this comment, as I don't see containers.clear() outside the loop. Can you pls clarify ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the earlier mistake in the comment. I have now corrected it. Please take a look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, we have a case where last iteration in batch will not execute the if condition , so need to clear -off any in memory.
|
Thanks for the changes @devmadhuu LGTM! |
sumitagrawl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@devmadhuu Thanks for working over this, have few comments
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/fsck/ContainerHealthTask.java
Outdated
Show resolved
Hide resolved
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/fsck/ContainerHealthTask.java
Outdated
Show resolved
Hide resolved
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/fsck/ContainerHealthTask.java
Show resolved
Hide resolved
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/fsck/ContainerHealthTask.java
Show resolved
Hide resolved
sumitagrawl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM +1
adoroszlai
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @devmadhuu for updating the patch. Please be careful when merging changes from master (sorry for the conflict).
|
|
||
| List<UnhealthyContainers> all = unHealthyContainersTableHandle.findAll(); | ||
| assertThat(all).isEmpty(); | ||
| assertTrue(all.isEmpty()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please keep assertThat, added by HDDS-10034.
| assertTrue(all.isEmpty()); | |
| assertThat(all).isEmpty(); |
| assertTrue(taskStatus.getLastUpdatedTimestamp() > | ||
| currentTime); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please keep assertThat, added by HDDS-10034.
| assertTrue(taskStatus.getLastUpdatedTimestamp() > | |
| currentTime); | |
| assertThat(taskStatus.getLastUpdatedTimestamp()).isGreaterThan(currentTime); |
| assertTrue(taskStatus.getLastUpdatedTimestamp() > | ||
| currentTime); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please keep assertThat, added by HDDS-10034.
| assertTrue(taskStatus.getLastUpdatedTimestamp() > | |
| currentTime); | |
| assertThat(taskStatus.getLastUpdatedTimestamp()).isGreaterThan(currentTime); |
|
|
||
| List<UnhealthyContainers> all = unHealthyContainersTableHandle.findAll(); | ||
| assertThat(all).isEmpty(); | ||
| assertTrue(all.isEmpty()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please keep assertThat, added by HDDS-10034.
| assertTrue(all.isEmpty()); | |
| assertThat(all).isEmpty(); |
| import static org.assertj.core.api.Assertions.assertThat; | ||
| import static org.junit.jupiter.api.Assertions.assertEquals; | ||
| import static org.junit.jupiter.api.Assertions.assertNotNull; | ||
| import static org.junit.jupiter.api.Assertions.assertTrue; | ||
| import static org.mockito.ArgumentMatchers.any; | ||
| import static org.mockito.ArgumentMatchers.anyInt; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please keep assertThat instead of restoring assertTrue.
|
Thanks @devmadhuu for the patch, @ArafatKhan2198, @sumitagrawl for the review. |
apache#5841) (cherry picked from commit 1398f58)
What changes were proposed in this pull request?
This PR addresses the issue of potential. memory overflow in case there are millions of containers because ContainerHealthTask tries to load all containers available in memory using
containerManager.getContainers(), this API loads maximum available containers. So to avoid any memory overflow, this PR changes the API to use paginated based API.What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-9819
How was this patch tested?
This patch was tested using existing junit test cases in order not to break anything.