Skip to content

Conversation

@DaveTeng0
Copy link
Contributor

What changes were proposed in this pull request?

HDDS-8982. Prevent infinite loop in getContainer which causes log flooded by WritableRatisContainerProvider if pipeline's nodes are not found

Please describe your PR in detail:

  • add the failed pipeline in exclude list on WritableRatisContainerProvider side
  • add maxRetry in configuration to make sure the while loop only executes no more than maxRetry of times.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-8982

How was this patch tested?

unit test

@DaveTeng0
Copy link
Contributor Author

cc. @adoroszlai @sumitagrawl

Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @DaveTeng0 for working on this.

if (containerInfo.getContainerID() != -1) {
return containerInfo;
} else {
excludeList.addPipeline(containerInfo.getPipelineID());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to exclude the faulty pipeline may not work, because:

if (pipelines.size() == 0 && !excludeList.isEmpty()) {
// if no pipelines can be found, try finding pipeline without
// exclusion
pipelines = pipelineManager.getPipelines(repConfig, pipelineState);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, the reason I add it in the first part of findPipelinesByState(Line 100) of this method is to prevent that broken pipeline from being selected in the second part of findPipelinesByState (Line 164 , after pipelineManager.createPipeline) I just feel it's nice to have that, but it could be removed too. I'm open to any suggestion whether to keep it or not here. Thanks!

@adoroszlai adoroszlai changed the title HDDS-8982. Prevent infinite loop in getContainer which causes log flooded by WritableRatisContainerProvider if pipeline's nodes are not found HDDS-8982. Infinite loop in WritableRatisContainerProvider if pipeline's nodes are not found Dec 7, 2023
DaveTeng0 added 2 commits December 14, 2023 10:48
@adoroszlai
Copy link
Contributor

Please check CI run in fork before starting PR workflow. If there is a failure caused by the change (here TestWritableRatisContainerProvider is failing), we need to wait for further fix, no need to spend CI resources on repeating the same failing test.

org.mockito.exceptions.verification.TooManyActualInvocations: 

writableRatisContainerProvider.selectContainer(
    <any List>,
    <any long>,
    <any string>,
    <any org.apache.hadoop.hdds.scm.container.common.helpers.ExcludeList>
);
Wanted 3 times:
-> at org.apache.hadoop.hdds.scm.pipeline.WritableRatisContainerProvider.selectContainer(WritableRatisContainerProvider.java:219)
But was 4096 times:
-> at org.apache.hadoop.hdds.scm.pipeline.WritableRatisContainerProvider.getContainer(WritableRatisContainerProvider.java:104)
-> at org.apache.hadoop.hdds.scm.pipeline.WritableRatisContainerProvider.getContainer(WritableRatisContainerProvider.java:104)
-> at org.apache.hadoop.hdds.scm.pipeline.WritableRatisContainerProvider.getContainer(WritableRatisContainerProvider.java:104)
-> at org.apache.hadoop.hdds.scm.pipeline.WritableRatisContainerProvider.getContainer(WritableRatisContainerProvider.java:104)
...

@adoroszlai adoroszlai marked this pull request as draft January 4, 2024 14:54
@adoroszlai
Copy link
Contributor

Thanks again @DaveTeng0 for working on this. I prefer #5911 to this one. Marking as draft for now. Will close if #5911 gets merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants