-
Notifications
You must be signed in to change notification settings - Fork 592
HDDS-4622: Use singe server raft cluster in MiniOzoneCluster. #1738
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
d32258e to
5079c33
Compare
|
O seems like we are more or less doing the similar change :) #1733 |
| if (Strings.isNullOrEmpty(storageDir)) { | ||
| storageDir = ServerUtils.getDefaultRatisDirectory(conf); | ||
| File metaDirPath = ServerUtils.getOzoneMetaDirPath(conf); | ||
| storageDir = (new File(metaDirPath, "scm-ha")).getPath(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's prefered to use SCMHA Configuration to set the storageDir
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After #1739 this can be set through HA config.
We can fix this issue together, and you can take away my changes if needed. |
| public static StorageContainerManager getScmSimple(OzoneConfiguration conf) | ||
| throws IOException, AuthenticationException { | ||
| SCMConfigurator configurator = new SCMConfigurator(); | ||
| configurator.setSCMHAManager(MockSCMHAManager.getInstance(true)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will need this line:
conf.setBoolean(ScmConfigKeys.OZONE_SCM_HA_ENABLE_KEY, true);
SCMHAManagerImpl will return isLeader immediately without enabling this HA:
if (!SCMHAUtils.isSCMHAEnabled(conf)) {
// When SCM HA is not enabled, the current SCM is always the leader.
return Optional.of((long)0);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So currently the cluster.waitForClusterToBeReady returns quickly even the SCM has not become leader yet.
I am guessing CI can still pass become you also have changed the rpc timeout parameters. So either the cluster wait for ready or the parameter change can solve the not leader problem.
I am not sure if we need both. Maybe we only need to keep cluster.waitForClusterToBeReady?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
| Rpc.setRequestTimeout(properties, TimeDuration.valueOf( | ||
| conf.getRatisRequestTimeout(), TimeUnit.MILLISECONDS)); | ||
| Rpc.setTimeoutMin(properties, TimeDuration.valueOf( | ||
| conf.getRatisRequestMinTimeout(), TimeUnit.MILLISECONDS)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you don't need to update these parameters. The cluster.waitForClusterToBeReady is supposed to wait for SCM become leader thus solve the not leader exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, we only need the wait leader in cluster.waitForClusterToBeReady. The problem here is, we've set the min/max timeout un-properly. I will create a separate jira for his fix.
| lock.readLock().lock(); | ||
| try { | ||
| return stateManager | ||
| return stateManager |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so the dead lock solution seems is removing the read write lock? Is there an explanation why we can remove locks here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will create a jira for this issue, can give the explanation of this solution there.
What changes were proposed in this pull request?
Use
SCMHAManagerImplandSCMRatisServerImplinstead ofMockSCMHAManagerandMockRatisServerinMiniOzoneCluster.The
MockRatisServerandMockSCMHAManagercan not fully test the HA code.What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-4622
How was this patch tested?
CI