HDDS-2034. Async RATIS pipeline creation and destroy through heartbea… #1469
💔 -1 overall
This message was automatically generated.
The majority of the failed unit tests report "Could not initialize class org.apache.hadoop.ozone.util.OzoneVersionInfo". This class is not touched by the patch.
/retest
💔 -1 overall
This message was automatically generated.
Can we reuse HDDS_SCM_SAFEMODE_HEALTHY_PIPELINE_THRESHOLD_PCT_DEFAULT or consolidate the criteria?
HDDS_SCM_SAFEMODE_THRESHOLD_PCT controls the percentage of open containers that must have at least one replica reported in order to exit safe mode; the default value is 0.99.
HDDS_SCM_SAFEMODE_HEALTHY_PIPELINE_THRESHOLD_PCT controls the percentage of healthy pipelines (with all datanodes reported); the default value is 0.1.
I'm not sure why a single reported replica is considered enough for an open container. From my understanding, an open container with only one replica is not ready for use. Maybe we should think about these criteria first.
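To make the two criteria concrete, here is a minimal sketch of how such percentage checks could be evaluated. The class and method names are hypothetical and are not the actual SCM safe mode rule classes; only the two default values come from the comment above. Treating an empty set as trivially satisfied is also an assumption.

```java
// Hypothetical sketch; not the actual SCM safe mode rule implementation.
final class SafeModeThresholdSketch {
  // Default values quoted in the discussion above.
  static final double CONTAINER_THRESHOLD_PCT = 0.99;
  static final double HEALTHY_PIPELINE_THRESHOLD_PCT = 0.10;

  /** Share of open containers with at least one reported replica. */
  static boolean containerRuleSatisfied(int reported, int total) {
    if (total == 0) {
      return true; // assumption: nothing to wait for
    }
    return (double) reported / total >= CONTAINER_THRESHOLD_PCT;
  }

  /** Share of pipelines for which all datanodes have reported. */
  static boolean healthyPipelineRuleSatisfied(int healthy, int total) {
    if (total == 0) {
      return true; // assumption: nothing to wait for
    }
    return (double) healthy / total >= HEALTHY_PIPELINE_THRESHOLD_PCT;
  }
}
```

With these defaults, 98 of 100 containers reported does not satisfy the container rule, while 2 of 10 healthy pipelines already satisfies the pipeline rule.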
Is this a dead key being removed? Should we remove the corresponding consts defined in HddsConfigKeys.java?
The property "hdds.command.status.report.interval" appears twice in ozone-default.xml with different default values: one is 60s, the other is 30s. I kept the entry whose value (60s) matches the one defined in HddsConfigKeys.java and removed the other.
Same as mentioned earlier, can we consolidate the criteria for pipeline to exit safemode?
Currently, when a new cluster starts up, it exits safemode immediately, because there is no pipeline whose datanode reports it has to wait for. So if we don't add a minimum pipeline threshold, we will exit safemode without any pipeline ready to use.
The desired state is that when safemode is off, Ozone is ready to read and write objects.
That's the reason this property is added. HDDS_SCM_SAFEMODE_HEALTHY_PIPELINE_THRESHOLD_PCT_DEFAULT cannot fulfill this purpose, because the total number of pipelines at the start of a new cluster is 0.
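The argument can be illustrated with a small sketch. It assumes the percentage rule compares the healthy count against a rounded-up fraction of the total, and it uses a hypothetical `minHealthy` parameter to stand in for the new property; neither the names nor the exact arithmetic are taken from the patch.

```java
// Hypothetical sketch of why a percentage alone is not enough on a new cluster.
final class MinPipelineRuleSketch {

  /** Percentage rule: healthy pipelines must reach pct of the known total. */
  static boolean percentageRule(int healthy, int total, double pct) {
    // With total == 0 the required count is 0, so the rule passes at once.
    int required = (int) Math.ceil(total * pct);
    return healthy >= required;
  }

  /** Minimum-count rule: an absolute floor, independent of the total. */
  static boolean minCountRule(int healthy, int minHealthy) {
    return healthy >= minHealthy;
  }

  /** Exit safe mode only when both rules hold. */
  static boolean canExitSafeMode(int healthy, int total, double pct, int minHealthy) {
    return percentageRule(healthy, total, pct) && minCountRule(healthy, minHealthy);
  }
}
```

On a brand-new cluster, `percentageRule(0, 0, 0.1)` holds trivially, whereas `canExitSafeMode(0, 0, 0.1, 1)` keeps the cluster in safe mode until at least one pipeline is healthy.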
Can we change this to AtomicLong?
Sure.
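The field under review is not shown in this thread. Assuming it is a counter updated from more than one thread, a minimal sketch of the AtomicLong form might look like this (the class and method names are hypothetical):

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical counter; stands in for the field discussed above.
final class CounterSketch {
  private final AtomicLong count = new AtomicLong();

  /** Atomically adds one and returns the updated value. */
  long increment() {
    return count.incrementAndGet();
  }

  /** Returns the current value. */
  long get() {
    return count.get();
  }
}
```

Unlike `count++` on a plain `long`, `incrementAndGet()` is a single atomic read-modify-write, so concurrent updates are not lost.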
Can we move the RatisPipelineUtils which contains destroyPipeline to o.a.h.ozone.container.common?
I removed RatisPipelineUtils. Do you mean adding a new class under container.common to host the create and destroy pipeline logic for other modules to use?
LOG.error can be removed, as the caller already logs an error for the same thrown exception.
OK
same as above.
same as above.
ok
Should this be HDDS_HEARTBEAT_INTERVAL_DEFAULT?
Right.
…NPE (apache#1302). Contributed by Gabor Bota. Change-Id: I2f637865c871e400b95fe7ddaa24bf99fa192023
8265edb to e11eea9
💔 -1 overall
This message was automatically generated.
Contributed by Siddharth Seth. Change-Id: I7f08201c9f7c0551514049389b5b398a84855191
…hen the excludedNode is not present. Contributed by Ranith Sardar.
Signed-off-by: Anu Engineer <[email protected]>
…d by Xiaoyu Yao. (apache#1490)
Signed-off-by: Anu Engineer <[email protected]>
…lasses (apache#1496) Signed-off-by: Vinayakumar B <[email protected]>
…buf classes (apache#1500) Signed-off-by: Vinayakumar B <[email protected]>
💔 -1 overall
This message was automatically generated.
…Contributed by Prabhu Joseph
…Contributed by Zoltan Siegl
…deserialize ResourceMappings. Contributed by Zoltan Siegl
💔 -1 overall
This message was automatically generated.
commit 9d6a1b9 Author: Doroszlai, Attila <[email protected]> Date: Mon Oct 7 13:42:35 2019 +0200 HDDS-2265. integration.sh may report false negative Revert "HDDS-2217. Remove log4j and audit configuration from the docker-config files" This reverts commit 4b0a5bc.
…ver reduced(addendum). Contributed by Surendra Singh Lilhore.
…he read/write path. (apache#1633)
…ed by Masatake Iwasaki.
Contributed by Steve Loughran. Change-Id: Ia9bb84bd6455e210a54cfe9eb944feeda8b58da9
Contributed by lqjacklee. Change-Id: I32bb00a683102e7ff8ff8ce0b8d9c3195ca7381c
Contributed by Prabhu Joseph and Shane Kumpf
…Contributed by Xiaoyu Yao. (apache#1642)
…pache#1576). Contributed by Gabor Bota. Fixes HADOOP-16349. DynamoDBMetadataStore.getVersionMarkerItem() to log at info/warn on retry Change-Id: Ia83e92b9039ccb780090c99c41b4f71ef7539d35
…. Contributed by Adam Antal
…with multiple resource types. Contributed by Adam Antal
…tAdditionalTokenIssuers (apache#1556)
… heartbeat commands.
…t commands.
https://issues.apache.org/jira/browse/HDDS-2034?filter=-1
Change list,