HDDS-4432. Update Ratis version to latest snapshot. #1586
Conversation
@nandakumar131, @mukul1987, @lokeshj1703, TestNodeFailure#testPipelineFail is failing after the Ratis version upgrade. Any idea what could be causing this regression? Was there any change in the Ratis pipeline close logic?
Thanks @hanishakoneru for working on this. The test failure is related to the recent Ratis change to a config name in DatanodeRatisServerConfig (rpcslowness.timeout -> rpc.slowness.timeout). I have also updated Ratis to the latest snapshot.
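A rename like this means any cluster still setting the old key silently loses its override. One mitigation is a read-time fallback to the deprecated spelling. A minimal sketch using `java.util.Properties`; the full key prefixes and class name here are illustrative assumptions, not the actual Ozone configuration names:

```java
import java.util.Properties;

// Hypothetical sketch: prefer the renamed key, fall back to the old
// (deprecated) spelling, then to the supplied default. The key names
// below are assumptions for illustration only.
public class ConfigFallback {
    static final String NEW_KEY = "hdds.ratis.raft.server.rpc.slowness.timeout";
    static final String OLD_KEY = "hdds.ratis.raft.server.rpcslowness.timeout";

    public static String getSlownessTimeout(Properties conf, String defaultValue) {
        String value = conf.getProperty(NEW_KEY);
        if (value == null) {
            // Old key still honored so existing clusters keep their override.
            value = conf.getProperty(OLD_KEY);
        }
        return value != null ? value : defaultValue;
    }
}
```

This keeps the new name authoritative when both keys are set, while not breaking configurations written against the pre-rename key.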
Shouldn't Ratis be backward compatible now that it has reached 1.0? Dropping existing APIs and renaming config keys without handling the old ones looks incompatible to me. Maybe this is not the right place to discuss this, but the changes necessary in Ozone to upgrade Ratis highlight these issues.
@adoroszlai, I agree with you. This needs better handling for upgrade cases.
Thanks @bshashikant for debugging and fixing the issue. Can we go ahead and merge this PR and discuss Ratis backward compatibility on the dev list?
It is quite possible that no existing cluster has overridden this config. But it would be better to treat the old config key as deprecated (org.apache.hadoop.conf.Configuration.DeprecationDelta).
Sure, I didn't want to block this PR. |
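The DeprecationDelta approach mentioned above translates deprecated key names to their replacements at lookup time, rather than checking a fallback at every read site. A minimal, Hadoop-free sketch of that idea (the class and method names here are illustrative, not Hadoop's actual API):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a deprecation table: old key names map to their replacements,
// and lookups are translated (with a warning) before hitting the config store.
public class DeprecationTable {
    private final Map<String, String> oldToNew = new HashMap<>();

    public void addDeprecation(String oldKey, String newKey) {
        oldToNew.put(oldKey, newKey);
    }

    // Returns the current key name, translating deprecated spellings.
    public String resolve(String key) {
        String replacement = oldToNew.get(key);
        if (replacement != null) {
            System.err.println("Config key '" + key
                + "' is deprecated; use '" + replacement + "' instead");
            return replacement;
        }
        return key;
    }
}
```

Centralizing the mapping this way is what HDDS-4493 (opened below) would enable: one registration per renamed key instead of fallback logic scattered across callers.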
Opened HDDS-4493 to handle deprecated configs. |
avijayanhwx left a comment:
LGTM +1.
Can we follow up on these backward-incompatible changes on the Ratis mailing list?
Thanks Aravindan.
Sure. I will start a discussion.
* HDDS-3698-upgrade: (46 commits)
  - HDDS-4468. Fix Goofys listBucket large than 1000 objects will stuck forever (apache#1595)
  - HDDS-4417. Simplify Ozone client code with configuration object -- addendum (apache#1581)
  - HDDS-4476. Improve the ZH translation of the HA.md in doc. (apache#1597)
  - HDDS-4432. Update Ratis version to latest snapshot. (apache#1586)
  - HDDS-4488. Open RocksDB read only when loading containers at Datanode startup (apache#1605)
  - HDDS-4478. Large deletedKeyset slows down OM via listStatus. (apache#1598)
  - HDDS-4452. findbugs.sh couldn't be executed after a full build (apache#1576)
  - HDDS-4427. Avoid ContainerCache in ContainerReader at Datanode startup (apache#1549)
  - HDDS-4448. Duplicate refreshPipeline in listStatus (apache#1569)
  - HDDS-4450. Cannot run ozone if HADOOP_HOME points to Hadoop install (apache#1572)
  - HDDS-4346. Ozone specific Trash Policy (apache#1535)
  - HDDS-4426. SCM should create transactions using all blocks received from OM (apache#1561)
  - HDDS-4399. Safe mode rule for piplelines should only consider open pipelines. (apache#1526)
  - HDDS-4367. Configuration for deletion service intervals should be different for OM, SCM and datanodes (apache#1573)
  - HDDS-4462. Add --frozen-lockfile to pnpm install to prevent ozone-recon-web/pnpm-lock.yaml from being updated automatically (apache#1589)
  - HDDS-4082. Create ZH translation of HA.md in doc. (apache#1591)
  - HDDS-4464. Upgrade httpclient version due to CVE-2020-13956. (apache#1590)
  - HDDS-4467. Acceptance test fails due to new Hadoop 3 image (apache#1594)
  - HDDS-4466. Update url in .asf.yaml to use TLP project (apache#1592)
  - HDDS-4458. Fix Max Transaction ID value in OM. (apache#1585)
  - ...
What changes were proposed in this pull request?
Update Ozone with the latest Ratis snapshot, which has a critical fix for the "Bootstrap new OM Node" feature (HDDS-4330).
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-4432
How was this patch tested?
Not required.