@hanishakoneru
Contributor

What changes were proposed in this pull request?

Update Ozone to the latest Ratis snapshot, which has a critical fix for the "Bootstrap new OM Node" feature (HDDS-4330).

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-4432

How was this patch tested?

Not required.

@hanishakoneru
Contributor Author

@nandakumar131, @mukul1987, @lokeshj1703, TestNodeFailure#testPipelineFail is failing after the Ratis version upgrade. Any idea what could be causing this regression? Was there any change in the Ratis pipeline close logic?

@bshashikant
Contributor

Thanks @hanishakoneru for working on this. The test failure is related to the recent Ratis change to a config name (DatanodeRatisServerConfig: rpcslowness.timeout -> rpc.slowness.timeout). I have also updated Ratis to the latest snapshot.

@adoroszlai
Contributor

Shouldn't Ratis be backward compatible now that it has reached 1.0? Dropping existing APIs and renaming config keys without handling the old ones looks incompatible to me. Maybe this is not the right place to discuss this, but the changes needed in Ozone to upgrade Ratis highlight these issues.

@bshashikant
Contributor

> Shouldn't Ratis be backward compatible now that it has reached 1.0? Dropping existing APIs and renaming config keys without handling the old ones looks incompatible to me. Maybe this is not the right place to discuss this, but the changes needed in Ozone to upgrade Ratis highlight these issues.

@adoroszlai, I agree with you. This needs better handling for upgrade cases.
@avijayanhwx, any thoughts on this?

@hanishakoneru
Contributor Author

Thanks @bshashikant for debugging and fixing the issue.

Can we go ahead and merge this PR and discuss Ratis backward compatibility on the dev list?
@adoroszlai, @avijayanhwx, do you see any issues with merging this PR?

@avijayanhwx
Contributor

It is quite possible that no existing cluster has overridden this config. But it would be better to treat the old config key as deprecated (org.apache.hadoop.conf.Configuration.DeprecationDelta).
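
To illustrate the suggestion above, here is a minimal sketch of deprecated-key fallback. It uses only the Java standard library and is not the Hadoop `DeprecationDelta` mechanism itself. The class `DeprecatedKeyLookup`, its `get` method, and the values `300s`, `120s` and `60s` are hypothetical. The key names are only the suffixes quoted in this thread; the real keys carry a longer prefix that is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper for illustration; not part of Ozone, Ratis, or Hadoop.
final class DeprecatedKeyLookup {

  // Maps each deprecated key to its replacement. Only the key suffixes
  // mentioned in this thread are used; the full key prefix is an unknown
  // here and is deliberately left out.
  private static final Map<String, String> DEPRECATED_TO_NEW = new LinkedHashMap<>();

  static {
    DEPRECATED_TO_NEW.put("rpcslowness.timeout", "rpc.slowness.timeout");
  }

  // Returns the value for newKey. If newKey is unset, falls back to any
  // deprecated key that maps to it, then to defaultValue.
  static String get(Properties conf, String newKey, String defaultValue) {
    String value = conf.getProperty(newKey);
    if (value != null) {
      return value; // the new key always wins when both are set
    }
    for (Map.Entry<String, String> entry : DEPRECATED_TO_NEW.entrySet()) {
      if (entry.getValue().equals(newKey)) {
        String oldValue = conf.getProperty(entry.getKey());
        if (oldValue != null) {
          System.err.println("Config key " + entry.getKey()
              + " is deprecated; use " + newKey + " instead");
          return oldValue;
        }
      }
    }
    return defaultValue;
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    // A cluster that still sets only the old key (value is made up).
    conf.setProperty("rpcslowness.timeout", "300s");
    System.out.println(get(conf, "rpc.slowness.timeout", "60s"));
  }
}
```

With a fallback like this, a cluster that overrode the old key keeps its setting after the upgrade and gets a warning, rather than silently reverting to the default.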

@adoroszlai
Contributor

> Can we go ahead and merge this PR and discuss Ratis backward compatibility on the dev list?

Sure, I didn't want to block this PR.

@hanishakoneru
Contributor Author

Opened HDDS-4493 to handle deprecated configs.

Contributor

@avijayanhwx avijayanhwx left a comment

LGTM +1.

Can we follow up on these backward-incompatible changes in the Ratis mailing group?

@hanishakoneru
Contributor Author

Thanks Aravindan.

> Can we follow up on these backward-incompatible changes in the Ratis mailing group?

Sure. I will start a discussion.

@hanishakoneru hanishakoneru merged commit 49cd3ec into apache:master Nov 20, 2020
errose28 added a commit to errose28/ozone that referenced this pull request Nov 24, 2020
* HDDS-3698-upgrade: (46 commits)
  HDDS-4468. Fix Goofys listBucket large than 1000 objects will stuck forever (apache#1595)
  HDDS-4417. Simplify Ozone client code with configuration object -- addendum (apache#1581)
  HDDS-4476. Improve the ZH translation of the HA.md in doc. (apache#1597)
  HDDS-4432. Update Ratis version to latest snapshot. (apache#1586)
  HDDS-4488. Open RocksDB read only when loading containers at Datanode startup (apache#1605)
  HDDS-4478. Large deletedKeyset slows down OM via listStatus. (apache#1598)
  HDDS-4452. findbugs.sh couldn't be executed after a full build (apache#1576)
  HDDS-4427. Avoid ContainerCache in ContainerReader at Datanode startup (apache#1549)
  HDDS-4448. Duplicate refreshPipeline in listStatus (apache#1569)
  HDDS-4450. Cannot run ozone if HADOOP_HOME points to Hadoop install (apache#1572)
  HDDS-4346.Ozone specific Trash Policy (apache#1535)
  HDDS-4426. SCM should create transactions using all blocks received from OM (apache#1561)
  HDDS-4399. Safe mode rule for piplelines should only consider open pipelines. (apache#1526)
  HDDS-4367. Configuration for deletion service intervals should be different for OM, SCM and datanodes (apache#1573)
  HDDS-4462. Add --frozen-lockfile to pnpm install to prevent ozone-recon-web/pnpm-lock.yaml from being updated automatically (apache#1589)
  HDDS-4082. Create ZH translation of HA.md in doc. (apache#1591)
  HDDS-4464. Upgrade httpclient version due to CVE-2020-13956. (apache#1590)
  HDDS-4467. Acceptance test fails due to new Hadoop 3 image (apache#1594)
  HDDS-4466. Update url in .asf.yaml to use TLP project (apache#1592)
  HDDS-4458. Fix Max Transaction ID value in OM. (apache#1585)
  ...
errose28 added a commit to errose28/ozone that referenced this pull request Nov 25, 2020
* HDDS-3698-upgrade: (47 commits)
  HDDS-4468. Fix Goofys listBucket large than 1000 objects will stuck forever (apache#1595)
  HDDS-4417. Simplify Ozone client code with configuration object -- addendum (apache#1581)
  HDDS-4476. Improve the ZH translation of the HA.md in doc. (apache#1597)
  HDDS-4432. Update Ratis version to latest snapshot. (apache#1586)
  HDDS-4488. Open RocksDB read only when loading containers at Datanode startup (apache#1605)
  HDDS-4478. Large deletedKeyset slows down OM via listStatus. (apache#1598)
  HDDS-4452. findbugs.sh couldn't be executed after a full build (apache#1576)
  HDDS-4427. Avoid ContainerCache in ContainerReader at Datanode startup (apache#1549)
  HDDS-4448. Duplicate refreshPipeline in listStatus (apache#1569)
  HDDS-4450. Cannot run ozone if HADOOP_HOME points to Hadoop install (apache#1572)
  HDDS-4346.Ozone specific Trash Policy (apache#1535)
  HDDS-4426. SCM should create transactions using all blocks received from OM (apache#1561)
  HDDS-4399. Safe mode rule for piplelines should only consider open pipelines. (apache#1526)
  HDDS-4367. Configuration for deletion service intervals should be different for OM, SCM and datanodes (apache#1573)
  HDDS-4462. Add --frozen-lockfile to pnpm install to prevent ozone-recon-web/pnpm-lock.yaml from being updated automatically (apache#1589)
  HDDS-4082. Create ZH translation of HA.md in doc. (apache#1591)
  HDDS-4464. Upgrade httpclient version due to CVE-2020-13956. (apache#1590)
  HDDS-4467. Acceptance test fails due to new Hadoop 3 image (apache#1594)
  HDDS-4466. Update url in .asf.yaml to use TLP project (apache#1592)
  HDDS-4458. Fix Max Transaction ID value in OM. (apache#1585)
  ...