@adoroszlai
Contributor

What changes were proposed in this pull request?

Restore the default failover attempt count (15) for ozone-ha. The custom value (6) was inherited from ozone-om-ha, which exercises failovers.

The test failed intermittently when OM leader election took too long:

2021/01/28/5550/acceptance-misc:om2_1 leader elected after 6899ms
2021/01/27/5516/acceptance-misc:om2_1 leader elected after 7346ms
2021/01/27/5510/acceptance-misc:om3_1 leader elected after 7254ms
2021/01/28/5536/acceptance-misc:om2_1 leader elected after 7149ms
2021/01/26/5496/acceptance-misc:om2_1 leader elected after 6740ms
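
A rough sketch of why the attempt count matters for the timings above: if the client waits a fixed interval between failover attempts, its total retry budget is roughly attempts × wait. The 1000 ms wait and the helper names below are assumptions for illustration only; the actual Ozone client retry policy and its configured interval may differ.

```python
# Leader election times (ms) copied from the failed runs listed above.
ELECTION_MS = [6899, 7346, 7254, 7149, 6740]

# Hypothetical fixed wait between failover attempts; NOT taken from the PR.
ASSUMED_WAIT_MS = 1000


def retry_budget_ms(max_attempts: int, wait_between_ms: int) -> int:
    """Approximate time a client keeps retrying under a fixed-wait policy."""
    return max_attempts * wait_between_ms


def survives(election_ms: int, max_attempts: int, wait_between_ms: int) -> bool:
    """True if the leader is elected before the retry budget runs out."""
    return election_ms < retry_budget_ms(max_attempts, wait_between_ms)


if __name__ == "__main__":
    for attempts in (6, 15):
        budget = retry_budget_ms(attempts, ASSUMED_WAIT_MS)
        results = [survives(t, attempts, ASSUMED_WAIT_MS) for t in ELECTION_MS]
        print(f"attempts={attempts} budget={budget}ms all_survive={all(results)}")
```

Under this assumed interval, 6 attempts give up before any of the listed elections complete, while 15 attempts leave comfortable headroom.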

https://issues.apache.org/jira/browse/HDDS-4760

How was this patch tested?

https://github.com/adoroszlai/hadoop-ozone/actions/runs/520001429

om1_1 leader elected after 6877ms

@adoroszlai adoroszlai merged commit 4a854af into apache:master Feb 1, 2021
@adoroszlai adoroszlai deleted the HDDS-4760 branch February 1, 2021 16:54
@adoroszlai
Contributor Author

Thanks @sodonnel for the review.

errose28 added a commit to errose28/ozone that referenced this pull request Feb 1, 2021
* master: (176 commits)
  HDDS-4760. Intermittent failure in ozone-ha acceptance test (apache#1853)
  HDDS-4770. Upgrade Ratis Thirdparty to 0.6.0 (apache#1868)
  HDDS-4765. Update close-pending workflow for new repo (apache#1856)
  HDDS-4737. Add ModifierOrder to checkstyle rules (apache#1839)
  HDDS-4704. Add permission check in OMDBCheckpointServlet (apache#1801)
  HDDS-4757. Unnecessary WARNING to set OZONE_CONF_DIR (apache#1849)
  HDDS-4751. TestOzoneFileSystem#testTrash failed when enabledFileSystemPaths and omRatisDisabled (apache#1851)
  HDDS-4736. Intermittent failure in testExpiredCertificate (apache#1838)
  HDDS-4758. Adjust classpath of ozone version to include log4j (apache#1850)
  HDDS-4518. Add metrics around Trash Operations. (apache#1832)
  HDDS-4708. Optimization: update RetryCount less frequently (update once per ~100) (apache#1805)
  HDDS-4748. sonarqube issue fix - "static" members should be accessed statically (apache#1748)
  HDDS-2402. Adapt hadolint check to improved CI framework (apache#1778)
  HDDS-4698. Upgrade Java for Sonar check (apache#1800)
  HDDS-4739. Upgrade Ratis to 1.1.0-eb66796d-SNAPSHOT (apache#1842)
  HDDS-4735. Fix typo in hdds.proto (apache#1837)
  HDDS-4430. OM failover timeout is too short (apache#1807)
  HDDS-4477. Delete txnId in SCMMetadataStoreImpl may drop to 0 after SCM restart. (apache#1828)
  HDDS-4688. Update Hadoop version to 3.2.2 (apache#1795)
  HDDS-4725. Change metrics unit from nanosecond to millisecond (apache#1823)
  ...