@adoroszlai
Contributor

What changes were proposed in this pull request?

Due to the same root cause as HDDS-8276, ozone debug read-replicas may give wrong results when topology-aware read is enabled. The command creates a single-node copy of the original pipeline for each replica, but the copy updates only nodes, not nodesInOrder. If the replica is stale (offline but not yet dead), the read is attempted and fails, and the client then checks whether any fallback nodes remain. A single-node pipeline should have none left, but with topology-aware read enabled the other two nodes are still present in the list.

This PR moves the fix that was originally added for EC file checksum to the pipeline copy builder, so that all usages get fixed (currently only these two cases change nodes in the copy pipeline).
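
To illustrate the inconsistency described above, here is a minimal sketch of a copy builder that keeps the ordered node list in step with the node set. It is a simplified model under stated assumptions, not the actual Ozone code: `SimplePipeline`, its `Builder`, and `fallbackNodes` are hypothetical names, nodes are plain strings, and the real `Pipeline` builder may implement the fix differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of a pipeline; NOT the real Ozone Pipeline class. */
final class SimplePipeline {
  private final List<String> nodes;        // pipeline members
  private final List<String> nodesInOrder; // topology-sorted view used by topology-aware reads

  private SimplePipeline(List<String> nodes, List<String> nodesInOrder) {
    this.nodes = List.copyOf(nodes);
    this.nodesInOrder = List.copyOf(nodesInOrder);
  }

  static SimplePipeline of(List<String> nodes, List<String> nodesInOrder) {
    return new SimplePipeline(nodes, nodesInOrder);
  }

  List<String> getNodes() {
    return nodes;
  }

  List<String> getNodesInOrder() {
    return nodesInOrder;
  }

  /** Nodes a client could still try after a read from {@code failed} fails (assumed behavior). */
  List<String> fallbackNodes(String failed) {
    List<String> rest = new ArrayList<>(nodesInOrder);
    rest.remove(failed);
    return rest;
  }

  static Builder newBuilder(SimplePipeline source) {
    return new Builder(source);
  }

  static final class Builder {
    private List<String> nodes;
    private final List<String> nodesInOrder;

    private Builder(SimplePipeline source) {
      this.nodes = new ArrayList<>(source.nodes);
      this.nodesInOrder = new ArrayList<>(source.nodesInOrder);
    }

    Builder setNodes(List<String> newNodes) {
      this.nodes = new ArrayList<>(newNodes);
      // Keep the ordered view consistent: drop nodes that are no longer members.
      // Without this line the copy would still carry the original's ordered list.
      this.nodesInOrder.retainAll(this.nodes);
      return this;
    }

    SimplePipeline build() {
      return new SimplePipeline(nodes, nodesInOrder);
    }
  }

  public static void main(String[] args) {
    SimplePipeline original =
        of(List.of("dn1", "dn2", "dn3"), List.of("dn2", "dn3", "dn1"));
    SimplePipeline single = newBuilder(original).setNodes(List.of("dn3")).build();
    // No fallback candidates remain once the only replica fails.
    System.out.println(single.fallbackNodes("dn3").isEmpty()); // prints true
  }
}
```

In this sketch, removing the `retainAll` call reproduces the symptom: the single-node copy would still report `dn2` and `dn1` as fallback candidates. The sketch only handles shrinking the node set; a builder that can also add nodes would need to decide where new nodes go in the ordered list.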

https://issues.apache.org/jira/browse/HDDS-8364

How was this patch tested?

Updated the relevant smoketest to enable topology-aware read for this command and ran it locally.

Ran TestOzoneFileChecksum for regression.

CI:
https://github.com/adoroszlai/hadoop-ozone/actions/runs/4598039439

@kerneltime
Contributor

cc @sumitagrawl can you please take a look?

@kerneltime kerneltime requested a review from ChenSammi April 3, 2023 16:13
@adoroszlai adoroszlai merged commit b7e08c1 into apache:master Apr 4, 2023
@adoroszlai adoroszlai deleted the HDDS-8364 branch April 4, 2023 04:56
@adoroszlai
Contributor Author

Thanks @jojochuang for the review.

errose28 added a commit to errose28/ozone that referenced this pull request Apr 6, 2023
* master: (155 commits)
  update readme (apache#4535)
  HDDS-8374. Disable flaky unit test: TestContainerStateCounts
  HDDS-8016. updated the ozone doc for linked bucket and deletion async limitation (apache#4526)
  HDDS-8237. [Snapshot] loadDb() used by SstFiltering service creates extraneous directories. (apache#4446)
  HDDS-8035. Intermittent timeout in TestOzoneManagerHAWithData.testOMHAMetrics (apache#4362)
  HDDS-8039. Allow container inspector to run from ozone debug. (apache#4337)
  HDDS-8304. [Snapshot] Reduce flakiness in testSkipTrackingWithZeroSnapshot (apache#4487)
  HDDS-7974. [Snapshot] KeyDeletingService to be aware of Ozone snapshots (apache#4486)
  HDDS-8368. ReplicationManager: Create ContainerReplicaOp with correct target Datanode (apache#4532)
  HDDS-8358. Fix the space usage comparator in ContainerBalancerSelectionCriteria (apache#4527)
  HDDS-8359. ReplicationManager: Fix getContainerReplicationHealth() so that it builds ContainerCheckRequest correctly (apache#4528)
  HDDS-8361. Useless object in TestOzoneBlockTokenIdentifier (apache#4517)
  HDDS-8325. Consolidate and refine RocksDB metrics of services (apache#4506)
  HDDS-8135. Incorrect synchronization during certificate renewal in DefaultCertificateClient. (apache#4381)
  HDDS-8127. Exclude deleted containers from Recon container count (apache#4440)
  HDDS-8364. ReadReplicas may give wrong results with topology-aware read enabled (apache#4522)
  HDDS-8354. Avoid WARNING about ObjectEndpoint#get (apache#4515)
  HDDS-8324. DN data cache gets removed randomly asking for data from disk (apache#4499)
  HDDS-8291. Upgrade to Hadoop 3.3.5 (apache#4484)
  HDDS-8355. Mark TestOMRatisSnapshots#testInstallSnapshot as flaky
  ...