Contributor

@echonesis echonesis commented Jul 11, 2025

What changes were proposed in this pull request?

As mentioned in the JIRA, this PR will

  • create a ContainerChecksums wrapper that holds dataChecksum and metadataChecksum
  • allow further extension through metadataChecksum

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-13305

How was this patch tested?

CI: https://github.com/echonesis/ozone/actions/runs/16518174299

@echonesis echonesis marked this pull request as draft July 11, 2025 09:28
@echonesis echonesis marked this pull request as ready for review July 12, 2025 01:40
Contributor

@adoroszlai adoroszlai left a comment

Thanks @echonesis for the patch.

Comment on lines 30 to 37
public ContainerChecksums(long dataChecksum) {
this(dataChecksum, null);
}

public ContainerChecksums(long dataChecksum, Long metadataChecksum) {
this.dataChecksum = dataChecksum;
this.metadataChecksum = metadataChecksum;
}
Contributor

We could add factory methods (and make constructor private) to reduce accidental mismatch of intended and actual source (data or metadata).

Something like:

public static ContainerChecksums unknown(); // use constant value
public static ContainerChecksums dataOnly(long dataChecksum);
public static ContainerChecksums metadataOnly(long metadataChecksum);
public static ContainerChecksums of(long dataChecksum, long metadataChecksum);
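
A minimal sketch of how the factory methods suggested above could look, assuming the nullable `Long` metadata field from the constructors under review and `0` as the constant for an unknown data checksum. The field layout and the constants are assumptions for illustration, not the code in this PR.

```java
// Hypothetical sketch: private constructor plus named factory methods.
final class ContainerChecksums {
  // Assumption: 0 means "unknown" data checksum, null means "no metadata checksum".
  private static final ContainerChecksums UNKNOWN = new ContainerChecksums(0L, null);

  private final long dataChecksum;
  private final Long metadataChecksum;

  // Private so callers must state their intent through a factory method.
  private ContainerChecksums(long dataChecksum, Long metadataChecksum) {
    this.dataChecksum = dataChecksum;
    this.metadataChecksum = metadataChecksum;
  }

  static ContainerChecksums unknown() {
    return UNKNOWN;
  }

  static ContainerChecksums dataOnly(long dataChecksum) {
    return new ContainerChecksums(dataChecksum, null);
  }

  static ContainerChecksums metadataOnly(long metadataChecksum) {
    return new ContainerChecksums(0L, metadataChecksum);
  }

  static ContainerChecksums of(long dataChecksum, long metadataChecksum) {
    return new ContainerChecksums(dataChecksum, metadataChecksum);
  }

  long getDataChecksum() {
    return dataChecksum;
  }

  Long getMetadataChecksum() {
    return metadataChecksum;
  }
}
```

With named factories, a call such as `ContainerChecksums.dataOnly(x)` states which checksum the value belongs to, which a bare `long` constructor argument cannot do.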

// All replicas should start with an empty data checksum in SCM.
boolean contOneDataChecksumsEmpty = containerManager.getContainerReplicas(contID).stream()
- .allMatch(r -> r.getDataChecksum() == 0);
+ .allMatch(r -> r.getChecksums().getDataChecksum() == 0);
Contributor

We could reduce change by keeping the getDataChecksum() method in ContainerReplica and delegate to checksums.
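
The delegation could be sketched roughly as follows. Both classes are trimmed-down, hypothetical stand-ins that keep only the members needed to show the idea; the real classes carry many more fields.

```java
// Hypothetical, trimmed-down stand-in for the checksum wrapper.
final class ContainerChecksums {
  private final long dataChecksum;

  ContainerChecksums(long dataChecksum) {
    this.dataChecksum = dataChecksum;
  }

  long getDataChecksum() {
    return dataChecksum;
  }
}

// Hypothetical, trimmed-down stand-in for the replica class.
final class ContainerReplica {
  private final ContainerChecksums checksums;

  ContainerReplica(ContainerChecksums checksums) {
    this.checksums = checksums;
  }

  ContainerChecksums getChecksums() {
    return checksums;
  }

  // Kept so existing callers compile unchanged; forwards to the wrapper.
  long getDataChecksum() {
    return checksums.getDataChecksum();
  }
}
```

Existing call sites such as `r.getDataChecksum() == 0` would then not need to change.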

@adoroszlai
Contributor

Thanks @echonesis for updating the patch. Just one more question: why does ContainerChecksums use long 0 for unknown data checksum, and Long null for unknown/missing metadata checksum? Is there a use case where there is a difference between 0 and null?

@echonesis
Contributor Author

> Thanks @echonesis for updating the patch. Just one more question: why does ContainerChecksums use long 0 for unknown data checksum, and Long null for unknown/missing metadata checksum? Is there a use case where there is a difference between 0 and null?

I believe metadata serves a critical role in future operations and analysis. Regarding the toString() method, I propose we differentiate between two checksum types: the existing data checksum and a new metadata checksum.
I'm treating this as a schema extension, specifically by introducing nullability. This ensures backward compatibility, so existing consumers using older APIs won't experience breakage.

@echonesis echonesis requested a review from adoroszlai July 23, 2025 08:17
Contributor

@adoroszlai adoroszlai left a comment

Thanks @echonesis for updating the patch.

Contributor

@errose28 errose28 left a comment

Thanks for adding this @echonesis. Mostly looks good from the SCM and Recon side, just a few comments on top of what has already been left. We will need to add this to the datanode as well within ContainerData, ContainerLogger, and the summary section of KeyValueHandler#reconcileContainer, but it is probably better to do the datanode side in a follow-up PR.

return new ContainerChecksums(dataChecksum, null);
}

public static ContainerChecksums metadataOnly(long metadataChecksum) {
Contributor

We will have a metadata checksum in the future, but since we don't have it right now, we don't need to include it in this change, as long as it is easy to add new checksums to this class later.

Currently we only need two types of checksum objects: those that have no checksum and those that have a data checksum. When the metadata checksum is added in the future for EC, it will be generated by the container scanner and set at the same time as the data checksum, so it will not be possible to have a ContainerChecksums object with only one checksum or the other. We can adjust the factory constructors accordingly.

Contributor Author

Thanks @errose28
I will adjust ContainerChecksums constructors to be

  1. UNKNOWN (no checksums)
  2. 2 checksums included (dataChecksum + metadataChecksum)

new ContainerChecksums(0, null);

private final long dataChecksum;
private final Long metadataChecksum; // nullable for future use
Contributor

See the recently merged #8565 for how we are planning to handle unset checksums. We can add this handling into the ContainerChecksums class so that the -1 placeholder gives us functionality similar to the has checks in protobuf objects.
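
As a rough illustration of the placeholder idea, and not the handling actually merged in #8565, a wrapper could keep `-1` internally and expose a protobuf-style presence check. The names `UNSET` and `hasDataChecksum` are hypothetical.

```java
// Hypothetical sketch of a -1 placeholder with a protobuf-like "has" check.
final class ContainerChecksums {
  // Assumption: -1 is an internal "not set" marker that callers never see.
  private static final long UNSET = -1L;

  private final long dataChecksum;

  private ContainerChecksums(long dataChecksum) {
    this.dataChecksum = dataChecksum;
  }

  static ContainerChecksums unknown() {
    return new ContainerChecksums(UNSET);
  }

  static ContainerChecksums of(long dataChecksum) {
    return new ContainerChecksums(dataChecksum);
  }

  // Presence check, similar in spirit to hasXxx() on protobuf messages.
  boolean hasDataChecksum() {
    return dataChecksum != UNSET;
  }

  // Reports 0 when nothing was set, so the placeholder does not leak out.
  long getDataChecksum() {
    return hasDataChecksum() ? dataChecksum : 0L;
  }
}
```

One trade-off of this sketch is that a genuinely computed checksum of `-1` would be indistinguishable from the placeholder.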

Comment on lines 87 to 89
if (metadataChecksum != null) {
sb.append(", metadata=").append(Long.toHexString(metadataChecksum));
}
Contributor

0 should be the displayed value for any checksum that is not set.

Contributor Author

Got it. I will set the default metadataChecksum value to be 0.

return checksums;
}

public long getDataChecksum() {
Contributor

I think we can just use getChecksums in place of getters for each individual checksum. Otherwise this class will need to be updated every time ContainerChecksums has a value added to it.

Contributor

long getDataChecksum was retained on my request to reduce change in existing tests. We don't need to adjust this class for future values.

* limitations under the License.
*/

package org.apache.hadoop.hdds.scm.container;
Contributor

We need this class on the datanode too, so it needs to be in a more general package.

Contributor Author

@echonesis echonesis Jul 24, 2025

Got it.
I will move it into the path under hadoop-hdds/common.

Comment on lines 164 to 169
public boolean isDataChecksumMismatched() {
return !replicas.isEmpty() && replicas.stream()
- .map(ContainerReplica::getDataChecksum)
+ .map(ContainerReplica::getChecksums)
+ .map(ContainerChecksums::getDataChecksum)
.distinct()
.count() != 1;
Contributor

This page in Recon is designed to show containers with any checksum mismatches. We can rename the method accordingly and simplify the check so that it uses ContainerChecksums#equals to automatically compare all checksums going forward.

Suggested change
- public boolean isDataChecksumMismatched() {
- return !replicas.isEmpty() && replicas.stream()
- .map(ContainerReplica::getDataChecksum)
- .map(ContainerReplica::getChecksums)
- .map(ContainerChecksums::getDataChecksum)
- .distinct()
- .count() != 1;
+ public boolean areChecksumsMismatched() {
+ return !replicas.isEmpty() && replicas.stream()
+ .map(ContainerReplica::getChecksums)
+ .distinct()
+ .count() != 1;
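
The suggested form relies on `Stream.distinct()`, which compares elements with `equals`, so the checksum object needs value-based `equals` and `hashCode`. A self-contained sketch of that behaviour follows; `Checksums` and `MismatchCheck` are hypothetical stand-ins, not classes from this PR.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for ContainerChecksums with value-based equality.
final class Checksums {
  private final long data;
  private final long metadata;

  Checksums(long data, long metadata) {
    this.data = data;
    this.metadata = metadata;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof Checksums)) {
      return false;
    }
    Checksums that = (Checksums) o;
    return data == that.data && metadata == that.metadata;
  }

  @Override
  public int hashCode() {
    return Objects.hash(data, metadata);
  }
}

final class MismatchCheck {
  // Replicas disagree when more than one distinct checksum object is present.
  static boolean areChecksumsMismatched(List<Checksums> replicaChecksums) {
    return !replicaChecksums.isEmpty()
        && replicaChecksums.stream().distinct().count() != 1;
  }
}
```

Any checksum later added to the wrapper would be compared automatically, as long as `equals` and `hashCode` are extended together.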


waitForScmToSeeReplicaState(containerID, CLOSED);
- long initialReportedDataChecksum = getContainerReplica(containerID).getDataChecksum();
+ long initialReportedDataChecksum = getContainerReplica(containerID).getChecksums().getDataChecksum();
Contributor

For this use case we only need to use the ContainerChecksums objects and compare them. We don't need to further extract the long values.

There is a new assertReplicaChecksumMatches introduced in #8565 that can take the ContainerChecksums object as a parameter instead of a long.

Contributor Author

Got it.
I've updated assertReplicaChecksumMatches by replacing long with ContainerChecksums.

@echonesis echonesis requested a review from errose28 July 25, 2025 07:54
@adoroszlai
Contributor

@errose28 please take another look

@github-actions

This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days.

@github-actions github-actions bot added the stale label Nov 11, 2025
@github-actions

Thank you for your contribution. This PR is being closed due to inactivity. If needed, feel free to reopen it.

@github-actions github-actions bot closed this Nov 25, 2025
@adoroszlai adoroszlai reopened this Dec 15, 2025
Contributor

@adoroszlai adoroszlai left a comment

Thanks @echonesis for working on this and sorry for the late review.

public final class ContainerChecksums {

private static final ContainerChecksums UNKNOWN =
new ContainerChecksums(-1L, -1L);
Contributor

nit: use UNSET_DATA_CHECKSUM and UNSET_METADATA_CHECKSUM

Comment on lines 54 to 56
public long getDataChecksum() {
// UNSET_DATA_CHECKSUM is an internal placeholder, it should not be used outside this class.
if (needsDataChecksum()) {
Contributor

nit: move comment to UNSET_DATA_CHECKSUM definition

Comment on lines 66 to 68
public long getMetadataChecksum() {
// UNSET_DATA_CHECKSUM is an internal placeholder, it should not be used outside this class.
if (needsMetadataChecksum()) {
Contributor

nit: this should mention UNSET_METADATA_CHECKSUM, also please move comment to UNSET_METADATA_CHECKSUM definition

.setBytesUsed(replicaProto.getUsed())
.setEmpty(replicaProto.getIsEmpty())
- .setDataChecksum(replicaProto.getDataChecksum())
+ .setChecksums(ContainerChecksums.of(replicaProto.getDataChecksum(), 0L))
Contributor

Please replace all calls like this with ContainerChecksums.dataOnly(...) as suggested earlier.

Rationale:

  • Avoid duplication of 0L as unknown value
  • Avoid mismatch between internal-only constant with value -1L
  • Easier to understand

Contributor Author

Thanks @adoroszlai
According to the review from @errose28, it seems that we won't have a ContainerChecksums with only one checksum or the other.
And yes, I will follow the rationale.

Contributor

Currently we only have data checksum, so we should not force all callers to pass 0L for metadata. I agree that we don't need metadataOnly(), but the factory method with single data checksum value should still be added. To reduce future changes (adding metadata checksum value), its name should be of() rather than dataOnly().

Contributor Author

Thanks for pointing it out!
I will modify it in the next commit.

Comment on lines 87 to 93
return getDataChecksum() == that.getDataChecksum() &&
getMetadataChecksum() == that.getMetadataChecksum();
}

@Override
public int hashCode() {
return Objects.hash(dataChecksum, metadataChecksum);
Contributor

ContainerChecksums(0, X) and ContainerChecksums(-1, X) are equal, but hash codes are different, which is a bad combination.

Do we want to consider these unequal? Then equals should use the variables.

Otherwise, if we want to consider these to be equal, and all other code passes 0 for unknown value, can we avoid using -1 internally? That would also make needs... methods unnecessary.
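
A sketch of the first option, assuming a `-1` internal placeholder that the getters report as `0`: both `equals` and `hashCode` read the raw fields, so the two methods cannot disagree. The constant name `UNSET` and the overall shape are assumptions, not the code under review.

```java
import java.util.Objects;

// Hypothetical sketch: equals and hashCode are both based on the raw fields.
final class ContainerChecksums {
  // Assumption: -1 marks "not set" internally; getters report it as 0.
  private static final long UNSET = -1L;

  private final long dataChecksum;
  private final long metadataChecksum;

  private ContainerChecksums(long dataChecksum, long metadataChecksum) {
    this.dataChecksum = dataChecksum;
    this.metadataChecksum = metadataChecksum;
  }

  static ContainerChecksums unknown() {
    return new ContainerChecksums(UNSET, UNSET);
  }

  static ContainerChecksums of(long dataChecksum, long metadataChecksum) {
    return new ContainerChecksums(dataChecksum, metadataChecksum);
  }

  long getDataChecksum() {
    return dataChecksum == UNSET ? 0L : dataChecksum;
  }

  long getMetadataChecksum() {
    return metadataChecksum == UNSET ? 0L : metadataChecksum;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof ContainerChecksums)) {
      return false;
    }
    ContainerChecksums that = (ContainerChecksums) o;
    // Raw fields, not getters: an unset value is not equal to a reported 0.
    return dataChecksum == that.dataChecksum
        && metadataChecksum == that.metadataChecksum;
  }

  @Override
  public int hashCode() {
    return Objects.hash(dataChecksum, metadataChecksum);
  }
}
```

Under this choice an unset object and an explicit `of(0, 0)` read the same through the getters but compare as unequal. The second option would instead drop the placeholder and store `0` directly.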

Contributor Author

@echonesis echonesis Dec 17, 2025

I agree that we should consider these unequal.
In #8565, it seems the UNSET placeholder helps us separate the two states:

  • UNSET state
  • Calculated state

this.bcsId = bcsId;
this.state = state;
- this.dataChecksum = dataChecksum;
+ this.checksums = checksums;
Contributor

Suggested change
- this.checksums = checksums;
+ setChecksums(checksums);

}

public void setChecksums(ContainerChecksums checksums) {
this.checksums = checksums;
Contributor

Suggested change
- this.checksums = checksums;
+ this.checksums = checksums != null ? checksums : ContainerChecksums.unknown();
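
Taken together, the two suggestions above route the constructor through a null-safe setter so the field is never null. A minimal sketch, with `ReplicaRecord` as a hypothetical stand-in for the class being edited and a stripped-down `ContainerChecksums`:

```java
// Hypothetical, stripped-down wrapper exposing only the shared "unknown" value.
final class ContainerChecksums {
  private static final ContainerChecksums UNKNOWN = new ContainerChecksums();

  private ContainerChecksums() {
  }

  static ContainerChecksums unknown() {
    return UNKNOWN;
  }
}

// Hypothetical stand-in for the class that stores the checksums.
final class ReplicaRecord {
  private ContainerChecksums checksums;

  ReplicaRecord(ContainerChecksums checksums) {
    // Reuse the setter so the null handling lives in one place.
    setChecksums(checksums);
  }

  void setChecksums(ContainerChecksums checksums) {
    this.checksums = checksums != null ? checksums : ContainerChecksums.unknown();
  }

  ContainerChecksums getChecksums() {
    return checksums;
  }
}
```

Because the field can no longer be null, later null checks on `checksums` become unnecessary.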

.setFirstSeenTime(firstSeenTime).setLastSeenTime(lastSeenTime)
- .setBcsId(bcsId).setState(state).setDataChecksum(dataChecksum).build();
+ .setBcsId(bcsId).setState(state)
+ .setDataChecksum(checksums != null ? checksums.getDataChecksum() : 0L)
Contributor

Suggested change
- .setDataChecksum(checksums != null ? checksums.getDataChecksum() : 0L)
+ .setDataChecksum(checksums.getDataChecksum())

@adoroszlai adoroszlai removed the stale label Dec 15, 2025
@echonesis
Contributor Author

Thanks @adoroszlai for the review.
I will update them in the next commit.

Contributor

@adoroszlai adoroszlai left a comment

Thanks @echonesis for updating the patch, LGTM. There is one caller where I think 0L should be removed. Tests that set arbitrary checksum value are OK to use 0L.

@adoroszlai
Contributor

@errose28 please take another look

@adoroszlai adoroszlai merged commit b32b54b into apache:master Dec 19, 2025
83 of 84 checks passed
@adoroszlai
Contributor

Thanks @echonesis for the patch, @errose28 for the review.

@echonesis echonesis deleted the HDDS-13305 branch December 24, 2025 02:59
echonesis added a commit to echonesis/ozone that referenced this pull request Dec 24, 2025