-
Notifications
You must be signed in to change notification settings - Fork 588
HDDS-13305. Create wrapper object for container checksums #8789
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
adoroszlai
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @echonesis for the patch.
...ds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/ContainerChecksumsTest.java
Outdated
Show resolved
Hide resolved
...ds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/ContainerChecksumsTest.java
Outdated
Show resolved
Hide resolved
| public ContainerChecksums(long dataChecksum) { | ||
| this(dataChecksum, null); | ||
| } | ||
|
|
||
| public ContainerChecksums(long dataChecksum, Long metadataChecksum) { | ||
| this.dataChecksum = dataChecksum; | ||
| this.metadataChecksum = metadataChecksum; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could add factory methods (and make constructor private) to reduce accidental mismatch of intended and actual source (data or metadata).
Something like:
public static ContainerChecksums unknown(); // use constant value
public static ContainerChecksums dataOnly(long dataChecksum);
public static ContainerChecksums metadataOnly(long metadataChecksum);
public static ContainerChecksums of(long dataChecksum, long metadataChecksum);
...p-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ContainerChecksums.java
Outdated
Show resolved
Hide resolved
| // All replicas should start with an empty data checksum in SCM. | ||
| boolean contOneDataChecksumsEmpty = containerManager.getContainerReplicas(contID).stream() | ||
| .allMatch(r -> r.getDataChecksum() == 0); | ||
| .allMatch(r -> r.getChecksums().getDataChecksum() == 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could reduce change by keeping the getDataChecksum() method in ContainerReplica and delegate to checksums.
|
Thanks @echonesis for updating the patch. Just one more question: why does |
I believe metadata serves a critical role in future operations and analysis. Regarding the |
adoroszlai
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @echonesis for updating the patch.
errose28
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this @echonesis. Mostly looks good from the SCM and Recon side, just a few comments on top of what has already been left. We will need to add this to the datanode as well within ContainerData, ContainerLogger, and the summary section of KeyValueHandler#reconcileContainer, but it is probably better to do the datanode side in a follow-up PR.
| return new ContainerChecksums(dataChecksum, null); | ||
| } | ||
|
|
||
| public static ContainerChecksums metadataOnly(long metadataChecksum) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will have metadata checksum in the future, but since we do't have it right now, we don't need to include it in this change, as long as it is easy to add new checksums to this class in the future.
Currently we only need two types of checksum objects: Those that have no checksum and those that have a data checksum. When metadata checksum is added in the future for EC, it will be generated by the container scanner and set at the same time as the data checksum, so it will not be possible to have a ContainerChecksums object with only one checksum or the other. We can adjust the factory constructors accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @errose28
I will adjust ContainerChecksums constructors to be
- UNKNOWN (no checksums)
- 2 checksums included (dataChecksum + metadataChecksum)
| new ContainerChecksums(0, null); | ||
|
|
||
| private final long dataChecksum; | ||
| private final Long metadataChecksum; // nullable for future use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the recently merged #8565 for how we are planning to handle unset checksums. We can add this handling into the ContainerChecksums class so that the -1 placeholder gives us functionality similar to the has checks in protobuf objects.
| if (metadataChecksum != null) { | ||
| sb.append(", metadata=").append(Long.toHexString(metadataChecksum)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0 should be the displayed value for any checksum that is not set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I will set the default metadataChecksum value to be 0.
| return checksums; | ||
| } | ||
|
|
||
| public long getDataChecksum() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can just use getChecksums in place of getters for each individual checksum. Otherwise this class will need to be updated every time ContainerChecksums has a value added to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
long getDataChecksum was retained on my request to reduce change in existing tests. We don't need to adjust this class for future values.
| * limitations under the License. | ||
| */ | ||
|
|
||
| package org.apache.hadoop.hdds.scm.container; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need this class on the datanode too, so it needs to be in a more general package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it.
I will move it into the path under hadoop-hdds/common.
| public boolean isDataChecksumMismatched() { | ||
| return !replicas.isEmpty() && replicas.stream() | ||
| .map(ContainerReplica::getDataChecksum) | ||
| .map(ContainerReplica::getChecksums) | ||
| .map(ContainerChecksums::getDataChecksum) | ||
| .distinct() | ||
| .count() != 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This page in recon is designed to show containers with any checksum mismatches. We can rename the method accordingly and simplify the check so that it uses ContainerChecksum#equals to automatically compare all checksums going forward.
| public boolean isDataChecksumMismatched() { | |
| return !replicas.isEmpty() && replicas.stream() | |
| .map(ContainerReplica::getDataChecksum) | |
| .map(ContainerReplica::getChecksums) | |
| .map(ContainerChecksums::getDataChecksum) | |
| .distinct() | |
| .count() != 1; | |
| public boolean areChecksumsMismatched() { | |
| return !replicas.isEmpty() && replicas.stream() | |
| .map(ContainerReplica::getChecksums) | |
| .distinct() | |
| .count() != 1; |
|
|
||
| waitForScmToSeeReplicaState(containerID, CLOSED); | ||
| long initialReportedDataChecksum = getContainerReplica(containerID).getDataChecksum(); | ||
| long initialReportedDataChecksum = getContainerReplica(containerID).getChecksums().getDataChecksum(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this use case we only need to use the ContainerChecksums objects and compare them. We don't need to further extract the long values.
There is a new assertReplicaChecksumMatches introduced in #8565 that can take the ContainerChecksums object as a parameter instead of a long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it.
I've updated assertReplicaChecksumMatches by replacing long with ContainerChecksums.
|
@errose28 please take another look |
|
This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days. |
|
Thank you for your contribution. This PR is being closed due to inactivity. If needed, feel free to reopen it. |
adoroszlai
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @echonesis for working on this and sorry for the late review.
| public final class ContainerChecksums { | ||
|
|
||
| private static final ContainerChecksums UNKNOWN = | ||
| new ContainerChecksums(-1L, -1L); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use UNSET_DATA_CHECKSUM and UNSET_METADATA_CHECKSUM
| public long getDataChecksum() { | ||
| // UNSET_DATA_CHECKSUM is an internal placeholder, it should not be used outside this class. | ||
| if (needsDataChecksum()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: move comment to UNSET_DATA_CHECKSUM definition
| public long getMetadataChecksum() { | ||
| // UNSET_DATA_CHECKSUM is an internal placeholder, it should not be used outside this class. | ||
| if (needsMetadataChecksum()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this should mention UNSET_METADATA_CHECKSUM, also please move comment to UNSET_METADATA_CHECKSUM definition
| .setBytesUsed(replicaProto.getUsed()) | ||
| .setEmpty(replicaProto.getIsEmpty()) | ||
| .setDataChecksum(replicaProto.getDataChecksum()) | ||
| .setChecksums(ContainerChecksums.of(replicaProto.getDataChecksum(), 0L)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please replace all calls like this with ContainerChecksums.dataOnly(...) as suggested earlier.
Rationale:
- Avoid duplication of 0L as unknown value
- Avoid mismatch between internal-only constant with value -1L
- Easier to understand
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @adoroszlai
According to the review from @errose28 , it seems that we won't have a ContainerChecksums with only one checksum or the other.
And yes, I will follow the rationale.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently we only have data checksum, so we should not force all callers to pass 0L for metadata. I agree that we don't need metadataOnly(), but the factory method with single data checksum value should still be added. To reduce future changes (adding metadata checksum value), its name should be of() rather than dataOnly().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing it out!
I will modify it in the next commit.
| return getDataChecksum() == that.getDataChecksum() && | ||
| getMetadataChecksum() == that.getMetadataChecksum(); | ||
| } | ||
|
|
||
| @Override | ||
| public int hashCode() { | ||
| return Objects.hash(dataChecksum, metadataChecksum); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ContainerChecksums(0, X) and ContainerChecksums(-1, X) are equal, but hash codes are different, which is a bad combination.
Do we want to consider these unequal? Then equals should use the variables.
Otherwise, if we want to consider these to be equal, and all other code passes 0 for unknown value, can we avoid using -1 internally? That would also make needs... methods unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we should consider this unequality.
In #8565, it seems the UNSET placeholder could help us separate the 2 states:
- UNSET state
- Calculated state
| this.bcsId = bcsId; | ||
| this.state = state; | ||
| this.dataChecksum = dataChecksum; | ||
| this.checksums = checksums; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| this.checksums = checksums; | |
| setChecksums(checksums); |
| } | ||
|
|
||
| public void setChecksums(ContainerChecksums checksums) { | ||
| this.checksums = checksums; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| this.checksums = checksums; | |
| this.checksums = checksums != null ? checksums : ContainerChecksums.unknown(); |
| .setFirstSeenTime(firstSeenTime).setLastSeenTime(lastSeenTime) | ||
| .setBcsId(bcsId).setState(state).setDataChecksum(dataChecksum).build(); | ||
| .setBcsId(bcsId).setState(state) | ||
| .setDataChecksum(checksums != null ? checksums.getDataChecksum() : 0L) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| .setDataChecksum(checksums != null ? checksums.getDataChecksum() : 0L) | |
| .setDataChecksum(checksums.getDataChecksum()) |
|
Thanks @adoroszlai for the review. |
adoroszlai
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @echonesis for updating the patch, LGTM. There is one caller where I think 0L should be removed. Tests that set arbitrary checksum value are OK to use 0L.
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/scm/ContainerReplicaHistory.java
Outdated
Show resolved
Hide resolved
|
@errose28 please take another look |
|
Thanks @echonesis for the patch, @errose28 for the review. |
What changes were proposed in this pull request?
As mentioned in the JIRA, this PR will
dataChecksumandmetadataChecksummetadataChecksumWhat is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-13305
How was this patch tested?
CI: https://github.com/echonesis/ozone/actions/runs/16518174299