-
Notifications
You must be signed in to change notification settings - Fork 588
HDDS-4037. Incorrect container numberOfKeys and usedBytes in SCM after key deletion #1295
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… returned by container list command.
|
@ChenSammi Thanks for fixing this. Can you add a short description on what the root cause is, and how this patch fixes it? That will help make the review easier. |
Sure. Currently there are two issues.
In our cluster, there are a lot of containers, all blocks are deleted. But I can still get the keyCount and BytesUsed data through CLI "sh container info" command. |
adoroszlai
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ChenSammi for working on this.
KeyCount is increased by 1 for every putblock command. This is true for FilePerChunk container layout while it's incorrect for FilePerBlock layout.
I don't think this problem is specific to the FilePerBlock layout. PutBlock may be called for the same block multiple times in both layouts, depending on the buffer flush and block size configuration.
Config:
ozone.client.stream.buffer.size=1MB
ozone.client.stream.buffer.flush.size=4MB
ozone.scm.chunk.size=1MB
ozone.scm.container.size=1GB
Test with Ozone 0.5.0, before FilePerBlock was introduced:
$ ozone freon ockg -n1 -t1 -s 67108864 -p 64MB
...
Successful executions: 1
$ ozone scmcli container list
{
"state" : "OPEN",
"replicationFactor" : "THREE",
"replicationType" : "RATIS",
"usedBytes" : 67108864,
"numberOfKeys" : 16,
"lastUsed" : 156717083,
"stateEnterTime" : 156589360,
"owner" : "5b4981a0-61f4-42d1-af2d-b990e9a77741",
"containerID" : 1,
"deleteTransactionId" : 0,
"sequenceId" : 0,
"open" : true
}
I have verified that this case is fixed by the patch.
However, there is another case where similar problem still exists: if key is so large that it spans multiple blocks, numberOfKeys will include each block separately.
Test with patch using default block size of 256MB (after previous 64MB test):
$ ozone freon ockg -n1 -t1 -s 536870912 -p 512MB
...
Successful executions: 1
$ ozone admin container list
{
"state" : "OPEN",
"replicationFactor" : "THREE",
"replicationType" : "RATIS",
"usedBytes" : 600834048,
"numberOfKeys" : 3,
"lastUsed" : "2020-08-10T08:32:43.112Z",
"stateEnterTime" : "2020-08-10T08:22:39.855Z",
"owner" : "f430cf45-c4ba-4271-8b3a-22bd314be60d",
"containerID" : 1,
"deleteTransactionId" : 0,
"sequenceId" : 0,
"open" : true
}
Here numberOfKeys should be 2, but it is 3 = 1 (for 64MB key) + 2 (for 512MB key).
| */ | ||
| @Ignore | ||
| public class TestBlockDeletion { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see #1121 for an attempt to enable this integration test.
| // check container key count and bytes used | ||
| long maxUsedBytes = 0; | ||
| long maxKeyCount = 0; | ||
| ContainerReplica[] rps = replicas.toArray(new ContainerReplica[0]); | ||
| for (int i = 0; i < rps.length; i++) { | ||
| maxUsedBytes = Math.max(maxUsedBytes, rps[i].getBytesUsed()); | ||
| maxKeyCount = Math.max(maxKeyCount, rps[i].getKeyCount()); | ||
| LOG.info("Replica key count {}, bytes used {}", | ||
| rps[i].getKeyCount(), rps[i].getBytesUsed()); | ||
| } | ||
| if (maxKeyCount < container.getNumberOfKeys()) { | ||
| container.setNumberOfKeys(maxKeyCount); | ||
| } | ||
| if (maxUsedBytes < container.getUsedBytes()) { | ||
| container.setUsedBytes(maxUsedBytes); | ||
| } | ||
| LOG.info("Container key count {}, bytes used {}", | ||
| container.getNumberOfKeys(), container.getUsedBytes()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please extract this block to a new method to keep the level of abstraction consistent.
| LOG.info( | ||
| "Block deletion txnID mismatch in datanode {} for containerID {}." | ||
| LOG.debug( | ||
| "Block deletion txnID behinds in datanode {} for containerID {}." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| "Block deletion txnID behinds in datanode {} for containerID {}." | |
| "Block deletion txnID lagging in datanode {} for containerID {}." |
@adoroszlai thanks for the review and comments. The keyCount in each container is different from the keyCount from OM point of view. The keyCount in container is actually the block count(FilePerBlock) or the chunk count(FilePerChunk). So in the above case, 3 numberOfKeys is an expected value. |
@adoroszlai , here I think WriteChunk command will be called multiple times instead of putBlock command, right? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ChenSammi for updating the patch. It would be nice if you could find a better name for the new method, but it is OK anyway.
here I think WriteChunk command will be called multiple times instead of putBlock command, right?
Not only WriteChunk, but PutBlock, too. It's like: WriteChunk, WriteChunk, ..., PutBlock, WriteChunk, WriteChunk, ..., PutBlock.
keyCount in container is actually the block count(FilePerBlock) or the chunk count(FilePerChunk)
I don't think there is such a difference in the layouts: the client does not know about the layout, and the server updates statistics while handling the PutBlock command, regardless of how the chunk is written to disk.
| * Check and update Container key count and used bytes based on it's replica's | ||
| * data. | ||
| */ | ||
| private void checkAndUpdateContainerState(final ContainerInfo container, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think something along the lines of updateContainerStatsFromReplicas would be more descriptive, especially since container state (closed, etc.) is not being changed.
|
Thanks @adoroszlai for review the code. |
* master: (28 commits) HDDS-4037. Incorrect container numberOfKeys and usedBytes in SCM after key deletion (apache#1295) HDDS-3232. Include the byteman scripts in the distribution tar file (apache#1309) HDDS-4095. Byteman script to debug HCFS performance (apache#1311) HDDS-4057. Failed acceptance test missing from bundle (apache#1283) HDDS-4040. [OFS] BasicRootedOzoneFileSystem to support batchDelete (apache#1286) HDDS-4061. Pending delete blocks are not always included in #BLOCKCOUNT metadata (apache#1288) HDDS-4067. Implement toString for OMTransactionInfo (apache#1300) HDDS-3878. Make OMHA serviceID optional if one (but only one) is defined in the config (apache#1149) HDDS-3833. Use Pipeline choose policy to choose pipeline from exist pipeline list (apache#1096) HDDS-3979. Make bufferSize configurable for stream copy (apache#1212) HDDS-4048. Show more information while SCM version info mismatch (apache#1278) HDDS-4078. Use HDDS InterfaceAudience/Stability annotations (apache#1302) HDDS-4034. Add Unit Test for HadoopNestedDirGenerator. (apache#1266) HDDS-4076. Translate CSI.md into Chinese (apache#1299) HDDS-4046. Extensible subcommands for CLI applications (apache#1276) HDDS-4051. Remove whitelist/blacklist terminology from Ozone (apache#1306) HDDS-4055. Cleanup GitHub workflow (apache#1282) HDDS-4042. Update documentation for the GA release (apache#1269) HDDS-4066. Add core-site.xml to intellij configuration (apache#1292) HDDS-4073. Remove leftover robot.robot (apache#1297) ...
|
/retest |
|
To re-run CI checks, please follow these steps with the source branch checked out: |
https://issues.apache.org/jira/browse/HDDS-4037