
Conversation

@ChenSammi
Contributor

@ChenSammi changed the title HDDS-4037. Incorrect container numberOfKeys and usedBytes information… HDDS-4037. Incorrect container numberOfKeys and usedBytes in SCM after key deletion Aug 7, 2020
@avijayanhwx
Contributor

avijayanhwx commented Aug 7, 2020

@ChenSammi Thanks for fixing this. Can you add a short description on what the root cause is, and how this patch fixes it? That will help make the review easier.

@ChenSammi
Contributor Author

ChenSammi commented Aug 10, 2020

@ChenSammi Thanks for fixing this. Can you add a short description on what the root cause is, and how this patch fixes it? That will help make the review easier.

Sure. Currently there are two issues.

  1. The container keyCount on the datanode side is incorrect. KeyCount is increased by 1 for every PutBlock command. That is correct for the FilePerChunk container layout but not for the FilePerBlock layout: with FilePerBlock, you will find that the keyCount value is higher than the actual block number (the idea behind the fix is sketched below).

  2. The container keyCount and bytesUsed held on the SCM side are based on the container report data from each datanode. In AbstractContainerReportHandler#updateContainerStats, SCM currently only updates the in-memory container data if the reported value is bigger. That is fine for OPEN containers, but it does not hold for CLOSED containers.

In our cluster there are a lot of containers whose blocks have all been deleted, but I can still get the old keyCount and bytesUsed values through the CLI "sh container info" command.
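
For the datanode side, the rough idea of the fix is to count a block only the first time it is put. A toy sketch of that idea (a standalone class with made-up names, not the actual Ozone code or the exact patch):

    import java.util.HashMap;
    import java.util.Map;

    // Toy sketch: repeated PutBlock calls for the same block must not inflate keyCount.
    public class PutBlockCountSketch {
      private final Map<Long, Long> blocks = new HashMap<>(); // blockId -> committed length
      private long keyCount = 0;

      void putBlock(long blockId, long length) {
        boolean isNewBlock = !blocks.containsKey(blockId);
        blocks.put(blockId, length);   // overwriting on a re-flush is fine
        if (isNewBlock) {
          keyCount++;                  // each block is counted only once
        }
      }

      public static void main(String[] args) {
        PutBlockCountSketch container = new PutBlockCountSketch();
        // Partial flushes of one 64 MB block lead to several PutBlock calls:
        container.putBlock(1L, 4L << 20);
        container.putBlock(1L, 8L << 20);
        container.putBlock(1L, 64L << 20);
        System.out.println(container.keyCount);   // prints 1, not 3
      }
    }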

Contributor

@adoroszlai left a comment


Thanks @ChenSammi for working on this.

KeyCount is increased by 1 for every putblock command. This is true for FilePerChunk container layout while it's incorrect for FilePerBlock layout.

I don't think this problem is specific to the FilePerBlock layout. PutBlock may be called for the same block multiple times in both layouts, depending on the buffer flush and block size configuration.

Config:

ozone.client.stream.buffer.size=1MB
ozone.client.stream.buffer.flush.size=4MB
ozone.scm.chunk.size=1MB
ozone.scm.container.size=1GB

Test with Ozone 0.5.0, before FilePerBlock was introduced:

$ ozone freon ockg -n1 -t1 -s 67108864 -p 64MB
...
Successful executions: 1

$ ozone scmcli container list
{
  "state" : "OPEN",
  "replicationFactor" : "THREE",
  "replicationType" : "RATIS",
  "usedBytes" : 67108864,
  "numberOfKeys" : 16,
  "lastUsed" : 156717083,
  "stateEnterTime" : 156589360,
  "owner" : "5b4981a0-61f4-42d1-af2d-b990e9a77741",
  "containerID" : 1,
  "deleteTransactionId" : 0,
  "sequenceId" : 0,
  "open" : true
}
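
(If I read the config right, the 4 MB flush size means the 64 MB key is flushed about 64 / 4 = 16 times, each flush ending in a PutBlock, which is where the numberOfKeys value of 16 above comes from.)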

I have verified that this case is fixed by the patch.

However, there is another case where a similar problem still exists: if a key is so large that it spans multiple blocks, numberOfKeys will count each block separately.

Test with patch using default block size of 256MB (after previous 64MB test):

$ ozone freon ockg -n1 -t1 -s 536870912 -p 512MB
...
Successful executions: 1

$ ozone admin container list
{
  "state" : "OPEN",
  "replicationFactor" : "THREE",
  "replicationType" : "RATIS",
  "usedBytes" : 600834048,
  "numberOfKeys" : 3,
  "lastUsed" : "2020-08-10T08:32:43.112Z",
  "stateEnterTime" : "2020-08-10T08:22:39.855Z",
  "owner" : "f430cf45-c4ba-4271-8b3a-22bd314be60d",
  "containerID" : 1,
  "deleteTransactionId" : 0,
  "sequenceId" : 0,
  "open" : true
}

Here numberOfKeys should be 2, but it is 3 = 1 (for 64MB key) + 2 (for 512MB key).

Comment on lines 85 to 86
*/
@Ignore
public class TestBlockDeletion {

Please see #1121 for an attempt to enable this integration test.

Comment on lines 316 to 333
    // check container key count and bytes used
    long maxUsedBytes = 0;
    long maxKeyCount = 0;
    ContainerReplica[] rps = replicas.toArray(new ContainerReplica[0]);
    for (int i = 0; i < rps.length; i++) {
      maxUsedBytes = Math.max(maxUsedBytes, rps[i].getBytesUsed());
      maxKeyCount = Math.max(maxKeyCount, rps[i].getKeyCount());
      LOG.info("Replica key count {}, bytes used {}",
          rps[i].getKeyCount(), rps[i].getBytesUsed());
    }
    if (maxKeyCount < container.getNumberOfKeys()) {
      container.setNumberOfKeys(maxKeyCount);
    }
    if (maxUsedBytes < container.getUsedBytes()) {
      container.setUsedBytes(maxUsedBytes);
    }
    LOG.info("Container key count {}, bytes used {}",
        container.getNumberOfKeys(), container.getUsedBytes());

Please extract this block to a new method to keep the level of abstraction consistent.
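
For example, something along these lines (just a sketch based on the block above; the method name and exact signature are my guess, not a requirement):

    private void updateContainerStats(final ContainerInfo container,
        final Set<ContainerReplica> replicas) {
      long maxUsedBytes = 0;
      long maxKeyCount = 0;
      for (ContainerReplica replica : replicas) {
        maxUsedBytes = Math.max(maxUsedBytes, replica.getBytesUsed());
        maxKeyCount = Math.max(maxKeyCount, replica.getKeyCount());
      }
      // Only lower the SCM-side values; the existing path already raises them.
      if (maxKeyCount < container.getNumberOfKeys()) {
        container.setNumberOfKeys(maxKeyCount);
      }
      if (maxUsedBytes < container.getUsedBytes()) {
        container.setUsedBytes(maxUsedBytes);
      }
    }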

LOG.info(
"Block deletion txnID mismatch in datanode {} for containerID {}."
LOG.debug(
"Block deletion txnID behinds in datanode {} for containerID {}."

Suggested change
"Block deletion txnID behinds in datanode {} for containerID {}."
"Block deletion txnID lagging in datanode {} for containerID {}."

@ChenSammi
Contributor Author

However, there is another case where a similar problem still exists: if a key is so large that it spans multiple blocks, numberOfKeys will count each block separately.

Test with patch using default block size of 256MB (after previous 64MB test):

$ ozone freon ockg -n1 -t1 -s 536870912 -p 512MB
...
Successful executions: 1

$ ozone admin container list
{
  "state" : "OPEN",
  "replicationFactor" : "THREE",
  "replicationType" : "RATIS",
  "usedBytes" : 600834048,
  "numberOfKeys" : 3,
  "lastUsed" : "2020-08-10T08:32:43.112Z",
  "stateEnterTime" : "2020-08-10T08:22:39.855Z",
  "owner" : "f430cf45-c4ba-4271-8b3a-22bd314be60d",
  "containerID" : 1,
  "deleteTransactionId" : 0,
  "sequenceId" : 0,
  "open" : true
}

Here numberOfKeys should be 2, but it is 3 = 1 (for 64MB key) + 2 (for 512MB key).

@adoroszlai thanks for the review and comments. The keyCount in each container is different from the keyCount from the OM's point of view. The keyCount in a container is actually the block count (FilePerBlock) or the chunk count (FilePerChunk). So in the above case, a numberOfKeys of 3 is the expected value.

@ChenSammi
Contributor Author

ChenSammi commented Aug 11, 2020

I don't think this problem is specific to the FilePerBlock layout. PutBlock may be called for the same block multiple times in both layouts, depending on the buffer flush and block size configuration.

Config:

ozone.client.stream.buffer.size=1MB
ozone.client.stream.buffer.flush.size=4MB
ozone.scm.chunk.size=1MB
ozone.scm.container.size=1GB

Test with Ozone 0.5.0, before FilePerBlock was introduced:

$ ozone freon ockg -n1 -t1 -s 67108864 -p 64MB
...
Successful executions: 1

$ ozone scmcli container list
{
  "state" : "OPEN",
  "replicationFactor" : "THREE",
  "replicationType" : "RATIS",
  "usedBytes" : 67108864,
  "numberOfKeys" : 16,
  "lastUsed" : 156717083,
  "stateEnterTime" : 156589360,
  "owner" : "5b4981a0-61f4-42d1-af2d-b990e9a77741",
  "containerID" : 1,
  "deleteTransactionId" : 0,
  "sequenceId" : 0,
  "open" : true
}

I have verified that this case is fixed by the patch.

@adoroszlai, here I think the WriteChunk command will be called multiple times instead of the PutBlock command, right?

Contributor

@adoroszlai left a comment


Thanks @ChenSammi for updating the patch. It would be nice if you could find a better name for the new method, but it is OK anyway.

here I think WriteChunk command will be called multiple times instead of putBlock command, right?

Not only WriteChunk, but PutBlock, too. It's like: WriteChunk, WriteChunk, ..., PutBlock, WriteChunk, WriteChunk, ..., PutBlock.

keyCount in container is actually the block count(FilePerBlock) or the chunk count(FilePerChunk)

I don't think there is such a difference in the layouts: the client does not know about the layout, and the server updates statistics while handling the PutBlock command, regardless of how the chunk is written to disk.

 * Check and update Container key count and used bytes based on its replicas'
 * data.
 */
private void checkAndUpdateContainerState(final ContainerInfo container,

I think something along the lines of updateContainerStatsFromReplicas would be more descriptive, especially since container state (closed, etc.) is not being changed.
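
For example (a sketch of the rename only, keeping the extracted body unchanged):

    private void updateContainerStatsFromReplicas(final ContainerInfo container,
        final Set<ContainerReplica> replicas) {
      // same logic as above: take the max keyCount / bytesUsed across the
      // replicas and lower the SCM-side values when they exceed that max
    }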

@ChenSammi
Contributor Author

Thanks @adoroszlai for reviewing the code.

@ChenSammi merged commit 83697f9 into apache:master Aug 12, 2020
errose28 added a commit to errose28/ozone that referenced this pull request Aug 12, 2020
* master: (28 commits)
  HDDS-4037. Incorrect container numberOfKeys and usedBytes in SCM after key deletion (apache#1295)
  HDDS-3232. Include the byteman scripts in the distribution tar file (apache#1309)
  HDDS-4095. Byteman script to debug HCFS performance (apache#1311)
  HDDS-4057. Failed acceptance test missing from bundle (apache#1283)
  HDDS-4040. [OFS] BasicRootedOzoneFileSystem to support batchDelete (apache#1286)
  HDDS-4061. Pending delete blocks are not always included in #BLOCKCOUNT metadata (apache#1288)
  HDDS-4067. Implement toString for OMTransactionInfo (apache#1300)
  HDDS-3878. Make OMHA serviceID optional if one (but only one) is defined in the config (apache#1149)
  HDDS-3833. Use Pipeline choose policy to choose pipeline from exist pipeline list (apache#1096)
  HDDS-3979. Make bufferSize configurable for stream copy (apache#1212)
  HDDS-4048. Show more information while SCM version info mismatch (apache#1278)
  HDDS-4078. Use HDDS InterfaceAudience/Stability annotations (apache#1302)
  HDDS-4034. Add Unit Test for HadoopNestedDirGenerator. (apache#1266)
  HDDS-4076. Translate CSI.md into Chinese (apache#1299)
  HDDS-4046. Extensible subcommands for CLI applications (apache#1276)
  HDDS-4051. Remove whitelist/blacklist terminology from Ozone (apache#1306)
  HDDS-4055. Cleanup GitHub workflow (apache#1282)
  HDDS-4042. Update documentation for the GA release (apache#1269)
  HDDS-4066. Add core-site.xml to intellij configuration (apache#1292)
  HDDS-4073. Remove leftover robot.robot (apache#1297)
  ...
@ChenSammi
Contributor Author

/retest

@github-actions

To re-run CI checks, please follow these steps with the source branch checked out:

git commit --allow-empty -m 'trigger new CI check'
git push

rakeshadr pushed a commit to rakeshadr/hadoop-ozone that referenced this pull request Sep 3, 2020