HDDS-7370. Add pending commands in SCM to Datanode command count#3867
HDDS-7370. Add pending commands in SCM to Datanode command count#3867sodonnel merged 2 commits intoapache:masterfrom
Conversation
| Assertions.assertEquals(commandReport.getReport().getCount(i), | ||
| storedCount); | ||
| int expectedCount = commandReport.getReport().getCount(i); | ||
| // For the first two commands, we added extra commands |
There was a problem hiding this comment.
Hello Stephen! Just for my self-learning, why the first two commands we add extra commands~? Thanks~ 🙏
There was a problem hiding this comment.
This is just to test the new part of the code - that the queued commands in the list are added to the totals in the report count. I did not add extra commands to all of them, so I could see the others work ok when there isn't extra commands.
aswinshakil
left a comment
There was a problem hiding this comment.
Thanks for working on this @sodonnel .Overall the change looks good to me. Just have one question inline.
| lock.writeLock().lock(); | ||
| // Purge the existing counts, as each report should completely replace | ||
| // the existing counts. | ||
| commandCounts.clear(); |
There was a problem hiding this comment.
Just a quick question, why was this not done before and done now? I understand we would like to keep the current count and replace the stale counters. I thought it was already done by commandCounts.put(command, cmdCount) and also now with commandCounts.putAll(mutableCmds).
There was a problem hiding this comment.
I'm not sure I follow your question.
There are two counts. The ones reported from the DN on its heartbeart - these are queued on the DN.
Then there are commands queued in SCM ready to be sent to the DN as part of processing the heartbeat.
After the heartbeat, the real queue on the DN is the sum of the two (queued on DN + queued on SCM).
Before this change, we only stored the "queued on the DN count". This count is not yet used anywhere, but I missed adding the commands the heartbeat will add to the queue, so the old count is not correct.
The line commandCounts.put(command, cmdCount) should do all the updates as per the comment in the code. The line commandCounts.putAll(mutableCmds) is a "catch all" incase the DN is not reporting counts for a command that is queued on SCM - it shouldn't happen in practice. I have also removed the entry from mutableCommands after adding it to the cmdCount, so in practice mutableCommands should be empty by the time it gets to the putAll() call.
There was a problem hiding this comment.
Thanks for the detailed explanation. My initial doubt was commandCounts.clear(); was added now in this patch, I was just wondering why wasn't it added before. I guess the initial thought was that the command not reported by DN should be 0, it would be cleared anyway (with 0) by the commandCounts.put(command, cmdCount).
The changes look good to me.
There was a problem hiding this comment.
Yea not sure why I didn't have that before. I guess we could omit it, but I should not do any harm either.
| // there is a command queued for. The DNs should return a count for all | ||
| // command types even if they have a zero count, so this is really to | ||
| // handle something being wrong on the DN where it sends a spare report. | ||
| // It really should never happe. |
There was a problem hiding this comment.
| // It really should never happe. | |
| // It really should never happen. |
umamaheswararao
left a comment
There was a problem hiding this comment.
LGTM.
Thanks @sodonnel for working on this.
Thanks @aswinshakil and @DaveTeng0 for reviews!
What changes were proposed in this pull request?
In HDDS-6567, we stored the datanode queued command counts into DatanodeInfo on each heartbeat. However, we did not consider the commands that were queued to goto the Datanode on that heartbeat. The real count, is the count queued on the DNs, plus the commands queued in SCM we are about to send to the DN over the heartbeat being processed.
This Jira adds the "commands ready to send" count to the stored counts to give a better reflection on the current queued count on the DN.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-7370
How was this patch tested?
Adapted existing tests