Skip to content

HDDS-7370. Add pending commands in SCM to Datanode command count#3867

Merged
sodonnel merged 2 commits intoapache:masterfrom
sodonnel:HDDS-7370
Oct 27, 2022
Merged

HDDS-7370. Add pending commands in SCM to Datanode command count#3867
sodonnel merged 2 commits intoapache:masterfrom
sodonnel:HDDS-7370

Conversation

@sodonnel
Copy link
Contributor

What changes were proposed in this pull request?

In HDDS-6567, we stored the datanode queued command counts into DatanodeInfo on each heartbeat. However, we did not consider the commands that were queued to goto the Datanode on that heartbeat. The real count, is the count queued on the DNs, plus the commands queued in SCM we are about to send to the DN over the heartbeat being processed.

This Jira adds the "commands ready to send" count to the stored counts to give a better reflection on the current queued count on the DN.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-7370

How was this patch tested?

Adapted existing tests

Assertions.assertEquals(commandReport.getReport().getCount(i),
storedCount);
int expectedCount = commandReport.getReport().getCount(i);
// For the first two commands, we added extra commands
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello Stephen! Just for my self-learning, why the first two commands we add extra commands~? Thanks~ 🙏

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just to test the new part of the code - that the queued commands in the list are added to the totals in the report count. I did not add extra commands to all of them, so I could see the others work ok when there isn't extra commands.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it! thanks Stephen!

Copy link
Member

@aswinshakil aswinshakil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @sodonnel .Overall the change looks good to me. Just have one question inline.

lock.writeLock().lock();
// Purge the existing counts, as each report should completely replace
// the existing counts.
commandCounts.clear();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a quick question, why was this not done before and done now? I understand we would like to keep the current count and replace the stale counters. I thought it was already done by commandCounts.put(command, cmdCount) and also now with commandCounts.putAll(mutableCmds).

Copy link
Contributor Author

@sodonnel sodonnel Oct 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow your question.

There are two counts. The ones reported from the DN on its heartbeart - these are queued on the DN.

Then there are commands queued in SCM ready to be sent to the DN as part of processing the heartbeat.

After the heartbeat, the real queue on the DN is the sum of the two (queued on DN + queued on SCM).

Before this change, we only stored the "queued on the DN count". This count is not yet used anywhere, but I missed adding the commands the heartbeat will add to the queue, so the old count is not correct.

The line commandCounts.put(command, cmdCount) should do all the updates as per the comment in the code. The line commandCounts.putAll(mutableCmds) is a "catch all" incase the DN is not reporting counts for a command that is queued on SCM - it shouldn't happen in practice. I have also removed the entry from mutableCommands after adding it to the cmdCount, so in practice mutableCommands should be empty by the time it gets to the putAll() call.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed explanation. My initial doubt was commandCounts.clear(); was added now in this patch, I was just wondering why wasn't it added before. I guess the initial thought was that the command not reported by DN should be 0, it would be cleared anyway (with 0) by the commandCounts.put(command, cmdCount).

The changes look good to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea not sure why I didn't have that before. I guess we could omit it, but I should not do any harm either.

// there is a command queued for. The DNs should return a count for all
// command types even if they have a zero count, so this is really to
// handle something being wrong on the DN where it sends a spare report.
// It really should never happe.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// It really should never happe.
// It really should never happen.

Copy link
Contributor

@umamaheswararao umamaheswararao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
Thanks @sodonnel for working on this.
Thanks @aswinshakil and @DaveTeng0 for reviews!

@sodonnel sodonnel merged commit 880f87c into apache:master Oct 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants