Skip to content

Conversation

@jianghuazhu
Copy link
Contributor

What changes were proposed in this pull request?

Now, the SCM web ui does not show how much storage percentage is used by each DataNode node, the purpose of this PR is to make up for it.
image

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-9988

How was this patch tested?

The SCM web ui can display the storage percentage used by each DataNode node.

@jianghuazhu
Copy link
Contributor Author

ci: https://github.com/jianghuazhu/ozone/actions/runs/7348062465
New web ui:
image

Can you help review this PR, @adoroszlai @hemantk-12 .
Thanks.

@jianghuazhu
Copy link
Contributor Author

Can you help review this PR, @aswinshakil @fapifta @szetszwo .
Thanks.

@adoroszlai adoroszlai requested review from dombizita and xBis7 January 5, 2024 20:18
Copy link
Contributor

@xBis7 xBis7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jianghuazhu Thanks for the patch. I've left a comment.

@jianghuazhu
Copy link
Contributor Author

Can you help review this pr again, @xBis7 .
Thanks.

Copy link
Contributor

@xBis7 xBis7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jianghuazhu Thanks for updating the patch. I have some suggestions for optimizing the test.

@whbing
Copy link
Contributor

whbing commented Jan 8, 2024

How about also display Capacity and ScmUsed ?
similar HDFS UI like this ?
image
IMO, it's also best to have a sort function

@jianghuazhu
Copy link
Contributor Author

jianghuazhu commented Jan 9, 2024

Thanks @whbing .
The DataNode UI already displays the total storage size and used size. Does it still need to be displayed in the SCM UI?
What do you think, @xBis7 .

@xBis7
Copy link
Contributor

xBis7 commented Jan 9, 2024

IMO, it's also best to have a sort function

There is a sort function. I don't know if it works for every column though. That needs to be checked.

The DataNode UI already displays the total storage size and used size. Does it still need to be displayed in the SCM UI?
What do you think, @xBis7 .

It can be useful to have such information in one place, but if it's out of scope then it should be done as part of another jira. Tickets with narrow scope are easier for people to review.

How about also display Capacity and ScmUsed ?

Capacity seems related and can be part of this PR but ScmUsed doesn't.

As long as the PR changes are within scope, I'm ok with them.

Copy link
Contributor

@xBis7 xBis7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jianghuazhu Thanks for making the changes. I went over the code once more.

I also tested it locally in docker. It looks good but Recon displays a different percentage than SCM UI. It could be due to the docker dev env and I don't have an available cluster at the moment, to load the patch. If you can verify that there is no issue.

Can you check the CI failure?
https://github.com/jianghuazhu/ozone/actions/runs/7461358714

@jianghuazhu
Copy link
Contributor Author

Thanks @xBis7 and @whbing .
I've updated it.
ci:
https://github.com/jianghuazhu/ozone/actions/runs/7462830986
New SCM UI:
image

Newly added columns can be sorted.

@jianghuazhu
Copy link
Contributor Author

Can you help me take a look at this PR again, @xBis7 .
Thanks.

Copy link
Contributor

@xBis7 xBis7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also tested it locally in docker. It looks good but Recon displays a different percentage than SCM UI. It could be due to the docker dev env and I don't have an available cluster at the moment, to load the patch. If you can verify that there is no issue.

@jianghuazhu Can you also verify this? If it's an issue we will file a new jira.

@jianghuazhu
Copy link
Contributor Author

I found some problems.
Recon UI:
image

Here's an explanation of some important data:
Ozone Used: 1.7TB
Non Ozone Used: 1.6TB
Remaining: 127.6 TB
As can be seen here, the total storage capacity size is 130.9TB.

In Recon UI, the calculation formula of percentage is used for storage: (1.7 + 1.6) / 130.9. However, the output result here does not retain decimal places, so the final result is 3%.

SCM UI:
image

In the SCM UI, the calculation formula for storage usage percentage is: 1.7 / 130.9. In the final output result, we retain 1 decimal place, so the final result is 1.3%. It should be pointed out that Non Ozone Used is not the actual storage capacity used by Ozone.
@xBis7, what are your thoughts?

@xBis7
Copy link
Contributor

xBis7 commented Jan 11, 2024

It should be pointed out that Non Ozone Used is not the actual storage capacity used by Ozone.

Where does Non Ozone Used come from?

@jianghuazhu
Copy link
Contributor Author

In our test cluster, each node is configured with 12 disks, and each disk stores Ozone data and HDFS data respectively. For example, one of the disk directories:
/data5:
131G ./hadoopdata
167G ./ozonedata

I understand that Non Ozone Used here refers to the storage capacity used by HDFS and the data used by other programs.

@xBis7
Copy link
Contributor

xBis7 commented Jan 11, 2024

Thanks for testing it out. Capacity refers to the entire disk space, right? It's not just Ozone disk space.

If there is a lot of space from the total capacity, occupied by non Ozone files, and the SCM UI doesn't report it, then the Used Space Percentage will be misleading.

e.g.

  • We have 100 TB of disk space
  • Ozone takes up to 5TB
  • Other non Ozone files take another 2TB

In that scenario, you have

  • 7% space used
  • SCM UI reports 5%?
  • You still don't have 95% available but only 93%

Is there an easy way to include that number in the SCM UI?

@jianghuazhu
Copy link
Contributor Author

As you can see now, the percentages shown in the Recon UI represent node disk usage, not just Ozone usage, which is an extension of Ozone.
However, I believe that the storage usage shown by the SCM should be the data Used by the real Ozone datanodes, and there is no need to show 'Non Ozone Used' data.
Therefore, we can use 'SCM Used Space Percent' in the SCM UI to indicate how much Ozone data is managed by the DataNode service. The data shown here should correspond to the storage capacity of the DataNode UI.
image

@xBis7 , what do you think?

@xBis7
Copy link
Contributor

xBis7 commented Jan 11, 2024

That's why I asked about capacity. Is it referring to the entire disk or just Ozone?

@jianghuazhu
Copy link
Contributor Author

I think the percentage of storage usage here is relative to the entire disk.

@adoroszlai adoroszlai added the UI label Jan 12, 2024
@jianghuazhu
Copy link
Contributor Author

I have an idea.
In the SCM UI, the storage usage percentages of Ozone Used and Non Ozone Used are displayed respectively.
New SCM UI:
image

Here’s an explanation of some key data:
scmUsed:
This refers to Ozone Used. Calculation formula = scmUsed / capacity, and the result value retains 2 decimal places.
nonScmUsed:
This refers to Non Ozone Used.
Calculation method:
1.scmNonUsed = capacity - scmUsed - remaining
2.nonScmUsed= scmNonUsed / capacity
In Recon UI, the method of calculating Non Ozone Used is the same as the method of calculating nonScmUsed.
@xBis7 @adoroszlai , do you have any suggestions?

@xBis7
Copy link
Contributor

xBis7 commented Jan 12, 2024

@jianghuazhu The latest approach looks good. Displaying Total as well could be practical.

@jianghuazhu
Copy link
Contributor Author

@xBis7
Copy link
Contributor

xBis7 commented Jan 13, 2024

@jianghuazhu Sorry for not been clear. With Total, I was referring to the total used space. Can you rename that column back to Capacity?

Otherwise, changes look good. After that change, I'll open up the PR Ci.

@jianghuazhu
Copy link
Contributor Author

Thanks @xBis7 .
I've updated it.
Latest SCM UI:
image

Copy link
Contributor

@xBis7 xBis7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jianghuazhu Thanks for the changes, LGTM!

@jianghuazhu
Copy link
Contributor Author

jianghuazhu commented Jan 18, 2024

@xBis7 @adoroszlai , do you have any new suggestions for this PR?
If possible, help merge it into the master branch.
In addition, I will submit some new features later, which are based on this PR.
Thanks.

@xBis7
Copy link
Contributor

xBis7 commented Jan 18, 2024

@jianghuazhu Let's wait to see if @adoroszlai has any comments. Otherwise, it can be merged.

Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jianghuazhu for working on this.

Comment on lines 1199 to 1200
usedPercentage = Double.valueOf(decimalFormat.format(usedPerc));
nonUsedPercentage = Double.valueOf(decimalFormat.format(nonUsedPerc));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we format the value, then parse it as double. The caller then simply concatenates them into the final text being displayed. I think formatting may be lost this way.

I think calculateStoragePercentage() should return the formatted String instead.

If there are no reports, return "N/A", since we have no information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will update soon.

double scmUsedPerc = storagePercentage[0];
double nonScmUsedPerc = storagePercentage[1];
map.put(usedSpacePercent,
"scmUsed: " + scmUsedPerc + "%, nonScmUsed: " + nonScmUsedPerc + "%");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure "scmUsed" and "nonScmUsed" are clear enough for users. Also, the table header already says "Used Space Percent".

How about "Ozone: ... , other: ..."?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a definite way of expressing this?
Then we display 'Ozone: ... , other: ...'?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please clarify the question?

My suggestion is to change the value displayed in table cells:

-scmUsed: 12.34%, nonScmUsed: 1.23%
+Ozone: 12.34%, other: 1.23%

(... are just placeholders in my previous comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I will update it.

Comment on lines 146 to 147
private final String usedSpacePercent = "USEDSPACEPERCENT";
private final String totalCapacity = "CAPACITY";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be constants. (The existing ones, too, but those are being fixed in HDDS-10152.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adoroszlai , do we need to wait for HDDS-10152 to be resolved?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I don't think we need to wait.

(However, I'd like to get HDDS-10157 fixed before merging anything else, since CI is failing in native check.)

Comment on lines 1631 to 1632
String capacityResult = nodeManager.calculateStorageCapacity(dni);
assertEquals(capacityResult, totalCapacity);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please follow assertEquals(expected, actual) argument order.

* @param dni DataNode node that needs to be calculated.
* @return
*/
public String calculateStorageCapacity(DatanodeInfo dni) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both new calculate methods can be static and take List<StorageReportProto> as parameter instead of DatanodeInfo.

This allows the test to be much simpler (see other comment on the test for details).

Comment on lines 1609 to 1630
OzoneConfiguration conf = getConf();
conf.setTimeDuration(OZONE_SCM_HEARTBEAT_PROCESS_INTERVAL, 1000,
MILLISECONDS);
try (SCMNodeManager nodeManager = createNodeManager(conf)) {
EventQueue eventQueue = (EventQueue) scm.getEventQueue();
LayoutVersionManager versionManager =
nodeManager.getLayoutVersionManager();
LayoutVersionProto layoutInfo = toLayoutVersionProto(
versionManager.getMetadataLayoutVersion(),
versionManager.getSoftwareLayoutVersion());
DatanodeDetails dn = MockDatanodeDetails.randomDatanodeDetails();
UUID dnId = dn.getUuid();
List<StorageReportProto> reports = volumeCount > 0 ?
generateStorageReportProto(volumeCount, dnId, perCapacity,
used, remaining) : null;
nodeManager.register(dn, reports != null ?
HddsTestUtils.createNodeReport(reports, emptyList()) : null, null);
nodeManager.processHeartbeat(dn, layoutInfo);
eventQueue.processAll(8000L);
NodeStateManager nodeStateManager = nodeManager.getNodeStateManager();
List<DatanodeInfo> dataNodeInfoList = nodeStateManager.getAllNodes();
DatanodeInfo dni = dataNodeInfoList.get(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If calculate... methods are changed as suggested, all we need here is:

      List<StorageReportProto> reports = volumeCount > 0 ?
          generateStorageReportProto(volumeCount, dnId, perCapacity,
              used, remaining) : null;

@jianghuazhu
Copy link
Contributor Author

@adoroszlai , I've updated it.
Can you help review again?
ci: https://github.com/jianghuazhu/ozone/actions/runs/7583986252
SCM UI:
image

@adoroszlai adoroszlai dismissed their stale review January 19, 2024 14:19

patch updated

@adoroszlai
Copy link
Contributor

Thanks @jianghuazhu for improving the patch.

@adoroszlai
Copy link
Contributor

@dombizita would you like to take a look, too?

Copy link
Contributor

@dombizita dombizita left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @jianghuazhu, it looks good to me! I also tested it locally and it looks as expected.

@jianghuazhu
Copy link
Contributor Author

Thanks @dombizita .

@adoroszlai adoroszlai merged commit 9afd1a1 into apache:master Jan 22, 2024
@adoroszlai
Copy link
Contributor

Thanks @jianghuazhu for the patch, @dombizita, @xBis7 for the review.

Tejaskriya pushed a commit to Tejaskriya/ozone that referenced this pull request Jan 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants