Conversation

@slfan1989 (Contributor) commented Oct 3, 2024

What changes were proposed in this pull request?

Currently, we lack tools on the SCM side to track failed disks on DataNodes. DataNodes already report this information; we just need to display it.

In this PR, we display the failed disks reported by each DataNode. The information can be rendered in the default, JSON, or table format.

Default format

Datanode Volume Failures (5 Volumes)

Node         : localhost-62.238.104.185 (de97aaf3-99ad-449d-ad92-2c4f5a744b49) 
Failed Volume: /data0/ozonedata/hdds 
Capacity Lost: 7430477791683 B (6.76 TB) 
Failure Date : Thu Oct 03 09:25:16 +0800 2024 

Node         : localhost-163.120.165.68 (cf40e987-8952-4f7a-88b7-096e6b285243) 
Failed Volume: /data1/ozonedata/hdds 
Capacity Lost: 7430477791683 B (6.76 TB) 
Failure Date : Thu Oct 03 09:25:16 +0800 2024 

Node         : localhost-253.243.206.120 (0cc77921-489d-4cf0-a036-475faa16d443) 
Failed Volume: /data2/ozonedata/hdds 
Capacity Lost: 7430477791683 B (6.76 TB) 
Failure Date : Thu Oct 03 09:25:16 +0800 2024 

Node         : localhost-136.194.243.81 (5cb6430d-0ce5-4204-b265-179ee38fb30e) 
Failed Volume: /data3/ozonedata/hdds 
Capacity Lost: 7430477791683 B (6.76 TB) 
Failure Date : Thu Oct 03 09:25:16 +0800 2024 

Node         : localhost-48.253.209.226 (f99a8374-edb0-419d-9cba-cfab9d9e8a2e) 
Failed Volume: /data4/ozonedata/hdds 
Capacity Lost: 7430477791683 B (6.76 TB) 
Failure Date : Thu Oct 03 09:25:16 +0800 2024 

JSON format

[ {
  "node" : "localhost-161.170.151.131 (155bb574-7ed8-41cd-a868-815f4c2b0d60)",
  "volumeName" : "/data0/ozonedata/hdds",
  "failureDate" : 1727918794694,
  "capacityLost" : 7430477791683
}, {
  "node" : "localhost-67.218.46.23 (520d29eb-8387-4cda-bcb1-8727fdddd451)",
  "volumeName" : "/data1/ozonedata/hdds",
  "failureDate" : 1727918794695,
  "capacityLost" : 7430477791683
}, {
  "node" : "localhost-30.151.88.21 (d66cab50-bbf8-4199-9d7f-82da84a30137)",
  "volumeName" : "/data2/ozonedata/hdds",
  "failureDate" : 1727918794695,
  "capacityLost" : 7430477791683
}, {
  "node" : "localhost-78.50.38.217 (a673f50a-6f74-4e62-8c0c-f7337d5f3ce5)",
  "volumeName" : "/data3/ozonedata/hdds",
  "failureDate" : 1727918794695,
  "capacityLost" : 7430477791683
}, {
  "node" : "localhost-138.205.52.25 (84b7e49a-9bd4-4115-96fa-69f2d259343c)",
  "volumeName" : "/data4/ozonedata/hdds",
  "failureDate" : 1727918794695,
  "capacityLost" : 7430477791683
} ]

Table format

+-------------------------------------------------------------------------------------------------------------------------------------------+
|                                                         Datanode Volume Failures                                                          |
+------------------------------------------------------------------+-----------------------+---------------+--------------------------------+
|                               Node                               |      Volume Name      | Capacity Lost |          Failure Date          |
+------------------------------------------------------------------+-----------------------+---------------+--------------------------------+
|  localhost-83.212.219.28 (8b6addb1-759a-49e9-99fb-0d1a6cfb2d7f)  | /data0/ozonedata/hdds |    6.76 TB    | Sat Oct 05 17:47:47 +0800 2024 |
| localhost-103.199.236.47 (0dbe503a-3382-4753-b95a-447bab5766c4)  | /data1/ozonedata/hdds |    6.76 TB    | Sat Oct 05 17:47:47 +0800 2024 |
|  localhost-178.123.46.32 (2017076a-e763-4f47-abce-78535b5770a3)  | /data2/ozonedata/hdds |    6.76 TB    | Sat Oct 05 17:47:47 +0800 2024 |
| localhost-123.112.235.228 (aaebb6a7-6b62-4160-9934-b16b8fdde65e) | /data3/ozonedata/hdds |    6.76 TB    | Sat Oct 05 17:47:47 +0800 2024 |
| localhost-249.235.216.19 (cbc7c0b5-5ae0-4e40-91b8-1d9c419a007c)  | /data4/ozonedata/hdds |    6.76 TB    | Sat Oct 05 17:47:47 +0800 2024 |
+------------------------------------------------------------------+-----------------------+---------------+--------------------------------+

What is the link to the Apache JIRA

JIRA: HDDS-11463. Track and display failed DataNode storage locations in SCM.

How was this patch tested?

Added JUnit tests and tested in a test environment.

@slfan1989 slfan1989 marked this pull request as ready for review October 5, 2024 11:48
@slfan1989 slfan1989 marked this pull request as draft October 6, 2024 01:46
@slfan1989 slfan1989 closed this Oct 22, 2024
@slfan1989 slfan1989 reopened this Oct 23, 2024
@slfan1989 slfan1989 marked this pull request as ready for review October 23, 2024 04:41
@slfan1989 (Contributor Author)

@errose28 Could you please help review this PR? Thank you very much! We discussed the relevant implementation together in HDDS-11463.

@errose28 (Contributor) left a comment

Thanks for working on this @slfan1989, this looks like a useful addition. I only had time for a quick high level look for now.

* Handler of ozone admin scm volumesfailure command.
*/
@Command(
name = "volumesfailure",
Contributor:

For the CLI, we should probably use something like ozone admin datanode volume list. The datanode subcommand is already used to retrieve information about datanodes from SCM. Splitting the commands so that volume has its own subcommand gives us more options in the future.

To distinguish failed from healthy volumes and to filter by node, we can either add some kind of filter flag, or leave it to grep/jq applied to the output.
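For illustration only, a minimal picocli sketch of that layout; the class names and the --failed flag are hypothetical, not part of this PR:

```java
import java.util.concurrent.Callable;
import picocli.CommandLine.Command;
import picocli.CommandLine.Option;

// Hypothetical sketch: a "volume" group under "ozone admin datanode" with a
// "list" action; an optional --failed flag narrows the output, or the full
// list can be post-processed with grep/jq instead.
@Command(name = "volume",
    description = "Volume-specific operations on datanodes",
    subcommands = {VolumeListSubcommand.class})
class VolumeCommands {
}

@Command(name = "list", description = "List volumes reported by datanodes")
class VolumeListSubcommand implements Callable<Void> {

  @Option(names = {"--failed"}, description = "Only show failed volumes")
  private boolean failedOnly;

  @Override
  public Void call() throws Exception {
    // Query SCM for volume information here and print it.
    return null;
  }
}
```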

Contributor:

This also means we should make the RPC more generic to support pulling all volume information.
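As a rough sketch of what such a generic query could return (interface and type names are illustrative; the fields mirror the JSON output shown earlier in this PR):

```java
import java.io.IOException;
import java.util.List;

// Sketch only: a more generic SCM query that returns every reported volume and
// leaves failed/healthy filtering to the caller.
interface ScmVolumeQuery {

  List<VolumeRecord> getVolumeInfos() throws IOException;

  final class VolumeRecord {
    final String node;        // "hostname (uuid)"
    final String volumeName;  // e.g. /data0/ozonedata/hdds
    final boolean failed;
    final long failureTime;   // epoch millis, 0 if healthy
    final long capacityLost;  // bytes, 0 if healthy

    VolumeRecord(String node, String volumeName, boolean failed,
        long failureTime, long capacityLost) {
      this.node = node;
      this.volumeName = volumeName;
      this.failed = failed;
      this.failureTime = failureTime;
      this.capacityLost = capacityLost;
    }
  }
}
```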

Contributor Author:

Thank you for helping to review this PR! I will continue to improve the relevant code based on your suggestions.

private boolean failedVolume = false;
private String datanodeUuid;
private String clusterID;
private long failureDate;
Contributor:

Let's use failureTime. I'm assuming this is being stored as millis since epoch, so it will have date and time information.

Contributor Author:

I have improved the relevant code.

// Ensure it is set only once,
// which is the time when the failure was first detected.
if (failureDate == 0L) {
  setFailureDate(Time.now());
Contributor:

Let's use Instant.now() per HDDS-7911.
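A minimal sketch of the suggested change, assuming the field is renamed to failureTime and stored as epoch milliseconds; the class and method names are illustrative, not the PR's actual code:

```java
import java.time.Instant;

// Sketch only: record the failure time once, at first detection, using
// Instant (per HDDS-7911) instead of Time.now().
class VolumeHealthTracker {

  private long failureTime; // epoch millis; 0 means no failure recorded yet

  void onVolumeFailed() {
    // Set only once: the time the failure was first detected.
    if (failureTime == 0L) {
      failureTime = Instant.now().toEpochMilli();
    }
  }

  long getFailureTime() {
    return failureTime;
  }
}
```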

Contributor Author:

@errose28 Can you help review this PR again? Thank you very much!

@adoroszlai adoroszlai marked this pull request as draft November 5, 2024 18:30
@adoroszlai (Contributor)

Thanks @slfan1989 for working on this. Converted it to draft because there is a failing test:

[ERROR] org.apache.hadoop.hdds.scm.node.TestSCMNodeManager.tesVolumeInfoFromNodeReport  Time elapsed: 1.105 s  <<< ERROR!
java.lang.UnsupportedOperationException
	at java.base/java.util.AbstractList.add(AbstractList.java:153)
	at java.base/java.util.AbstractList.add(AbstractList.java:111)
	at org.apache.hadoop.hdds.scm.node.DatanodeInfo.updateStorageReports(DatanodeInfo.java:186)
	at org.apache.hadoop.hdds.scm.node.SCMNodeManager.processNodeReport(SCMNodeManager.java:674)
	at org.apache.hadoop.hdds.scm.node.SCMNodeManager.register(SCMNodeManager.java:423)
	at org.apache.hadoop.hdds.scm.node.SCMNodeManager.register(SCMNodeManager.java:360)
	at org.apache.hadoop.hdds.scm.node.TestSCMNodeManager.tesVolumeInfoFromNodeReport(TestSCMNodeManager.java:1591)

https://github.com/slfan1989/ozone/actions/runs/11471452180
https://github.com/slfan1989/ozone/actions/runs/11476535807
https://github.com/slfan1989/ozone/actions/runs/11625983369

@slfan1989 slfan1989 force-pushed the HDDS-11463 branch 2 times, most recently from a83a8f7 to b1df492 Compare November 6, 2024 09:26
@slfan1989 (Contributor Author)

@adoroszlai Thank you for reviewing this PR! I am currently making improvements, and once the changes pass the CI tests in my branch, I will reopen the PR.

cc: @errose28

@slfan1989 slfan1989 force-pushed the HDDS-11463 branch 2 times, most recently from 8bc7ae0 to ff43fac Compare November 7, 2024 08:37
@slfan1989 (Contributor Author)

@adoroszlai Thank you for reviewing this PR! I will also pay closer attention to CI issues in future development. I understand that CI testing resources are valuable.

I have made improvements to the code based on @errose28's suggestions and also fixed the related unit test errors. The CI on my branch has passed (https://github.com/slfan1989/ozone/actions/runs/11719380711), and I have updated the PR status to 'Ready for Review'.

@slfan1989 slfan1989 marked this pull request as ready for review November 7, 2024 23:48
@adoroszlai adoroszlai requested a review from errose28 November 8, 2024 04:59
@slfan1989 (Contributor Author)

@errose28 Could you please help review this PR again? Thank you very much! I’ve made some additional improvements to this PR, as we wanted to print all the disk information. However, since there’s quite a lot of disk data, I’ve added pagination functionality.

@adoroszlai (Contributor)

Temporarily converted to draft and assigned to myself, to resolve conflicts.

@slfan1989 (Contributor Author)

@adoroszlai Thank you for your attention to this PR. I will continue to follow up on it.

@adoroszlai adoroszlai removed their assignment Feb 18, 2025
@adoroszlai (Contributor)

Merged from master. There will be one checkstyle problem:

hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/datanode/package-info.java
 18: Missing a Javadoc comment.
 18: Missing javadoc for package-info.java file.

Previously the license header was a javadoc in this new file, so the problem was hidden.

@slfan1989 (Contributor Author)

@adoroszlai Could you help review this PR? Thank you very much!

I have conducted tests on my own branch, and it currently passes the key CI tests.

https://github.com/slfan1989/ozone/actions/runs/15209585380

@adoroszlai adoroszlai self-requested a review May 23, 2025 15:03

// If startItem is specified, find its position in the volumeInfos list
int startIndex = 0;
if (StringUtils.isNotBlank(startItem)) {
@slfan1989 (Contributor Author) commented May 23, 2025

@adoroszlai I added logic to skip startItem in this part of the code, but after thinking it through, I realized it’s better to use the server’s hostname or UUID as startItem instead of a disk prefix. That’s because many machines name their disks like data0 to data9, and using a disk name could lead to unexpected filtering behavior.
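A rough sketch of that pagination idea, assuming startItem is matched against a node identifier (hostname or UUID) rather than a volume path; the helper class and its signature are hypothetical:

```java
import java.util.List;
import java.util.function.Function;

// Sketch only: skip ahead to the entry whose node id equals startItem, then
// return at most `count` entries from there.
final class VolumePagination {

  static <T> List<T> page(List<T> volumeInfos, String startItem,
      Function<T, String> nodeIdOf, int count) {
    int startIndex = 0;
    if (startItem != null && !startItem.trim().isEmpty()) {
      for (int i = 0; i < volumeInfos.size(); i++) {
        if (nodeIdOf.apply(volumeInfos.get(i)).equals(startItem)) {
          startIndex = i;
          break;
        }
      }
    }
    int endIndex = Math.min(startIndex + count, volumeInfos.size());
    return volumeInfos.subList(startIndex, endIndex);
  }
}
```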

@slfan1989 (Contributor Author)

@adoroszlai Can we move forward with this PR? I would appreciate your advice.

@adoroszlai (Contributor) left a comment

Thanks @slfan1989 for updating the patch.

Comment on lines 44 to 52
private static final Codec<VolumeInfo> CODEC = new DelegatedCodec<>(
    Proto2Codec.get(HddsProtos.VolumeInfoProto.getDefaultInstance()),
    VolumeInfo::fromProtobuf,
    VolumeInfo::getProtobuf,
    VolumeInfo.class);

public static Codec<VolumeInfo> getCodec() {
  return CODEC;
}
Contributor:

Codec is required only for storing in DB, but VolumeInfo does not seem to be persisted by either datanode or SCM. So I think this can be removed.

Contributor Author:

I have removed the CODEC.

}

message VolumeInfoProto {
optional string uuid = 1;
Contributor:

Please use DatanodeIDProto.

*/
public final class VolumeInfo implements Comparable<VolumeInfo> {

private String uuid;
Contributor:

Please use DatanodeID.


@Override
public int compareTo(VolumeInfo that) {
Preconditions.checkNotNull(that);
Contributor:

nit: prefer builtin Objects.requireNonNull
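A small sketch combining the two suggestions above: hold a typed DatanodeID instead of a String UUID, and use the built-in Objects.requireNonNull in compareTo. The DatanodeID package and the comparison key are assumptions, not this PR's code:

```java
import java.util.Objects;
import org.apache.hadoop.hdds.protocol.DatanodeID; // assumed package for the class the reviewer refers to

// Sketch only: VolumeInfo keyed by a typed node id rather than a String uuid.
public final class VolumeInfo implements Comparable<VolumeInfo> {

  private final DatanodeID datanodeID; // was: private String uuid;
  private final String volumeName;

  public VolumeInfo(DatanodeID datanodeID, String volumeName) {
    this.datanodeID = Objects.requireNonNull(datanodeID, "datanodeID == null");
    this.volumeName = Objects.requireNonNull(volumeName, "volumeName == null");
  }

  @Override
  public int compareTo(VolumeInfo that) {
    Objects.requireNonNull(that, "that == null");
    // Illustrative comparison key.
    return this.volumeName.compareTo(that.volumeName);
  }
}
```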

* @throws IOException
* I/O exceptions that may occur during the process of querying the volume.
*/
StorageContainerLocationProtocolProtos.GetVolumeInfosResponseProto getVolumeInfos(
Contributor:

nit: please import GetVolumeInfosResponseProto instead of StorageContainerLocationProtocolProtos.

private String uuid;

// The HostName identifier of the DataNode.
@Option(names = { "--hostName" },
Contributor:

This still applies to the latest patch.

@CommandLine.Mixin
private ListPaginationOptions listOptions;

enum DISPLAYMODE { all, normal, failed }
Contributor:

nit: Enums should be named like other types (classes, interfaces): DisplayMode.

Also, please consider using all-caps for values (ALL, etc.)

Comment on lines 63 to 67
@Option(names = { "--displayMode" },
    defaultValue = "all",
    description = "Display mode for disks: 'failed' shows failed disks, " +
        "'normal' shows healthy disks, 'all' shows all disks.")
private DISPLAYMODE displayMode;
Contributor:

On another look, I think "display mode" is confusing. JSON and Table are display modes. "all/normal/failed" filter the list by volume state.

I suggest renaming to --state, and renaming normal to healthy.

Then description can be simplified to Filter disks by state.
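A sketch of how the renamed filter could look, following the suggestions above (class and enum names are illustrative, not the final code):

```java
import picocli.CommandLine.Option;

// Sketch only: an enum named like other types, ALL-CAPS constants, and a
// --state flag that filters the listed volumes rather than changing the
// output format.
class VolumeFilterOptions {

  enum VolumeState { ALL, HEALTHY, FAILED }

  @Option(names = {"--state"},
      defaultValue = "ALL",
      description = "Filter disks by state.")
  private VolumeState state;

  VolumeState getState() {
    return state;
  }
}
```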


// If displayed in JSON format.
if (json) {
System.out.print(JsonUtils.toJsonStringWithDefaultPrettyPrinter(volumeInfos));
Contributor:

This is still applicable.

Also, please use println, to avoid situations like HDDS-13100.

Comment on lines 57 to 60
@AfterEach
public void tearDown() {
}

Contributor:

nit: unnecessary

@errose28 (Contributor)

Hi @slfan1989 thanks for working on this change. I think there are three attributes being added here which should be reviewed separately:

  1. Adding an SCM RPC to retrieve volume information
  2. Tracking failure time of the volume
  3. Adding a CLI to view the volume information

The RPC to retrieve volume information is definitely required going forward regardless of the other two items to create some sort of CLI to query volume state.

Tracking the failure time of the volume seems like a somewhat invasive change since it spans the datanode, heartbeat, and SCM. Is this necessary, or is it enough to depend on a metrics database to track the timing of cluster events? Of course, we need improvements to our volume metrics as well, as mentioned in #8405.

On the CLI front, I do think we need a dedicated ozone admin datanode info command going forward as outlined in HDDS-13097. This would give all volume information per node. With volume counters added to ozone admin datanode list as proposed in HDDS-13096, we could get all failed volumes in a two step process:

  1. jq filter on ozone admin datanode list to find all nodes with failed volumes.
  2. jq filter on ozone admin datanode info to get specific information about the failed volumes, including their capacity.

Do we need a dedicated ozone admin datanode volume list/info command pairing in addition to this? It may be useful to have such cross-cutting commands to get information in one shot, but on the other hand it may result in duplication at the CLI. For example I could see the request to add node filtering to ozone admin datanode volume list/info at which point it becomes much the same as ozone admin datanode list/info.

@slfan1989 (Contributor Author)

@errose28 Thank you for your message! I'd like to share some thoughts from a different perspective. As it stands, this feature does not conflict with the proposal in #8405. #8405 represents a more innovative and forward-looking design, and although it's still under discussion, it will certainly be valuable if implemented as planned.

At the same time, I believe this feature does not impact HDDS-13096 or HDDS-13097. My comment on #8405 was more about expressing expectations for the system’s future capabilities — I hope Ozone can gradually support such features — rather than raising any objections to #8405 itself.

The design of #7266 is inspired by HDFS's disk failure detection mechanism, with the goal of improving the system's ability to identify and locate failed disks. For users migrating from HDFS to Ozone, using the volume command to directly view failed disks can offer a more intuitive and convenient operational experience.

From my perspective, we all play different roles in this project. Your team focuses on evolving and optimizing the system's architecture, while we, as external users, are more focused on refining specific functional details based on real-world use. Ultimately, however, we share the same goal: to make Ozone more robust, more user-friendly, and more widely adopted.

Naturally, it's not easy to fully align these detail-oriented changes with larger, ongoing feature developments — for example, making #7266 fully consistent with #8405. This is mainly because #8405 is broader in scope, with a longer timeline, whereas #7266 focuses on a very specific aspect. While we fully respect the overall direction, we also hope to move forward with some smaller, incremental improvements to address current practical issues.

In addition to this PR, we're also working on several other enhancements. For instance, we've implemented mechanisms to collect DataNode I/O statistics to more precisely manage container replication. We've also introduced time-based peak/off-peak control logic for various DataNode management operations (such as deletion, container replication, and EC container reconstruction). These improvements are driven by real-world production needs, and from our perspective, they've shown positive results.

However, since many of these PRs have some degree of code coupling with our previous contributions, it's difficult for us to combine everything into a single, unified patch for upstream submission.

Therefore, we hope to proceed with #7266 for now. If #8405 later results in a more complete or improved solution, we’d be happy to continue refining things in that direction. In the meantime, this also gives us a valuable opportunity to participate in the community and contribute to Ozone’s development.

@slfan1989 (Contributor Author)

@adoroszlai Thank you very much for reviewing the code. I will make improvements based on your suggestions. @errose28's comments are essentially not in conflict with #7266, and I'm looking forward to seeing #7266 progress so that we can move forward with the subsequent work.

@slfan1989 (Contributor Author)

@adoroszlai Thank you very much for your detailed suggestions! I've made the changes accordingly. Could you review this PR again? Thank you very much! I respect @errose28's perspective. However, I believe this PR does not conflict with #8405, nor with HDDS-13096 or HDDS-13097 — they can coexist. We've already spent considerable time reviewing this PR together, and I'd like to continue moving it forward.

@slfan1989 (Contributor Author)

@adoroszlai @errose28 Can I still continue to follow up on this PR? I feel that I've put in some effort, but right now I've lost a clear direction on how to proceed. According to @errose28's suggestion, this PR only needs to keep the RPC part, but I'm not sure how to continue working on the related functionality from here.

@adoroszlai (Contributor)

@slfan1989 Thanks for all your efforts on this PR. The concerns/suggestions raised by @errose28 make sense though. Please try to reach agreement.

I won't be able to re-review until next week in any case.

@slfan1989 (Contributor Author)

@adoroszlai Thank you very much for your message and for your continued support and assistance! Since @errose28 is currently planning some new features, I believe this PR could be considered as part of that effort, especially given the amount of work we've already invested. As for which specific features should be retained, it would be helpful if @errose28 could review and provide guidance.

@errose28 (Contributor) commented Jun 4, 2025

Hi @slfan1989, I appreciate your response and the work you've done on these changes. My suggestion to split this PR up was not meant to diminish any of the work that has been put in here, but to speed up incorporating the work into Ozone.

A change like this, approaching 1k lines and 70+ review comments and encompassing multiple items, is going to be large for any reviewer, and I think we could make faster progress by splitting it. For example, I think we could iterate on the volume info RPC in SCM pretty quickly and get that change merged first. I'm OK with adding volume failure time and capacity lost to SCM as well, but it will be easier to review those as their own change. This way most of the work here can be merged while we discuss the CLI.

> The design of #7266 is inspired by HDFS's disk failure detection mechanism, with the goal of improving the system's ability to identify and locate failed disks. For users migrating from HDFS to Ozone, using the volume command to directly view failed disks can offer a more intuitive and convenient operational experience.

Can you add more details about how this is similar to HDFS? I'm familiar with hdfs dfsadmin -report to list failed volumes, but that breaks the information down at the node level with failed volume counters for each node, which seems more similar to an ozone admin datanode list command.

> I believe this feature does not impact HDDS-13096 or HDDS-13097.

Yes, we could have both ozone admin datanode list/info and ozone admin datanode volume list without code conflicts, but we need to build a maintainable and intuitive CLI, which means we should avoid commands that do the same or similar things. In this case I think we should standardize on one way to get volume information from the CLI. I propose keeping this within the datanode info command, because volumes are completely contained within a node, unlike cross-cutting concepts like containers and pipelines, which have their own subcommands. Based on early comments like this, I think we would end up needing node filtering in ozone admin datanode volume list, at which point it becomes very similar to ozone admin datanode list/info.

@slfan1989 (Contributor Author) commented Jun 17, 2025

@errose28 @adoroszlai Thank you for your message! I’m currently reviewing the issues you raised and will continue to follow up on the PR, making necessary adjustments and providing timely feedback. I'm currently working on the code improvements and expect to complete them within 1–2 days.

@slfan1989 slfan1989 changed the title HDDS-11463. Track and display failed DataNode storage locations in SCM. HDDS-11463. Add SCM RPC support for DataNode volume info reporting. Jun 29, 2025
@slfan1989 (Contributor Author)

@errose28 @adoroszlai I have completed the improvements for this PR and kept the RPC interface part. Could you please help review it? Thank you very much!

@errose28 (Contributor)

Sorry for the delay. Can we have this PR be just the SCM <-> client communication for querying volume info? Right now it also contains information for SCM <-> DN communication about the failure time of the volume, which is not directly related and can be added in a follow-up change. The new RPC will also need tests added.

@slfan1989 (Contributor Author)

@errose28 Thank you for your feedback. I will improve this PR based on your suggestions.

@slfan1989 (Contributor Author) commented Aug 11, 2025

@errose28 @adoroszlai Could you please take another look at this PR? Thank you very much! I’ve simplified it to only include the interactions between the Client and SCM. If you think the implementation meets expectations, I’ll add some unit tests.

Apologies for the delay in following up on some Ozone PRs. Most of my time this year has been dedicated to supporting Hadoop on JDK17. That work is now nearing completion, and I will focus on addressing the remaining PRs in Ozone.

@github-actions (bot)

This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days.

@github-actions github-actions bot added the stale label Nov 12, 2025
@github-actions (bot)

Thank you for your contribution. This PR is being closed due to inactivity. If needed, feel free to reopen it.

@github-actions github-actions bot closed this Nov 19, 2025