HDDS-1234. Short summary of the change #10
Summary of Changes
Hello @priyeshkaratha, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request enhances Recon's monitoring capabilities by introducing a new API endpoint to expose pending deletion metrics from Ozone Manager, Storage Container Manager, and DataNodes. A key improvement is the implementation of an asynchronous mechanism for collecting DataNode metrics via JMX, which uses non-blocking HTTP requests to efficiently gather data without impacting system performance. The changes also include new configuration options to fine-tune this metric collection process and robust integration tests to ensure the reliability of the new functionality.
/gemini review
No such command.
Code Review
This pull request introduces a new endpoint, /pendingDeletion, to provide metrics on pending deletions from OM, SCM, and DataNodes. A key addition is the DataNodeMetricsService, which asynchronously gathers metrics from DataNodes using a non-blocking HTTP client. The implementation is robust, leveraging CompletableFuture for concurrency and an immutable state holder for thread safety. The changes are supported by new configurations, utility methods, and a comprehensive integration test. My review has identified a potential bug in URL encoding within the new JMX service provider, a hardcoded timeout that should be made configurable, and a minor logging inconsistency in the test suite.
...ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/spi/impl/JmxServiceProviderImpl.java
...-test-recon/src/test/java/org/apache/hadoop/ozone/recon/TestStorageDistributionEndpoint.java
...ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/spi/impl/JmxServiceProviderImpl.java
Code Review
This pull request introduces a new asynchronous service in Recon to collect pending deletion metrics from DataNodes via their JMX endpoints. This is exposed through a new REST endpoint, /api/v1/pendingDeletion. The implementation leverages CompletableFuture for non-blocking I/O, which is a solid choice for performance and scalability. The changes are well-structured, including new configurations, service and API classes, modifications to the metrics provider SPI, and a comprehensive integration test. My feedback focuses on improving the immutability of state objects, enhancing the robustness of utility methods, and improving test logging for better debuggability.
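The fan-out pattern described in this summary can be sketched roughly as follows. This is a minimal illustration, not the PR's DataNodeMetricsService: the class name `AsyncCollectSketch`, the `Result` holder, and the `fetcher` function (a stand-in for the per-node JMX HTTP call) are all hypothetical, and the sketch runs a blocking fetcher on a caller-supplied executor rather than using a non-blocking HTTP client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Function;

// Hypothetical sketch; names are illustrative and not taken from the PR.
final class AsyncCollectSketch {

  /** Outcome of one collection round: values that arrived plus a failure count. */
  static final class Result {
    final List<Long> values;
    final int failed;

    Result(List<Long> values, int failed) {
      this.values = values;
      this.failed = failed;
    }
  }

  /** Queries every host concurrently and tolerates per-host failures. */
  static Result collect(List<String> hosts,
                        Function<String, Long> fetcher,
                        Executor pool) {
    List<CompletableFuture<Long>> futures = new ArrayList<>();
    for (String host : hosts) {
      futures.add(CompletableFuture
          .supplyAsync(() -> fetcher.apply(host), pool)
          // A failed host yields null instead of failing the whole round.
          .exceptionally(t -> null));
    }
    // Wait for all hosts; no future can complete exceptionally at this point.
    CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();

    List<Long> values = new ArrayList<>();
    int failed = 0;
    for (CompletableFuture<Long> future : futures) {
      Long value = future.join();
      if (value == null) {
        failed++;
      } else {
        values.add(value);
      }
    }
    return new Result(values, failed);
  }
}
```

The sketch waits on `join()` for simplicity and has no per-request timeout; a production service would likely bound each request and publish the result without blocking the caller.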
...-test-recon/src/test/java/org/apache/hadoop/ozone/recon/TestStorageDistributionEndpoint.java
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/api/DataNodeMetricsService.java
    int totalNodesFailed,
    long lastCollectionEndTime) {
  this.status = status;
  this.pendingDeletionList = pendingDeletionList;
The MetricsState class is intended to be an immutable state holder, but the pendingDeletionList is mutable. The list is assigned directly, which means it can be modified by consumers of this state object. To guarantee thread safety and prevent unexpected side effects, you should wrap the list in an unmodifiable view.
- this.pendingDeletionList = pendingDeletionList;
+ this.pendingDeletionList = Collections.unmodifiableList(pendingDeletionList);
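One detail worth illustrating: an unmodifiable view stops consumers from mutating the list, but it still reflects later changes made through the caller's original reference. A defensive copy closes that gap. The sketch below assumes a hypothetical `MetricsStateSketch` class with `String` elements; the PR's actual element type is not shown in this hunk.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; the element type String is an assumption.
final class MetricsStateSketch {
  private final List<String> pendingDeletionList;

  MetricsStateSketch(List<String> pendingDeletionList) {
    // Copy first so later changes to the caller's list cannot leak in,
    // then wrap so consumers of this state cannot mutate the copy.
    this.pendingDeletionList =
        Collections.unmodifiableList(new ArrayList<>(pendingDeletionList));
  }

  List<String> getPendingDeletionList() {
    return pendingDeletionList;
  }
}
```

Whether the extra copy is worth it depends on whether the caller keeps and reuses the source list after constructing the state object.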
      summary.getTotalBlockReplicatedSize(),
      summary.getTotalBlockCount());
  return Response.ok(pendingDeletion).build();
} catch (Exception e) {
Catching the generic Exception is too broad. The method scmClient.getDeletedBlockSummary() is declared to throw IOException. It's a better practice to catch the most specific exception type possible to avoid unintentionally handling unrelated runtime exceptions.
- } catch (Exception e) {
+ } catch (IOException e) {
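The practical effect of narrowing the catch can be seen in a small standalone sketch. The `SummarySource` interface and `fetchOrDefault` method are hypothetical stand-ins for the SCM client call and the endpoint's error handling; only the checked `IOException` is handled, so an unrelated runtime exception surfaces instead of being silently converted into a fallback.

```java
import java.io.IOException;

// Hypothetical sketch; SummarySource stands in for the SCM client call.
final class NarrowCatchSketch {

  interface SummarySource {
    long fetch() throws IOException;
  }

  /** Returns the fetched value, or the fallback when the call fails with an I/O error. */
  static long fetchOrDefault(SummarySource source, long fallback) {
    try {
      return source.fetch();
    } catch (IOException e) {
      // Only the declared I/O failure is handled here; programming errors
      // such as IllegalStateException propagate to the caller.
      return fallback;
    }
  }
}
```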
Object value = metrics.get(keyName);
if (value instanceof Long) {
  return (long) value;
}
if (value instanceof Integer) {
  Integer intValue = (Integer) value;
  return intValue.longValue();
}
return -1;
The extractMetricValue method currently handles only Long and Integer types. JMX metric values can be other Number subtypes or even strings. To make this utility more robust, it should be updated to handle any Number type and also attempt to parse String values.
Object value = metrics.get(keyName);
if (value instanceof Number) {
  return ((Number) value).longValue();
}
if (value instanceof String) {
  try {
    return Long.parseLong((String) value);
  } catch (NumberFormatException e) {
    // Not a number string, fall through to return -1
  }
}
return -1;

if (jmxEndpoint != null && jmxEndpoint.endsWith("/")) {
  jmxEndpoint = jmxEndpoint.substring(0, jmxEndpoint.length() - 1);
}
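To check how the two utility snippets above behave on typical inputs, they can be wrapped in a small standalone class. This is a sketch under assumptions: the class name `JmxUtilSketch`, the method name `stripTrailingSlash`, and the `Map<String, Object>` parameter type are illustrative, and the `trim()` call is an addition not present in the suggestion.

```java
import java.util.Map;

// Hypothetical sketch; class, method names, and parameter types are assumptions.
final class JmxUtilSketch {

  /** Returns the metric as a long, or -1 when it is missing or not numeric. */
  static long extractMetricValue(Map<String, Object> metrics, String keyName) {
    Object value = metrics.get(keyName);
    if (value instanceof Number) {
      // Covers Integer, Long, Double, and so on; fractional parts are truncated.
      return ((Number) value).longValue();
    }
    if (value instanceof String) {
      try {
        // trim() is an addition here: Long.parseLong rejects surrounding whitespace.
        return Long.parseLong(((String) value).trim());
      } catch (NumberFormatException e) {
        // Not a number string, fall through to return -1.
      }
    }
    return -1;
  }

  /** Removes a single trailing slash so a path can be appended safely. */
  static String stripTrailingSlash(String endpoint) {
    if (endpoint != null && endpoint.endsWith("/")) {
      return endpoint.substring(0, endpoint.length() - 1);
    }
    return endpoint;
  }
}
```

Note that -1 doubles as the "missing" sentinel, so a metric that legitimately reports -1 cannot be told apart from an absent one.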
What changes were proposed in this pull request?
Provide a one-liner summary of the changes in the PR Title field above.
It should be in the form of
HDDS-1234. Short summary of the change.
Please describe your PR in detail:
perspective not just for the reviewer.
the Jira's description if the jira is well defined.
issue investigation, github discussion, etc.
Examples of well-written pull requests:
What is the link to the Apache JIRA
Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull
request which starts with the corresponding JIRA issue number. (e.g. HDDS-XXXX. Fix a typo in YYY.)
(Please replace this section with the link to the Apache JIRA)
How was this patch tested?
(Please explain how this patch was tested. Ex: unit tests, manual tests, workflow run on the fork git repo.)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this.)