Skip to content

HDDS-1234. Short summary of the change#10

Closed
priyeshkaratha wants to merge 13 commits intomasterfrom
HDDS-14010_DeletionEndpoint_b
Closed

HDDS-1234. Short summary of the change#10
priyeshkaratha wants to merge 13 commits intomasterfrom
HDDS-14010_DeletionEndpoint_b

Conversation

@priyeshkaratha
Copy link
Owner

What changes were proposed in this pull request?

Provide a one-liner summary of the changes in the PR Title field above.
It should be in the form of HDDS-1234. Short summary of the change.

Please describe your PR in detail:

  • What changes are proposed in the PR? and Why? It would be better if it is written from third person's
    perspective not just for the reviewer.
  • Provide as much context and rationale for the pull request as possible. It could be copy-paste from
    the Jira's description if the jira is well defined.
  • If it is complex code, describe the approach used to solve the issue. If possible attach design doc,
    issue investigation, github discussion, etc.

Examples of well-written pull requests:

What is the link to the Apache JIRA

Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull
request which starts with the corresponding JIRA issue number. (e.g. HDDS-XXXX. Fix a typo in YYY.)

(Please replace this section with the link to the Apache JIRA)

How was this patch tested?

(Please explain how this patch was tested. Ex: unit tests, manual tests, workflow run on the fork git repo.)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this.)

@gemini-code-assist
Copy link

Summary of Changes

Hello @priyeshkaratha, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances Recon's monitoring capabilities by introducing a new API endpoint to expose pending deletion metrics from Ozone Manager, Storage Container Manager, and DataNodes. A key improvement is the implementation of an asynchronous mechanism for collecting DataNode metrics via JMX, which uses non-blocking HTTP requests to efficiently gather data without impacting system performance. The changes also include new configuration options to fine-tune this metric collection process and robust integration tests to ensure the reliability of the new functionality.

Highlights

  • New Recon API Endpoint: A new REST API endpoint, '/pendingDeletion', has been added to Recon to provide insights into pending deletion metrics across different components (OM, SCM, and DataNodes).
  • Asynchronous DataNode Metrics Collection: Introduced a 'DataNodeMetricsService' that asynchronously collects pending deletion metrics from DataNodes using JMX, improving efficiency and responsiveness by avoiding blocking calls.
  • JMX Service Provider: A new 'JmxServiceProviderImpl' has been added to the 'MetricsServiceProvider' interface, enabling asynchronous JMX metric collection with an underlying non-blocking HTTP client.
  • New Configuration Properties: Added new configuration properties 'ozone.recon.dn.metrics.collection.thread.count' and 'ozone.recon.dn.metrics.collection.minimum.api.delay' to control the behavior and rate limiting of DataNode metrics collection.
  • Enhanced Integration Tests: The 'TestStorageDistributionEndpoint' integration test has been significantly updated to include comprehensive tests for the new '/pendingDeletion' endpoint, covering scenarios for OM, SCM, and DataNode metrics, including error handling for DataNode failures.
  • New Dependencies: Added 'httpasyncclient', 'httpclient', 'httpcore', and 'httpcore-nio' dependencies to the 'hadoop-ozone/recon/pom.xml' for supporting asynchronous HTTP requests.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@priyeshkaratha
Copy link
Owner Author

/gemini review

@github-actions
Copy link

No such command. gemini Available commands:

  • /help : Show all the available comment commands
  • /label : add new label to the issue: /label <label>
  • /retest : provide help on how to trigger new CI build

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new endpoint, /pendingDeletion, to provide metrics on pending deletions from OM, SCM, and DataNodes. A key addition is the DataNodeMetricsService, which asynchronously gathers metrics from DataNodes using a non-blocking HTTP client. The implementation is robust, leveraging CompletableFuture for concurrency and an immutable state holder for thread safety. The changes are supported by new configurations, utility methods, and a comprehensive integration test. My review has identified a potential bug in URL encoding within the new JMX service provider, a hardcoded timeout that should be made configurable, and a minor logging inconsistency in the test suite.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new asynchronous service in Recon to collect pending deletion metrics from DataNodes via their JMX endpoints. This is exposed through a new REST endpoint, /api/v1/pendingDeletion. The implementation leverages CompletableFuture for non-blocking I/O, which is a solid choice for performance and scalability. The changes are well-structured, including new configurations, service and API classes, modifications to the metrics provider SPI, and a comprehensive integration test. My feedback focuses on improving the immutability of state objects, enhancing the robustness of utility methods, and improving test logging for better debuggability.

int totalNodesFailed,
long lastCollectionEndTime) {
this.status = status;
this.pendingDeletionList = pendingDeletionList;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The MetricsState class is intended to be an immutable state holder, but the pendingDeletionList is mutable. The list is assigned directly, which means it can be modified by consumers of this state object. To guarantee thread safety and prevent unexpected side effects, you should wrap the list in an unmodifiable view.

Suggested change
this.pendingDeletionList = pendingDeletionList;
this.pendingDeletionList = Collections.unmodifiableList(pendingDeletionList);

summary.getTotalBlockReplicatedSize(),
summary.getTotalBlockCount());
return Response.ok(pendingDeletion).build();
} catch (Exception e) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Catching the generic Exception is too broad. The method scmClient.getDeletedBlockSummary() is declared to throw IOException. It's a better practice to catch the most specific exception type possible to avoid unintentionally handling unrelated runtime exceptions.

Suggested change
} catch (Exception e) {
} catch (IOException e) {

Comment on lines +867 to +875
Object value = metrics.get(keyName);
if (value instanceof Long) {
return (long) value;
}
if (value instanceof Integer) {
Integer intValue = (Integer) value;
return intValue.longValue();
}
return -1;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The extractMetricValue method currently handles only Long and Integer types. JMX metric values can be other Number subtypes or even strings. To make this utility more robust, it should be updated to handle any Number type and also attempt to parse String values.

    Object value = metrics.get(keyName);
    if (value instanceof Number) {
      return ((Number) value).longValue();
    }
    if (value instanceof String) {
      try {
        return Long.parseLong((String) value);
      } catch (NumberFormatException e) {
        // Not a number string, fall through to return -1
      }
    }
    return -1;

Comment on lines +135 to +137
if (jmxEndpoint != null && jmxEndpoint.endsWith("/")) {
jmxEndpoint = jmxEndpoint.substring(0, jmxEndpoint.length() - 1);
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This logic for removing a trailing slash can be simplified and made more robust by using StringUtils.removeEnd(). This avoids manual substring manipulation and clearly expresses the intent.

    jmxEndpoint = org.apache.commons.lang3.StringUtils.removeEnd(jmxEndpoint, "/");

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant