Conversation

@yug-rajani (Contributor)

What does this PR do?

  • Generated the skeleton of Hadoop integration package.
  • Added one data stream (Application Metrics).
  • Added data collection logic.
  • Added the ingest pipelines.
  • Mapped fields according to the ECS schema and added Fields metadata in the appropriate yml files.
  • Added system test cases.
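As a sketch of the fields metadata mentioned above: per-data-stream fields yml entries declare each field's name, type, and description. The entry below uses a field name and description taken from this PR's docs; the exact file layout is an assumption, not the PR's actual file:

```yaml
- name: hadoop.application_metrics.allocated.mb
  type: long
  description: Total memory allocated to the application's running containers (Mb)
```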

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • If I'm introducing a new feature, I have modified the Kibana version constraint in my package's manifest.yml file to point to the latest Elastic stack release (e.g. ^8.0.0).

How to test this PR locally

  • Clone integrations repo.
  • Install elastic-package locally.
  • Start elastic stack using elastic-package.
  • Move to integrations/packages/hadoop directory.
  • Run the following command to run the tests:

elastic-package test
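Put together, the local test flow above can be sketched as a shell session (the repository URL and install method are assumptions based on the standard elastic-package setup, not taken from this PR):

```shell
# Sketch of the local test flow described above; URLs and paths are assumptions.
git clone https://github.com/elastic/integrations.git
go install github.com/elastic/elastic-package@latest  # requires a Go toolchain
elastic-package stack up -d                           # start the Elastic stack in Docker
cd integrations/packages/hadoop
elastic-package test                                  # run the package's test suites
```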

@yug-rajani yug-rajani requested a review from a team as a code owner March 31, 2022 05:37
@yug-rajani yug-rajani self-assigned this Mar 31, 2022
@yug-rajani yug-rajani added the labels enhancement (New feature or request), Team:Integrations (Label for the Integrations team), and New Integration (Issue or pull request for creating a new integration package) on Mar 31, 2022
@elasticmachine

Pinging @elastic/integrations (Team:Integrations)

@yug-rajani yug-rajani requested a review from mtojek March 31, 2022 05:40
@yug-rajani (Contributor, Author)

This PR is a split of #2614 as discussed over the comment #2614 (comment)

@yug-rajani (Contributor, Author):

This will be updated with 8.2.0 after testing this integration on 8.2.0.

@elasticmachine commented Mar 31, 2022

💚 Build Succeeded



Build stats

  • Start Time: 2022-04-05T06:46:10.376+0000

  • Duration: 14 min 25 sec

Test stats 🧪

| Result  | Count |
|---------|-------|
| Failed  | 0     |
| Passed  | 5     |
| Skipped | 0     |
| Total   | 5     |

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@yug-rajani yug-rajani force-pushed the package_hadoop_application_metrics branch from ed9d09c to 6e75ce7 on March 31, 2022 09:01
@yug-rajani yug-rajani mentioned this pull request Mar 31, 2022
services:
  hadoop:
    build: .
    hostname: hadoop_metrics
Contributor:

I guess that all our talks from #2614 about custom Dockerfile are still valid here, right?

@yug-rajani (Contributor, Author):

With respect to our conversation over #2614, we have entirely revamped the system tests based on the JMX-based implementation. Let us know what you think of the new system tests implementation.

We have used the custom Dockerfile here (which uses apache/hadoop:3 as the base image).
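For illustration only, a single-node Dockerfile of the shape described might look like the sketch below. The config file names and target path are assumptions based on the apache/hadoop image's conventional `HADOOP_HOME`; the PR's actual Dockerfile may differ:

```dockerfile
# Hypothetical sketch; not the PR's actual Dockerfile.
FROM apache/hadoop:3
# Drop in site configs enabling the metrics/JMX endpoints the data stream scrapes.
COPY core-site.xml hdfs-site.xml yarn-site.xml /opt/hadoop/etc/hadoop/
# ResourceManager REST API and NameNode web UI ports.
EXPOSE 8088 9870
```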

Contributor:

Yeah, I see, but it isn't the form I meant :)

I suggested following a similar pattern as here and here. As you can see, namenode, datanode, managers are defined as different services in the same network instead of putting everything on the same node. Maybe we don't need to customize it at all.

I'm also fine with doing it as a follow-up as it isn't critical.

@yug-rajani (Contributor, Author):

Okay, we will change the system tests accordingly in a follow-up PR and get this merged, since it is not a blocker.
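For reference, the multi-service layout suggested above might look like the following compose sketch. The service commands, image tag, and ports are assumptions drawn from common apache/hadoop compose setups, not the agreed follow-up implementation:

```yaml
services:
  namenode:
    image: apache/hadoop:3
    hostname: namenode
    command: ["hdfs", "namenode"]
    ports:
      - "9870:9870"   # NameNode web UI / JMX metrics servlet
  datanode:
    image: apache/hadoop:3
    command: ["hdfs", "datanode"]
  resourcemanager:
    image: apache/hadoop:3
    hostname: resourcemanager
    command: ["yarn", "resourcemanager"]
    ports:
      - "8088:8088"   # ResourceManager REST API
  nodemanager:
    image: apache/hadoop:3
    command: ["yarn", "nodemanager"]
```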


This integration uses the Resource Manager API and the JMX API to collect the above metrics.
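As a sketch of what Resource Manager API collection involves: YARN's ResourceManager exposes running applications as JSON at `/ws/v1/cluster/apps`, whose `allocatedMB` field backs `hadoop.application_metrics.allocated.mb`. The helper below is hypothetical and not the integration's actual code:

```python
"""Sketch: reading application metrics from YARN's ResourceManager REST API."""
import json
from urllib.request import urlopen


def allocated_mb_by_app(apps_json):
    """Map application id -> allocatedMB from a /ws/v1/cluster/apps payload."""
    # The payload is {"apps": {"app": [...]}}; "apps" may be null when empty.
    apps = (apps_json.get("apps") or {}).get("app") or []
    return {app["id"]: app["allocatedMB"] for app in apps}


def fetch_apps(resourcemanager_url):
    """Fetch cluster applications, e.g. resourcemanager_url='http://localhost:8088'."""
    with urlopen(f"{resourcemanager_url}/ws/v1/cluster/apps") as resp:
        return json.load(resp)


if __name__ == "__main__":
    sample = {"apps": {"app": [{"id": "application_1648", "allocatedMB": 2048}]}}
    print(allocated_mb_by_app(sample))  # {'application_1648': 2048}
```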

## application_metrics
Contributor:

I guess you can rename it to "application".

@yug-rajani (Contributor, Author):

Makes sense.

| Field | Description | Type |
|---|---|---|
| event.kind | This is one of four ECS Categorization Fields, and indicates the highest level in the ECS category hierarchy. `event.kind` gives high-level information about what type of information the event contains, without being specific to the contents of the event. For example, values of this field distinguish alert events from metric events. The value of this field can be used to inform how these kinds of events should be handled. They may warrant different retention or different access control, and it may also help to understand whether the data is coming in at a regular interval or not. | keyword |
| event.module | Name of the module this data is coming from. If your monitoring agent supports the concept of modules or plugins to process events of a given source (e.g. Apache logs), `event.module` should contain the name of this module. | keyword |
| event.type | This is one of four ECS Categorization Fields, and indicates the third level in the ECS category hierarchy. `event.type` represents a categorization "sub-bucket" that, when used along with the `event.category` field values, enables filtering events down to a level appropriate for a single visualization. This field is an array, which allows proper categorization of events that fall into multiple event types. | keyword |
| hadoop.application_metrics.allocated.mb | Total memory allocated to the application's running containers (Mb) | long |
@mtojek (Contributor) commented Mar 31, 2022:

👍 for filling in descriptions

title: "Hadoop"
version: 0.1.0
license: basic
description: "This Elastic integration collects metrics from hadoop."
Contributor:

nit: Hadoop

@yug-rajani yug-rajani marked this pull request as draft April 1, 2022 06:05
@yug-rajani yug-rajani force-pushed the package_hadoop_application_metrics branch from 0811de5 to 00fe33c on April 4, 2022 16:51
@yug-rajani yug-rajani marked this pull request as ready for review April 5, 2022 06:29
@yug-rajani yug-rajani requested a review from mtojek April 5, 2022 06:29
@mtojek (Contributor) left a comment:

LGTM!

@yug-rajani yug-rajani changed the title [hadoop][application_metrics] Add application_metrics data stream for hadoop [hadoop][application_metrics] Add application data stream for hadoop Apr 5, 2022
@yug-rajani yug-rajani changed the title [hadoop][application_metrics] Add application data stream for hadoop [hadoop][application] Add application data stream for hadoop Apr 5, 2022
@yug-rajani yug-rajani merged commit 97dae0f into elastic:main Apr 5, 2022
@yug-rajani yug-rajani linked an issue May 10, 2022 that may be closed by this pull request

Labels

  • enhancement: New feature or request
  • Integration:hadoop: Hadoop
  • New Integration: Issue or pull request for creating a new integration package
  • Team:Integrations: Label for the Integrations team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Create Hadoop integration

3 participants