Conversation

@yug-rajani (Contributor)

What does this PR do?

  • Generated the skeleton of Hadoop integration package.
  • Added one data stream (Application Metrics).
  • Added data collection logic.
  • Added the ingest pipelines.
  • Mapped fields according to the ECS schema and added Fields metadata in the appropriate yml files.
  • Added system test cases.
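As a sketch of the fields metadata mentioned above: per-data-stream fields yml entries declare each field's name, type, and description. The entry below uses a field name and description taken from this PR's docs; the exact file layout is an assumption, not the PR's actual file:

```yaml
- name: hadoop.application_metrics.allocated.mb
  type: long
  description: Total memory allocated to the application's running containers (Mb)
```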

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • If I'm introducing a new feature, I have modified the Kibana version constraint in my package's manifest.yml file to point to the latest Elastic stack release (e.g. ^8.0.0).

How to test this PR locally

  • Clone integrations repo.
  • Install elastic-package locally.
  • Start elastic stack using elastic-package.
  • Move to integrations/packages/hadoop directory.
  • Run the following command to run the tests:

elastic-package test
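Put together, the local test flow above can be sketched as a shell session (the repository URL and install method are assumptions based on the standard elastic-package setup, not taken from this PR):

```shell
# Sketch of the local test flow described above; URLs and paths are assumptions.
git clone https://github.com/elastic/integrations.git
go install github.com/elastic/elastic-package@latest  # requires a Go toolchain
elastic-package stack up -d                           # start the Elastic stack in Docker
cd integrations/packages/hadoop
elastic-package test                                  # run the package's test suites
```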

@yug-rajani yug-rajani requested a review from a team as a code owner March 31, 2022 05:37
@yug-rajani yug-rajani self-assigned this Mar 31, 2022
@yug-rajani yug-rajani added the labels enhancement (New feature or request), Team:Integrations (Label for the Integrations team), and New Integration (Issue or pull request for creating a new integration package) on Mar 31, 2022
@elasticmachine

Pinging @elastic/integrations (Team:Integrations)

@yug-rajani yug-rajani requested a review from mtojek March 31, 2022 05:40
@yug-rajani (Contributor, Author)

This PR is a split of #2614 as discussed over the comment #2614 (comment)

@yug-rajani (Contributor, Author):

This will be updated with 8.2.0 after testing this integration on 8.2.0.

@elasticmachine commented Mar 31, 2022

💚 Build Succeeded



Build stats

  • Start Time: 2022-04-05T06:46:10.376+0000

  • Duration: 14 min 25 sec

Test stats 🧪

| Result  | Count |
|---------|-------|
| Failed  | 0     |
| Passed  | 5     |
| Skipped | 0     |
| Total   | 5     |

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@yug-rajani yug-rajani force-pushed the package_hadoop_application_metrics branch from ed9d09c to 6e75ce7 on March 31, 2022 09:01
@yug-rajani yug-rajani mentioned this pull request Mar 31, 2022
services:
  hadoop:
    build: .
    hostname: hadoop_metrics
Contributor:

I guess that all our talks from #2614 about custom Dockerfile are still valid here, right?

@yug-rajani (Contributor, Author):

With respect to our conversation over #2614, we have entirely revamped the system tests based on the JMX-based implementation. Let us know what you think of the new system tests implementation.

We have used the custom Dockerfile here (which uses apache/hadoop:3 as the base image).
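For illustration only, a single-node Dockerfile of the shape described might look like the sketch below. The config file names and target path are assumptions based on the apache/hadoop image's conventional `HADOOP_HOME`; the PR's actual Dockerfile may differ:

```dockerfile
# Hypothetical sketch; not the PR's actual Dockerfile.
FROM apache/hadoop:3
# Drop in site configs enabling the metrics/JMX endpoints the data stream scrapes.
COPY core-site.xml hdfs-site.xml yarn-site.xml /opt/hadoop/etc/hadoop/
# ResourceManager REST API and NameNode web UI ports.
EXPOSE 8088 9870
```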

Contributor:

Yeah, I see, but it isn't the form I meant :)

I suggested following a similar pattern as here and here. As you can see, namenode, datanode, managers are defined as different services in the same network instead of putting everything on the same node. Maybe we don't need to customize it at all.

I'm also fine with doing it as a follow-up as it isn't critical.

@yug-rajani (Contributor, Author):

Okay, we will change the system tests accordingly in a follow-up PR and get this merged, since it is not a blocker.
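For reference, the multi-service layout suggested above might look like the following compose sketch. The service commands, image tag, and ports are assumptions drawn from common apache/hadoop compose setups, not the agreed follow-up implementation:

```yaml
services:
  namenode:
    image: apache/hadoop:3
    hostname: namenode
    command: ["hdfs", "namenode"]
    ports:
      - "9870:9870"   # NameNode web UI / JMX metrics servlet
  datanode:
    image: apache/hadoop:3
    command: ["hdfs", "datanode"]
  resourcemanager:
    image: apache/hadoop:3
    hostname: resourcemanager
    command: ["yarn", "resourcemanager"]
    ports:
      - "8088:8088"   # ResourceManager REST API
  nodemanager:
    image: apache/hadoop:3
    command: ["yarn", "nodemanager"]
```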


This integration uses the Resource Manager API and the JMX API to collect the above metrics.
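As a sketch of what Resource Manager API collection involves: YARN's ResourceManager exposes running applications as JSON at `/ws/v1/cluster/apps`, whose `allocatedMB` field backs `hadoop.application_metrics.allocated.mb`. The helper below is hypothetical and not the integration's actual code:

```python
"""Sketch: reading application metrics from YARN's ResourceManager REST API."""
import json
from urllib.request import urlopen


def allocated_mb_by_app(apps_json):
    """Map application id -> allocatedMB from a /ws/v1/cluster/apps payload."""
    # The payload is {"apps": {"app": [...]}}; "apps" may be null when empty.
    apps = (apps_json.get("apps") or {}).get("app") or []
    return {app["id"]: app["allocatedMB"] for app in apps}


def fetch_apps(resourcemanager_url):
    """Fetch cluster applications, e.g. resourcemanager_url='http://localhost:8088'."""
    with urlopen(f"{resourcemanager_url}/ws/v1/cluster/apps") as resp:
        return json.load(resp)


if __name__ == "__main__":
    sample = {"apps": {"app": [{"id": "application_1648", "allocatedMB": 2048}]}}
    print(allocated_mb_by_app(sample))  # {'application_1648': 2048}
```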

## application_metrics
Contributor:

I guess you can rename it to "application".

@yug-rajani (Contributor, Author):

Makes sense.

| Field | Description | Type |
|---|---|---|
| event.kind | This is one of four ECS Categorization Fields, and indicates the highest level in the ECS category hierarchy. `event.kind` gives high-level information about what type of information the event contains, without being specific to the contents of the event. For example, values of this field distinguish alert events from metric events. The value of this field can be used to inform how these kinds of events should be handled. They may warrant different retention or different access control, and it may also help to understand whether the data is coming in at a regular interval or not. | keyword |
| event.module | Name of the module this data is coming from. If your monitoring agent supports the concept of modules or plugins to process events of a given source (e.g. Apache logs), `event.module` should contain the name of this module. | keyword |
| event.type | This is one of four ECS Categorization Fields, and indicates the third level in the ECS category hierarchy. `event.type` represents a categorization "sub-bucket" that, when used along with the `event.category` field values, enables filtering events down to a level appropriate for a single visualization. This field is an array, which allows proper categorization of events that fall into multiple event types. | keyword |
| hadoop.application_metrics.allocated.mb | Total memory allocated to the application's running containers (Mb) | long |
@mtojek (Contributor) commented Mar 31, 2022:

👍 for filling in descriptions

title: "Hadoop"
version: 0.1.0
license: basic
description: "This Elastic integration collects metrics from hadoop."
Contributor:

nit: Hadoop

@yug-rajani yug-rajani marked this pull request as draft April 1, 2022 06:05
@yug-rajani yug-rajani force-pushed the package_hadoop_application_metrics branch from 0811de5 to 00fe33c on April 4, 2022 16:51
@yug-rajani yug-rajani marked this pull request as ready for review April 5, 2022 06:29
@yug-rajani yug-rajani requested a review from mtojek April 5, 2022 06:29
@mtojek (Contributor) left a comment:

LGTM!

@yug-rajani yug-rajani changed the title [hadoop][application_metrics] Add application_metrics data stream for hadoop [hadoop][application_metrics] Add application data stream for hadoop Apr 5, 2022
@yug-rajani yug-rajani changed the title [hadoop][application_metrics] Add application data stream for hadoop [hadoop][application] Add application data stream for hadoop Apr 5, 2022
@yug-rajani yug-rajani merged commit 97dae0f into elastic:main Apr 5, 2022
@yug-rajani yug-rajani linked an issue May 10, 2022 that may be closed by this pull request

Labels

  • enhancement: New feature or request
  • Integration:hadoop: Hadoop
  • New Integration: Issue or pull request for creating a new integration package
  • Team:Integrations: Label for the Integrations team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Create Hadoop integration

3 participants