POC/Discuss: Building integrations package #3
Conversation
This is a proposal on how to build integration packages for the integration registry (https://github.com/elastic/integrations-registry). The details of this proposal can be found in the `integration/README.md` file. The idea is to comment on the proposal directly there, so changes can be discussed per topic and are directly visible.

I expect this to be a long-running PR with lots of discussion. The current content is a first stab at the proposal and we should keep iterating on it. The main goal of this is to make development of integrations as easy as possible while keeping in place all the automation and verification we have around integrations. Using the original package format and adding meta information would not allow reusing some content (ECS fields, for example) and would make testing each dataset separately harder. Each dataset can require a different test setup.

The apache and coredns integrations were taken as examples, to have at least two examples of how things could work. I added these as I guess discussing directly on the code and the generated content makes discussion easier.

The current code is not final; it has many TODOs and duplicated code inside. So please do not review the code yet, but the concept.

The initial discussion around this started in this [google doc](https://docs.google.com/document/d/1OwTNEc0DTcSyPktwbi7RQPC7iEOh_pZPRyx500kixrg/edit#) but all discussions and information from there should make it into here, to have it in public.
> This is a list of datasets this integration depends on. As packages today do not allow depending on other packages, it is important to have a dependency feature during building, so that all the assets do not have to be duplicated. Some examples here are fields for ECS or fields specific to Filebeat. An example is below:
>
> ```
> datasets:
> ```
Could we use here a different name that more explicitly refers to dependencies, such as `requires` or `depends_on`? I think that `datasets` could be confusing, as it could also mean the datasets provided by the package.

We may also need to differentiate "runtime" dependencies from build-time dependencies (`includes`?).
I like `includes`, as in the end it's included in the package.
@jsoriano Changing the code, I just had an idea for another name: `imports`?
SGTM, I would only avoid the generic `datasets` 🙂
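For illustration, a dependency list under the proposed `imports` name could look like the following sketch. The entry names are hypothetical (only `ecs:ecs` appears elsewhere in this discussion); this is not part of the actual proposal:

```yaml
# Hypothetical manifest snippet using the proposed name "imports"
imports:
  - "ecs:ecs"          # shared ECS fields
  - "filebeat:fields"  # Filebeat-specific fields (name is illustrative)
```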
> Directory for all the icons, screenshots and potentially videos.
>
> Question: Should we name this `media`?
If it is expected to also contain videos, then yes please 🙂
Long term, yes. Will rename.
> ## Versioning
>
> The version of a package is taken from the manifest file. If the CoreDNS package contains `version: 1.2.3`, it will build the package `coredns-1.2.3`. For now, no exact version of a dataset is specified. If a dataset is updated, the next time packaging is called for an integration it will pull in the newest assets. So if there is a breaking change in a dataset, it's up to the integration package developer to decide if this is needed. To reduce errors we could introduce exact specification of a dataset version. This would mean that in case a dataset version is updated, all datasets which reference it must be updated too. As everything is in one repository, this shouldn't be too much hassle, but it would make it more explicit.
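A sketch of what exact dataset version specification could look like; the `version` field is hypothetical and not part of the current proposal:

```yaml
# Hypothetical: pinning an exact version of an included dataset.
includes:
  - name: "ecs:ecs"
    version: 1.2.0  # must be bumped explicitly when the dataset updates
```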
Even if everything is in the same repository now, we have to keep thinking that we should support integrations living out of this repository.
I think the tricky part, if something is in another repository, will be the includes. But I'm sure we could figure out some automation around copying these over.
We may end up needing some other repository for build-time dependencies 🙂
Or maybe the external source can be specified in the include itself, e.g. for an integration including ECS:

```yaml
includes:
  - name: "ecs:ecs"
    source: https://github.com/elastic/integrations
```

To allow something like this in the future without needing to support transitive dependencies, we may add the restriction that datasets with `package: false` cannot have includes, and only datasets with `package: false` can be included.

And this makes me wonder if datasets with `package: false` should just be another thing like "libraries"(?) instead of datasets, as they probably will have more restrictions.
All good questions :-)

I initially had it in its separate library directory. This had two disadvantages:

- My collection script became more complex
- It's not possible to have a dependency "directly" on an existing package. Let me give an example: I'm assuming ECS will become its own package that we should also ship. But almost all integrations will depend on it. We could now move 99% of the package into a library, or have it as a package which others depend on. Having its own package also solves the changelog question, for example, and I would guess it's easier to understand / find. But technically I guess both options work.

So in summary, I think `package: false` cannot include others, but I don't think that `package: true` means it cannot be included. But I see how this could bring us to building include-trees. Perhaps we could go with this limitation to keep things simple to get started.
Thinking about this again: the good news with the syntax we have (independent of the name) is that we can always extend it with more fields, as it's an array. Not having a source just assumes it's local.
> I'm assuming ECS will become its own package that we should also ship. But almost all integrations will depend on it.

But then if it is packaged it won't have `package: false`, right? Or am I misunderstanding something? 🙂
Yes, it will not have `package: false`, but others will still depend on it.
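To summarize the discussed restriction, a minimal sketch of the two kinds of manifests (names and values are illustrative, based on the ECS and coredns examples above):

```yaml
# Library-style dataset: not built as a standalone package,
# cannot have includes of its own under the proposed restriction.
name: ecs
package: false
---
# Packaged dataset: built as its own package, may include library-style datasets.
name: coredns
package: true
includes:
  - "ecs:ecs"
```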
> ```
> default: 8000
> validations:
>   min: 0
>   max: 65568
> ```
Interesting, but I think that using the keys as names for the configs can lead to ambiguous situations; for example here `transport.type` is nested in the first block, and an object in the second block. What about having it as a list, with the name of the option as one more attribute?

```yaml
configuration:
  include:
    - package: common/logs.yml:configurations
  options:
    - name: transport
      type: nested
      default: file
      validations:
        enum: ["file", "tcp"]
      required: true
    - name: transport.type
      type: string
      default: "file"
      enum: ["file", "tcp"]
    - name: transport.file.paths
      type: Array<PATH>
      default: ["%base_path/access.log"]
      validations:
        presence: true
    - name: transport.tcp.port
      type: range # We can have a Port type which has the default validations.
      default: 8000
      validations:
        min: 0
        max: 65568
```
We will actually not need this for our first version. This is a copy/paste from an initial proposal from @ph.
This may be interesting for elastic/beats#13357.
integration/apache/dataset/apache.access/kibana/index-pattern/metricbeat.json
> ```diff
> @@ -0,0 +1,9 @@
> # For testing definitions
> ```
It'd be good to take this into account: elastic/beats#10182
Yes, I think the testing.yml is the one least fleshed out yet. I hope we can take over large parts from what we have today, but also get new goodies like what the testing team is working on.
> ```go
> // or more contributor license agreements. Licensed under the Elastic License;
> // you may not use this file except in compliance with the Elastic License.
>
> // +build mage
> ```
I wonder if we should have an external tool instead of a magefile.go to ease development of new integrations out of this repository. This external tool could even be a beats subcommand.
Agree, it should be an external tool in the end that can also be used in any other repo. I don't think it should become part of the Beat, because that is "our way" of developing integrations. If someone develops just one integration in a repository without dependencies, testing etc., they could just go with the package structure and version it. Building the package is creating a zip file.
Yeah, agree with rather doing it as an external tool.

With the current differences between filesets for Filebeat, metricsets for Metricbeat and so on, it could make sense to have the tooling in each Beat, but not with integrations, which won't need to be coupled to any Beat.
I suggest we define it as a goal to have this binary separated but first build it as part of this repo.
Yes, agree with having the source in this repo, but I'd still consider doing it as a standalone tool that can also be used with integrations in other repos without copying magefiles, just by running the resulting binary.
Yes, agree. I would just not make the conversion of mage to a binary a "blocker" of this PR.
> **Integration**: Integration definition with manifest and several datasets with the assets for the Elastic Stack.
>
> **Dataset**: Group of assets which are grouped together for testing and development purposes.
If this is a group of assets, why don't we call them "Assets"? In my head, when I read datasets I'm thinking of some sort of structured information about some topic that has already been extracted from "somewhere" and is ready to analyze. The Wikipedia definition also states something similar: https://en.wikipedia.org/wiki/Data_set

Just doing some "duck testing" here 😄
The description here is unfortunate. These assets together are needed to create a dataset. But I think you have a point here and it needs either some revising of the description or potential renaming. More "duck testing" needed :-)
I'm tempted to agree here that we should call the directory inside each integration `assets` instead of `dataset`, because in some cases it is literally only assets and nothing in it generates the dataset. One of the reasons for this grouping is also the "things" we test together. All these assets are in one place. It seems to be straightforward to explain that everything we need to test together goes into one asset directory with a name. @sayden Would that help?

@jsoriano WDYT?
Trying to also add an example here:

The coredns integration has 3 asset groups:

- coredns (all the generic assets)
- log (assets for the log dataset)
- stats (assets for the stats dataset)
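A hypothetical directory layout for these groups (the directory name `dataset` vs `assets` is still under discussion, and the file names are illustrative):

```
coredns/
  manifest.yml
  dataset/        # possibly renamed to assets/
    coredns/      # generic assets
    log/          # assets for the log dataset
    stats/        # assets for the stats dataset
```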
Yes, on a first read `dataset` also sounded a bit funny to me for that. +1 to consider other naming like `assets`.

For the directory we could also think of something like `src` or `source`, as they contain the "source code" of the package.
`src` reminds me too much of source code, but it's actually mostly "config" files. My favorite at the moment: `assets`.
> ```
> apache/dataset/access/elasticsearch/ingest-pipeline/default.json
> ```
>
> At the top level, each integration contains the `manifest.yml`, `LICENSE.txt` and `changelog.yml` files.
@jsoriano I guess we don't have to add a LICENSE.txt file to each integration. We can add it during packaging time, based on the entry in the manifest.yml.
I guess that we'd need a license per repository, and it should be added to the generated packages, yes.
We probably need more than 1 ;-)
Pipelines are input specific. Currently there is no simple way for the integrations manager to know which pipeline should be referenced in an input. This introduces the convention that the pipeline name must be a variable which matches one of the pipeline file names in the package. On creating the pipeline instance, the integrations manager will change and prefix the name. The above convention makes it possible to convert it in both places. As an example: the package has an ingest pipeline `foo.yml` or `foo.json`; in the input it will reference `foo`. Then the integrations manager builds the datasource and calls it `bar-foo`. It will then replace the input variable with `bar-foo`. This also has consequences on elastic/integrations#3. It means the developer building integrations must ensure that no two pipelines have the same name in one package. To make sure the name in an input matches a pipeline, a validation step should also be added later on.
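As a sketch of this convention, following the `foo` / `bar-foo` example above (the variable syntax shown is illustrative, not a defined format):

```yaml
# The package ships elasticsearch/ingest-pipeline/foo.json.
# An input in the package references the pipeline by its file name via a variable:
pipeline: "{{foo}}"
# When the integrations manager creates the datasource "bar", it installs the
# pipeline as "bar-foo" and replaces the variable accordingly:
# pipeline: "bar-foo"
```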
I'm currently in the process of extending the package structure with inputs: elastic/package-registry#110. This should simplify some of the discussions we have here and brings us much closer to what we have today with datasets.

I had a conversation with @jsoriano about potential changes in the package structure (elastic/package-registry#110) and how they affect the building of packages. Overall it should simplify the building of packages, as what is built is already almost identical to the end package format. We basically replace what we have today as … The fields which will be used for the package should be identical. Config options which are needed for building the packages will go into … One of the open questions is what happens with the fields from the processors and autodiscovery. It would be nice if these mappings would not have to be copied to each input and could be part of the template. Instead they should be loaded globally by the package manager. More thoughts are needed on how this will work. I will update this PR to be adjusted to the new format.

@ruflin Can we close this draft PR?

Yes, so many things have changed since I started this.
To build the existing packages, run `mage build`.