POC/Discuss: Building integrations package #3
Conversation
This is a proposal on how to build integration packages for the integration registry (https://github.com/elastic/integrations-registry). The details of this proposal can be found in the `integration/README.md` file. The idea is to comment on the proposal directly there, so changes can be discussed per topic and are directly visible.

I expect this to be a long-running PR with lots of discussion. The current content is a first stab at the proposal and we should keep iterating on it. The main goal of this is to make development of integrations as easy as possible while keeping in place all the automation and verification we have around integrations. Using the original package format and adding meta information would not allow reusing some content (ECS fields, for example) and would make testing each dataset separately harder. Each dataset can require a different test setup.

The apache and coredns integrations were taken as examples, to have at least two examples of how things could work. I added these as I guess discussing directly on the code and the generated content makes discussion easier.

The current code is not final; it has many TODOs and duplicated code inside. So please do not review the code yet, but the concept.

The initial discussion around this started in this [google doc](https://docs.google.com/document/d/1OwTNEc0DTcSyPktwbi7RQPC7iEOh_pZPRyx500kixrg/edit#) but all discussions and information from there should make it into here, to have it in public.
> This is a list of datasets this integration depends on. As packages today do not allow depending on other packages, it is important to have a dependency feature during building, so that all the assets do not have to be duplicated. Some examples here are fields for ECS or fields specific to Filebeat. An example is below:
>
> ```
> datasets:
> ```
Could we use here a different name that more explicitly refers to dependencies, such as `requires` or `depends_on`? I think that `datasets` could be confusing, as it could also mean the datasets provided by the package.

We may also need to differentiate "runtime" dependencies from build-time dependencies (`includes`?).
I like `includes`, as in the end it's included in the package.
@jsoriano Changing the code, I just had an idea for another name: `imports`?
SGTM, I would only avoid the generic `datasets` 🙂
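For illustration, a dependency list under the proposed `imports` name could look like the following sketch. The entry names are hypothetical (only `ecs:ecs` appears elsewhere in this discussion); this is not part of the actual proposal:

```yaml
# Hypothetical manifest snippet using the proposed name "imports"
imports:
  - "ecs:ecs"          # shared ECS fields
  - "filebeat:fields"  # Filebeat-specific fields (name is illustrative)
```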
> Directory for all the icons, screenshots and potentially videos.
>
> Question: Should we name this `media`?
If it is expected to also contain videos, then yes please 🙂
Long term, yes. Will rename.
> ## Versioning
>
> The version of a package is taken from the manifest file. If the CoreDNS package contains `version: 1.2.3`, it will build the package `coredns-1.2.3`. For now, no exact version of a dataset is specified. If a dataset is updated, the next time packaging is called for an integration it will pull in the newest assets. So if there is a breaking change in a dataset, it's up to the integration package developer to decide if this is needed. To reduce errors we could introduce exact specification of a dataset version. This would mean that in case a dataset version is updated, all datasets which reference it must be updated too. As everything is in one repository, this shouldn't be too much hassle, but it would make it more explicit.
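A sketch of what exact dataset version specification could look like; the `version` field is hypothetical and not part of the current proposal:

```yaml
# Hypothetical: pinning an exact version of an included dataset.
includes:
  - name: "ecs:ecs"
    version: 1.2.0  # must be bumped explicitly when the dataset updates
```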
Even if everything is in the same repository now, we have to keep thinking that we should support integrations living out of this repository.
I think the tricky part, if something is in another repository, will be the includes. But I'm sure we could figure out some automation around copying these over.
We may end up needing some other repository for build-time dependencies 🙂
Or maybe the external source can be specified in the include itself, e.g. for an integration including ECS:

```yaml
includes:
  - name: "ecs:ecs"
    source: https://github.com/elastic/integrations
```

To allow something like this in the future without needing to support transitive dependencies, we may add the restriction that datasets with `package: false` cannot have includes, and only datasets with `package: false` can be included.

And this makes me wonder if datasets with `package: false` should just be another thing like "libraries"(?) instead of datasets, as they probably will have more restrictions.
All good questions :-)

I initially had it in its separate library directory. This had two disadvantages:

- My collection script became more complex
- It's not possible to have a dependency "directly" on an existing package. Let me give an example: I'm assuming ECS will become its own package that we should also ship. But almost all integrations will depend on it. We could now move 99% of the package into a library, or have it as a package which others depend on. Having its own package also solves the changelog question, for example, and I would guess it's easier to understand / find. But technically I guess both options work.

So in summary, I think `package: false` cannot include others, but I don't think that `package: true` means it cannot be included. But I see how this could bring us to building include-trees. Perhaps we could go with this limitation to keep things simple to get started.
Thinking about this again: the good news with the syntax we have (independent of the name) is that we can always extend it with more fields, as it's an array. Not having a source just assumes it's local.
> I'm assuming ECS will become its own package that we should also ship. But almost all integrations will depend on it.

But then if it is packaged it won't have `package: false`, right? Or am I misunderstanding something? 🙂
Yes, it will not have `package: false`, but others will still depend on it.
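To summarize the discussed restriction, a minimal sketch of the two kinds of manifests (names and values are illustrative, based on the ECS and coredns examples above):

```yaml
# Library-style dataset: not built as a standalone package,
# cannot have includes of its own under the proposed restriction.
name: ecs
package: false
---
# Packaged dataset: built as its own package, may include library-style datasets.
name: coredns
package: true
includes:
  - "ecs:ecs"
```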
> ```
> default: 8000
> validations:
>   min: 0
>   max: 65568
> ```
Interesting, but I think that using the keys as names for the configs can lead to ambiguous situations; for example here `transport.type` is nested in the first block, and an object in the second block. What about having it as a list, with the name of the option as one more attribute?

```yaml
configuration:
  include:
    - package: common/logs.yml:configurations
  options:
    - name: transport
      type: nested
      default: file
      validations:
        enum: ["file", "tcp"]
      required: true
    - name: transport.type
      type: string
      default: "file"
      enum: ["file", "tcp"]
    - name: transport.file.paths
      type: Array<PATH>
      default: ["%base_path/access.log"]
      validations:
        presence: true
    - name: transport.tcp.port
      type: range # We can have a Port type which has the default validations.
      default: 8000
      validations:
        min: 0
        max: 65568
```
We will actually not need this for our first version. This is a copy/paste from an initial proposal from @ph.
This may be interesting for elastic/beats#13357.
integration/apache/dataset/apache.access/kibana/index-pattern/metricbeat.json
> ```diff
> @@ -0,0 +1,9 @@
> # For testing definitions
> ```
It'd be good to take this into account: elastic/beats#10182
Yes, I think the testing.yml is the one least fleshed out yet. I hope we can take over large parts from what we have today, but also get new goodies like what the testing team is working on.
> ```go
> // or more contributor license agreements. Licensed under the Elastic License;
> // you may not use this file except in compliance with the Elastic License.
>
> // +build mage
> ```
I wonder if we should have an external tool instead of a magefile.go to ease development of new integrations out of this repository. This external tool could even be a beats subcommand.
Agree, it should be an external tool in the end that can also be used in any other repo. I don't think it should become part of the Beat, because that is "our way" of developing integrations. If someone develops just one integration in a repository without dependencies, testing etc., they could just go with the package structure and version it. Building the package is creating a zip file.
Yeah, agree with rather doing it as an external tool.

With the current differences between filesets for Filebeat, metricsets for Metricbeat and so on, it could make sense to have the tooling in each Beat, but not with integrations, which won't need to be coupled to any Beat.
I suggest we define it as a goal to have this binary separated but first build it as part of this repo.
Yes, agree with having the source in this repo, but I'd still consider doing it as a standalone tool that can also be used with integrations in other repos without copying magefiles, just by running the resulting binary.
Yes, agree. I would just not make the conversion of mage to a binary a "blocker" of this PR.
> **Integration**: Integration definition with manifest and several datasets with the assets for the Elastic Stack.
>
> **Dataset**: Group of assets which are grouped together for testing and development purposes.
If this is a group of assets, why don't we call them "Assets"? In my head, when I read datasets I'm thinking of some sort of structured information about some topic that has already been extracted from "somewhere" and is ready to analyze. The Wikipedia definition also states something similar: https://en.wikipedia.org/wiki/Data_set

Just doing some "duck testing" here 😄
The description here is unfortunate. These assets together are needed to create a dataset. But I think you have a point here and it needs either some revising of the description or potential renaming. More "duck testing" needed :-)
I'm tempted to agree here that we should call the directory inside each integration `assets` instead of `dataset`, because in some cases it is literally only assets and nothing in it generates the dataset. One of the reasons for this grouping is also the "things" we test together. All these assets are in one place. It seems to be straightforward to explain that everything we need to test together goes into one asset directory with a name. @sayden Would that help?

@jsoriano WDYT?
Trying to also add an example here:

The coredns integration has 3 asset groups:

- coredns (all the generic assets)
- log (assets for the log dataset)
- stats (assets for the stats dataset)
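A hypothetical directory layout for these groups (the directory name `dataset` vs `assets` is still under discussion, and the file names are illustrative):

```
coredns/
  manifest.yml
  dataset/        # possibly renamed to assets/
    coredns/      # generic assets
    log/          # assets for the log dataset
    stats/        # assets for the stats dataset
```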
Yes, on a first read `dataset` also sounded a bit funny to me for that. +1 to consider other naming like `assets`.

For the directory we could also think of something like `src` or `source`, as they contain the "source code" of the package.
`src` reminds me too much of source code, but it's actually mostly "config" files. My favorite at the moment: `assets`.
> ```
> apache/dataset/access/elasticsearch/ingest-pipeline/default.json
> ```
>
> At the top level, each integration contains the `manifest.yml`, `LICENSE.txt` and `changelog.yml` files.
@jsoriano I guess we don't have to add a LICENSE.txt file to each integration. We can add it during packaging time, based on the entry in the manifest.yml.
I guess that we'd need a license per repository, and it should be added to the generated packages, yes.
We probably need more than 1 ;-)
Pipelines are input specific. Currently there is no simple way for the integrations manager to know which pipeline should be referenced in an input. This introduces the convention that the pipeline name must be a variable which matches one of the pipeline file names in the package. On creating the pipeline instance, the integrations manager will change and prefix the name. The above convention makes it possible to convert it in both places. As an example: the package has an ingest pipeline `foo.yml` or `foo.json`; in the input it will reference `foo`. Then the integrations manager builds the datasource and calls it `bar-foo`. It will then replace the input variable with `bar-foo`. This also has consequences on elastic/integrations#3. It means the developer building integrations must ensure that no two pipelines have the same name in one package. To make sure the name in an input matches a pipeline, a validation step should also be added later on.
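As a sketch of this convention, following the `foo` / `bar-foo` example above (the variable syntax shown is illustrative, not a defined format):

```yaml
# The package ships elasticsearch/ingest-pipeline/foo.json.
# An input in the package references the pipeline by its file name via a variable:
pipeline: "{{foo}}"
# When the integrations manager creates the datasource "bar", it installs the
# pipeline as "bar-foo" and replaces the variable accordingly:
# pipeline: "bar-foo"
```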
I'm currently in the process of extending the package structure with inputs: elastic/package-registry#110. This should simplify some of the discussions we have here and brings us much closer to what we have today with datasets.

I had a conversation with @jsoriano about potential changes in the package structure (elastic/package-registry#110) and how they affect the building of packages. Overall it should simplify the building of packages, as what is built is already almost identical to the end package format. We basically replace what we have today as … The fields which will be used for the package should be identical. Config options which are needed for building the packages will go into … One of the open questions is what happens with the fields from the processors and autodiscovery. It would be nice if these mappings would not have to be copied to each input and could be part of the template. Instead they should be loaded globally by the package manager. More thoughts are needed on how this will work. I will update this PR to be adjusted to the new format.

@ruflin Can we close this draft PR?

Yes, so many things have changed since I started this.
To build the existing packages, run `mage build`.