Commit 93a3953

Custom ingest pipelines (elastic#2094)
* docs: incredibly _rough_ draft
* docs: clean 🧽🧽
* docs: remove notes
* docs: titles
* docs: fix build error
* docs: clarify what has a pipeline

1 parent 95a9c56 commit 93a3953

1 file changed

data-streams.asciidoc

Lines changed: 207 additions & 2 deletions
@@ -96,7 +96,7 @@ These templates are loaded when the integration is installed, and are used to co

[discrete]
[[data-streams-ilm]]
-== Configure an {ilm} ({ilm-init}) policy
+== {ilm} ({ilm-init})

Use the {ref}/index-lifecycle-management.html[index lifecycle
management] ({ilm-init}) feature in {es} to manage your {agent} data stream indices as they age.
@@ -108,9 +108,29 @@ By default, these data streams use an {ilm-init} policy that matches their data
For example, the data stream `metrics-system.logs-*`
uses the metrics {ilm-init} policy as defined in the `metrics-system.logs` index template.

Want to customize your index lifecycle management? See <<data-streams-ilm-tutorial>>.

[discrete]
[[data-streams-pipelines]]
== Ingest pipelines

{agent} integration data streams ship with a default {ref}/ingest.html[ingest pipeline]
that preprocesses and enriches data before indexing.
The default pipeline should not be edited directly, as changes can easily break the functionality of the integration.

Starting in version 8.4, all default ingest pipelines call a non-existent, non-versioned `@custom` ingest pipeline.
If you never create it, this pipeline has no effect on your data. However, if you create and customize it,
the pipeline can be used for custom data processing: adding fields, sanitizing data, and more.

The full name of the `@custom` pipeline follows the pattern `<type>-<dataset>@custom`.
The `@custom` pipeline can contain processors directly, or you can use the
pipeline processor to call other pipelines that can be shared across multiple data streams or integrations.
The `@custom` pipeline persists across all version upgrades.
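
For example, here is a minimal sketch of creating a `@custom` pipeline through the {ref}/put-pipeline-api.html[create pipeline API]. The name `metrics-system.cpu@custom` assumes the System integration's CPU metrics data stream, and the field and value are purely illustrative:

[source,console]
----
PUT _ingest/pipeline/metrics-system.cpu@custom
{
  "description": "Custom processing for System CPU metrics",
  "processors": [
    {
      "set": {
        "field": "owning_team",
        "value": "infra"
      }
    }
  ]
}
----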

See <<data-streams-pipeline-tutorial>> to get started.

[[data-streams-ilm-tutorial]]
-== Tutorial: Customize data retention for integrations
+== Tutorial: Customize data retention policies

This tutorial explains how to apply a custom {ilm-init} policy to an integration's data stream.

@@ -240,3 +260,188 @@ or force a rollover using the {ref}/indices-rollover-index.html[{es} rollover AP
----
POST /metrics-system.network-production/_rollover/
----

[[data-streams-pipeline-tutorial]]
== Tutorial: Transform data with custom ingest pipelines

This tutorial explains how to add a custom ingest pipeline to an Elastic integration.
Custom pipelines can be used for custom data processing,
like adding fields, obfuscating sensitive information, and more.

**Scenario:** You have {agent}s collecting system metrics with the System integration.

**Goal:** Add a custom ingest pipeline that adds a new field to each {es} document before it is indexed.

[discrete]
[[data-streams-pipeline-one]]
=== Step 1: Create a custom ingest pipeline

Create a custom ingest pipeline that will be called by the default integration pipeline.
In this tutorial, we'll create a pipeline that adds a new field to our documents.

. In {kib}, navigate to **Stack Management** -> **Ingest Pipelines** -> **Create pipeline** -> **New pipeline**.

. Name your pipeline. We'll call this one `add_field`.

. Select **Add a processor**. Fill out the following information:
+
** Processor: "Set"
** Field: `test`
** Value: `true`
+
The {ref}/set-processor.html[Set processor] sets a document field and associates it with the specified value.

. Click **Add**.

. Click **Create pipeline**.
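
Alternatively, here is a sketch of an equivalent request you could run in **Dev tools**, using the same `add_field` name and `test` field as the steps above:

[source,console]
----
PUT _ingest/pipeline/add_field
{
  "description": "Adds a test field to each document",
  "processors": [
    {
      "set": {
        "field": "test",
        "value": true
      }
    }
  ]
}
----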

[discrete]
[[data-streams-pipeline-two]]
=== Step 2: Apply your ingest pipeline

Add a custom pipeline to an integration by calling it from the default ingest pipeline.
The custom pipeline will run after the default pipeline but before the final pipeline.

[discrete]
==== Edit integration

Add a custom pipeline to an integration from the **Edit integration** workflow.
The integration must already be configured and installed before a custom pipeline can be added.
To enter this workflow, do the following:

. Navigate to **{fleet}**.
. Select the relevant {agent} policy.
. Search for the integration you want to edit.
. Select **Actions** -> **Edit integration**.

[discrete]
==== Select a data stream

Most integrations write to multiple data streams.
You'll need to add the custom pipeline to each data stream individually.

. Find the first data stream you wish to edit and select **Change defaults**.
For this tutorial, find the data stream configuration titled **Collect metrics from System instances**.

. Scroll to **System CPU metrics** and under **Advanced options** select **Add custom pipeline**.
+
This will take you to the **Create pipeline** workflow in **Stack Management**.

[discrete]
==== Add the pipeline

Add the pipeline you created in step one; an equivalent API sketch follows these steps.

. Select **Add a processor**. Fill out the following information:
+
** Processor: "Pipeline"
** Pipeline name: "add_field"

. Click **Create pipeline** to return to the **Edit integration** page.
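
This hooks your custom pipeline into the integration's `@custom` pipeline through a {ref}/pipeline-processor.html[pipeline processor]. As a rough sketch, the resulting pipeline could equally be created through the API (the name assumes this tutorial's System CPU metrics data stream):

[source,console]
----
PUT _ingest/pipeline/metrics-system.cpu@custom
{
  "processors": [
    {
      "pipeline": {
        "name": "add_field"
      }
    }
  ]
}
----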

[discrete]
==== Roll over the data stream (optional)

For pipeline changes to take effect immediately, you must roll over the data stream.
If you do not, the changes will not take effect until the next scheduled rollover.
Select **Apply now and rollover**.
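
Alternatively, you can force a rollover with the {ref}/indices-rollover-index.html[{es} rollover API], using this tutorial's data stream name:

[source,console]
----
POST /metrics-system.cpu-default/_rollover/
----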

After the data stream rolls over, note the name of the custom ingest pipeline.
In this tutorial, it's `metrics-system.cpu@custom`.
The name follows the pattern `<type>-<dataset>@custom`:

* type: `metrics`
* dataset: `system.cpu`
* Custom ingest pipeline designation: `@custom`

[discrete]
==== Repeat

Add the custom ingest pipeline to any other data streams you wish to update.

[discrete]
[[data-streams-pipeline-three]]
=== Step 3: Test the ingest pipeline (optional)

Allow time for new data to be ingested before testing your pipeline.
In a new window, open {kib} and navigate to **{kib} Dev tools**.
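
If you don't want to wait for new data, you can also exercise the pipeline directly with the {ref}/simulate-pipeline-api.html[simulate pipeline API]; here is a sketch with a dummy document, using the pipeline name from this tutorial:

[source,console]
----
POST _ingest/pipeline/metrics-system.cpu@custom/_simulate
{
  "docs": [
    {
      "_source": {
        "message": "example document"
      }
    }
  ]
}
----

If the pipeline is wired up correctly, the simulated document in the response includes `"test": true`.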

Use an {ref}/query-dsl-exists-query.html[exists query] to ensure that the
new field, "test", is being applied to documents.

[source,console]
----
GET metrics-system.cpu-default/_search <1>
{
  "query": {
    "exists": {
      "field": "test" <2>
    }
  }
}
----
<1> The data stream to search. In this tutorial, we've edited the `metrics-system.cpu` type and dataset.
`default` is the default namespace.
Combining all three of these gives us a data stream name of `metrics-system.cpu-default`.
<2> The name of the field set in step one.

If your custom pipeline is working correctly, this query will return at least one document.

[discrete]
[[data-streams-pipeline-four]]
=== Step 4: Add custom mappings

Now that a new field is being set in your {es} documents, you'll want to assign a new mapping for that field.
Use the `@custom` component template to apply custom mappings to an integration data stream.

In the **Edit integration** workflow, do the following:

. Under **Advanced options** select the pencil icon to edit the `@custom` component template.

. Define the new field for your indexed documents. Select **Add field** and add the following information:
+
** Field name: `test`
** Field type: `Boolean`

. Click **Add field**.

. Click **Review** to fast-forward to the review step and click **Save component template** to return to the **Edit integration** workflow.

. For changes to take effect immediately, select **Apply now and rollover**.
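
Under the hood, this edits a component template that follows the same `<type>-<dataset>@custom` naming pattern. As an illustrative sketch only (not a replacement for the {fleet} workflow), the equivalent mapping change might look like this with the {ref}/indices-component-template.html[component template API]:

[source,console]
----
PUT _component_template/metrics-system.cpu@custom
{
  "template": {
    "mappings": {
      "properties": {
        "test": {
          "type": "boolean"
        }
      }
    }
  }
}
----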

[discrete]
[[data-streams-pipeline-five]]
=== Step 5: Test the custom mappings (optional)

Allow time for new data to be ingested before testing your mappings.
In a new window, open {kib} and navigate to **{kib} Dev tools**.

Use the {ref}/indices-get-field-mapping.html[Get field mapping API] to ensure that the
custom mapping has been applied.

[source,console]
----
GET metrics-system.cpu-default/_mapping/field/test <1>
----
<1> The data stream to search. In this tutorial, we've edited the `metrics-system.cpu` type and dataset.
`default` is the default namespace.
Combining all three of these gives us a data stream name of `metrics-system.cpu-default`.

The result should include `"type": "boolean"` for the specified field.

[source,json]
----
".ds-metrics-system.cpu-default-2022.08.10-000002": {
  "mappings": {
    "test": {
      "full_name": "test",
      "mapping": {
        "test": {
          "type": "boolean"
        }
      }
    }
  }
}
----
