12 changes: 12 additions & 0 deletions _data-prepper/common-use-cases/common-use-cases.md
@@ -0,0 +1,12 @@
---
layout: default
title: Common use cases
has_children: true
nav_order: 15
redirect_from:
- /data-prepper/common-use-cases/
---

# Common use cases

You can use Data Prepper for several different purposes, including trace analytics, log analytics, Amazon S3 log analytics, and metrics ingestion.
@@ -1,6 +1,7 @@
---
layout: default
title: Log analytics
parent: Common use cases
nav_order: 15
---

6 changes: 2 additions & 4 deletions _data-prepper/getting-started.md
@@ -3,8 +3,6 @@ layout: default
title: Getting started
nav_order: 5
redirect_from:
- /clients/data-prepper/getting-started/
- /data-prepper/get-started/
- /clients/data-prepper/get-started/
---

@@ -44,7 +42,7 @@ You will configure two files:
Depending on your use case, several guides are available for configuring Data Prepper.

* [Trace Analytics](https://github.com/opensearch-project/data-prepper/blob/main/docs/trace_analytics.md)
* [Log Ingestion](https://github.com/opensearch-project/data-prepper/blob/main/docs/log_analytics.md): Learn how to set up Data Prepper for log observability.
* [Log Analytics]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/log-analytics/): Learn how to set up Data Prepper for log observability.
* [Simple Pipeline](https://github.com/opensearch-project/data-prepper/blob/main/docs/simple_pipelines.md): Learn the basics of Data Prepper pipelines with some simple configurations.

## 3. Defining a pipeline
@@ -71,7 +69,7 @@ docker run --name data-prepper \
opensearchproject/data-prepper:latest
```

This sample pipeline configuration above demonstrates a simple pipeline with a source (`random`) sending data to a sink (`stdout`). For more examples and details about more advanced pipeline configurations, see [Pipelines]({{site.url}}{{site.baseurl}}/clients/data-prepper/pipelines).
The preceding example pipeline configuration demonstrates a simple pipeline with a source (`random`) sending data to a sink (`stdout`). For more detailed examples of advanced pipeline configurations, see [Pipelines]({{site.url}}{{site.baseurl}}/clients/data-prepper/pipelines/).
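As a rough illustration, a `pipelines.yaml` matching this description might look like the following sketch. The pipeline name `simple-sample-pipeline` is illustrative; the `random` source and `stdout` sink are the plugins named in the text.

```yaml
# Illustrative pipelines.yaml: one pipeline with a random source and a stdout sink
simple-sample-pipeline:
  source:
    random:      # generates random UUID strings as events
  sink:
    - stdout:    # prints each event to standard output
```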

After starting Data Prepper, you should see log output and some UUIDs after a few seconds:

4 changes: 1 addition & 3 deletions _data-prepper/index.md
@@ -4,9 +4,7 @@ title: Data Prepper
nav_order: 1
has_children: false
has_toc: false
redirect_from:
- /clients/tools/data-prepper/
- /clients/data-prepper/
redirect_from:
- /clients/data-prepper/index/
---

@@ -1,10 +1,8 @@
---
layout: default
title: Configuring Data Prepper
has_children: true
nav_order: 100
redirect_from:
- /clients/data-prepper/data-prepper-reference/
parent: Managing Data Prepper
nav_order: 10
---

# Configuring Data Prepper
@@ -31,15 +29,15 @@ peer_forwarder | No | Object | Peer forwarder configurations. See [Peer forwarde

The following sections detail the configuration options for peer forwarding.

#### General options for peer forwarder
#### General options for peer forwarding

Option | Required | Type | Description
:--- | :--- | :--- | :---
port | No | Integer | The port number the peer forwarder server is running on. Valid options are between 0 and 65535. Default is 4994.
request_timeout | No | Integer | Request timeout in milliseconds for peer forwarder HTTP server. Default is 10000.
server_thread_count | No | Integer | Number of threads used by peer forwarder server. Default is 200.
client_thread_count | No | Integer | Number of threads used by peer forwarder client. Default is 200.
max_connection_count | No | Integer | Maximum number of open connections for peer forwarder server. Default is 500.
port | No | Integer | The peer forwarding server port. Valid options are between 0 and 65535. Default is 4994.
request_timeout | No | Integer | Request timeout for the peer forwarder HTTP server in milliseconds. Default is 10000.
server_thread_count | No | Integer | Number of threads used by the peer forwarder server. Default is 200.
client_thread_count | No | Integer | Number of threads used by the peer forwarder client. Default is 200.
max_connection_count | No | Integer | Maximum number of open connections for the peer forwarder server. Default is 500.
max_pending_requests | No | Integer | Maximum number of allowed tasks in the `ScheduledThreadPool` work queue. Default is 1024.
discovery_mode | No | String | Peer discovery mode to use. Valid options are `local_node`, `static`, `dns`, or `aws_cloud_map`. Defaults to `local_node`, which processes events locally.
static_endpoints | Conditionally | List | A list containing endpoints of all Data Prepper instances. Required if `discovery_mode` is set to `static`.
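For orientation, the options in this table are set under a `peer_forwarder` key in the Data Prepper server configuration file (commonly `data-prepper-config.yaml`). The following sketch uses static discovery with two hypothetical host names; the other values shown are the documented defaults. Additional settings, such as TLS options, are omitted here and may be required in your deployment.

```yaml
# Sketch of data-prepper-config.yaml; host names are hypothetical placeholders
peer_forwarder:
  port: 4994                  # default peer forwarding port
  request_timeout: 10000      # milliseconds
  discovery_mode: static      # other options: local_node, dns, aws_cloud_map
  static_endpoints:           # required because discovery_mode is static
    - "data-prepper-node-1"
    - "data-prepper-node-2"
```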
@@ -1,7 +1,8 @@
---
layout: default
title: Configuring Log4j
nav_order: 25
parent: Managing Data Prepper
nav_order: 20
---

# Configuring Log4j
@@ -1,7 +1,8 @@
---
layout: default
title: Core APIs
nav_order: 20
parent: Managing Data Prepper
nav_order: 15
---

# Core APIs
10 changes: 10 additions & 0 deletions _data-prepper/managing-data-prepper/managing-data-prepper.md
@@ -0,0 +1,10 @@
---
layout: default
title: Managing Data Prepper
has_children: true
nav_order: 20
---

# Managing Data Prepper

You can perform administrator functions for Data Prepper, including system configuration, interacting with core APIs, Log4j configuration, and monitoring. You can set up peer forwarding to coordinate multiple Data Prepper nodes when using stateful aggregation.
59 changes: 59 additions & 0 deletions _data-prepper/managing-data-prepper/monitoring.md
@@ -0,0 +1,59 @@
---
layout: default
title: Monitoring
parent: Managing Data Prepper
nav_order: 25
---

# Monitoring Data Prepper with metrics

You can monitor Data Prepper with metrics using [Micrometer](https://micrometer.io/). There are two types of metrics: JVM/system metrics and plugin metrics. [Prometheus](https://prometheus.io/) is used as the default metrics backend.

## JVM and system metrics

JVM and system metrics are runtime metrics that are used to monitor Data Prepper instances. They include metrics for classloaders, memory, garbage collection, threads, and others. For more information, see [JVM and system metrics](https://micrometer.io/docs/ref/jvm).

### Naming

JVM and system metrics follow predefined names in [Micrometer](https://micrometer.io/docs/concepts#_naming_meters). For example, the Micrometer metric name for memory usage is `jvm.memory.used`. Micrometer adapts the name to the naming convention of each metrics system. Following the same example, `jvm.memory.used` is reported to Prometheus as `jvm_memory_used` and to Amazon CloudWatch as `jvm.memory.used.value`.

### Serving

By default, metrics are served from the **/metrics/sys** endpoint on the Data Prepper server in Prometheus scrape format. You can configure Prometheus to scrape from the Data Prepper URL. Prometheus then polls Data Prepper for metrics and stores them in its database. To visualize the data, you can set up any frontend that accepts Prometheus metrics, such as [Grafana](https://prometheus.io/docs/visualization/grafana/). You can also update the configuration to publish metrics to other registries, such as Amazon CloudWatch, which does not require Data Prepper to host an endpoint; instead, the metrics are published directly to CloudWatch.
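As a sketch, assuming the `serverPort` and `metricRegistries` options of the Data Prepper server configuration file (`data-prepper-config.yaml`), publishing metrics to CloudWatch in addition to Prometheus could look like the following. Publishing to CloudWatch also requires AWS credentials to be available to Data Prepper, which this fragment does not show.

```yaml
# Sketch of data-prepper-config.yaml (option names assumed from Data Prepper's server configuration)
serverPort: 4900                            # port that serves the metrics endpoints
metricRegistries: [Prometheus, CloudWatch]  # Prometheus alone is the default
```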

## Plugin metrics

Plugins report their own metrics. Data Prepper uses a naming convention to keep these metrics consistent. Plugin metrics do not use dimensions. The following base plugin classes report metrics:

1. AbstractBuffer
   - Counter
     - `recordsWritten`: The number of records written into a buffer
     - `recordsRead`: The number of records read from a buffer
     - `recordsProcessed`: The number of records read from a buffer and marked as processed
     - `writeTimeouts`: The count of write timeouts in a buffer
   - Gauge
     - `recordsInBuffer`: The number of records in a buffer
     - `recordsInFlight`: The number of records read from a buffer and being processed by downstream Data Prepper components (for example, processors and sinks)
   - Timer
     - `readTimeElapsed`: The time elapsed while reading from a buffer
     - `checkpointTimeElapsed`: The time elapsed while checkpointing
2. AbstractProcessor
   - Counter
     - `recordsIn`: The number of records received by a processor
     - `recordsOut`: The number of records emitted by a processor
   - Timer
     - `timeElapsed`: The time elapsed during execution of a processor
3. AbstractSink
   - Counter
     - `recordsIn`: The number of records received by a sink
   - Timer
     - `timeElapsed`: The time elapsed during execution of a sink

### Naming

Plugin metrics follow the naming convention **PIPELINE_NAME_PLUGIN_NAME_METRIC_NAME**. For example, a **recordsIn** metric for the **opensearch-sink** plugin in a pipeline named **output-pipeline** has a qualified name of **output-pipeline_opensearch_sink_recordsIn**.
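To make the convention concrete, the following sketch defines a pipeline named `output-pipeline` with an `opensearch` sink. The upstream pipeline name, host, and index are hypothetical, and a real OpenSearch sink usually also needs credentials. Following the naming example above, this sink's `recordsIn` counter corresponds to the qualified name `output-pipeline_opensearch_sink_recordsIn`.

```yaml
# Hypothetical pipeline used only to illustrate plugin metric naming
output-pipeline:
  source:
    pipeline:
      name: "input-pipeline"             # hypothetical upstream pipeline
  sink:
    - opensearch:                        # its recordsIn metric: output-pipeline_opensearch_sink_recordsIn
        hosts: ["https://localhost:9200"]
        index: "example-index"           # hypothetical index name
```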

### Serving

By default, plugin metrics are served from the **/metrics/prometheus** endpoint on the Data Prepper server in Prometheus scrape format. You can configure Prometheus to scrape from the Data Prepper URL. The Data Prepper server port defaults to `4900`, and you can change it. To visualize the scraped data, you can use any frontend that accepts Prometheus metrics, such as [Grafana](https://prometheus.io/docs/visualization/grafana/). You can also update the configuration to publish metrics to other registries, such as CloudWatch, which does not require Data Prepper to host an endpoint; instead, the metrics are published directly to CloudWatch.
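A minimal Prometheus scrape configuration for this setup might look like the following sketch. The target `data-prepper:4900` is a hypothetical host name, and the plugin metrics path `/metrics/prometheus` is an assumption based on Data Prepper's defaults. `scheme: http` assumes TLS is disabled on the Data Prepper server; change the scheme and add authentication settings if your server uses TLS or basic authentication.

```yaml
# Sketch of prometheus.yml scrape jobs for Data Prepper (host name is hypothetical)
scrape_configs:
  - job_name: data-prepper-sys          # JVM and system metrics
    metrics_path: /metrics/sys
    scheme: http
    static_configs:
      - targets: ["data-prepper:4900"]
  - job_name: data-prepper-plugins      # plugin metrics
    metrics_path: /metrics/prometheus
    scheme: http
    static_configs:
      - targets: ["data-prepper:4900"]
```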
2 changes: 1 addition & 1 deletion _data-prepper/migrate-open-distro.md
@@ -1,7 +1,7 @@
---
layout: default
title: Migrating from Open Distro
nav_order: 35
nav_order: 30
---

# Migrating from Open Distro
8 changes: 4 additions & 4 deletions _data-prepper/migrating-from-logstash-data-prepper.md
@@ -1,16 +1,16 @@
---
layout: default
title: Migrating from Logstash
nav_order: 30
redirect_from:
nav_order: 25
redirect_from:
- /data-prepper/configure-logstash-data-prepper/
---

# Migrating from Logstash

You can run Data Prepper with a Logstash configuration.

As mentioned in the [Getting started]({{site.url}}{{site.baseurl}}/data-prepper/get-started/) guide, you'll need to configure Data Prepper with a pipeline using a `pipelines.yaml` file.
As mentioned in [Getting started with Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/), you'll need to configure Data Prepper with a pipeline using a `pipelines.yaml` file.

Alternatively, if you have a Logstash configuration file (`logstash.conf`), you can use it to configure Data Prepper instead of `pipelines.yaml`.

@@ -28,7 +28,7 @@ As of the Data Prepper 1.2 release, the following plugins from the Logstash conf

## Running Data Prepper with a Logstash configuration

1. To install Data Prepper's Docker image, see the Installing Data Prepper in [Get Started]({{site.url}}{{site.baseurl}}/data-prepper/getting-started#1-installing-data-prepper).
1. To install Data Prepper's Docker image, see the Installing Data Prepper section of [Getting started]({{site.url}}{{site.baseurl}}/data-prepper/getting-started#1-installing-data-prepper).

2. Run the Docker image installed in Step 1 by supplying your `logstash.conf` configuration.

@@ -2,7 +2,7 @@
layout: default
title: Bounded blocking
parent: Buffers
grand_parent: Configuring Data Prepper
grand_parent: Pipelines
nav_order: 50
---

@@ -1,9 +1,9 @@
---
layout: default
title: Buffers
parent: Configuring Data Prepper
parent: Pipelines
has_children: true
nav_order: 50
nav_order: 20
---

# Buffers
@@ -2,7 +2,7 @@
layout: default
title: add_entries
parent: Processors
grand_parent: Configuring Data Prepper
grand_parent: Pipelines
nav_order: 45
---

@@ -2,7 +2,7 @@
layout: default
title: aggregate
parent: Processors
grand_parent: Configuring Data Prepper
grand_parent: Pipelines
nav_order: 45
---

@@ -2,7 +2,7 @@
layout: default
title: copy_values
parent: Processors
grand_parent: Configuring Data Prepper
grand_parent: Pipelines
nav_order: 45
---

@@ -2,7 +2,7 @@
layout: default
title: csv
parent: Processors
grand_parent: Configuring Data Prepper
grand_parent: Pipelines
nav_order: 45
---

@@ -2,7 +2,7 @@
layout: default
title: date
parent: Processors
grand_parent: Configuring Data Prepper
grand_parent: Pipelines
nav_order: 45
---

@@ -2,7 +2,7 @@
layout: default
title: delete_entries
parent: Processors
grand_parent: Configuring Data Prepper
grand_parent: Pipelines
nav_order: 45
---

@@ -2,7 +2,7 @@
layout: default
title: drop_events
parent: Processors
grand_parent: Configuring Data Prepper
grand_parent: Pipelines
nav_order: 45
---

@@ -14,7 +14,7 @@ Drops all the events that are passed into this processor.

Option | Required | Type | Description
:--- | :--- | :--- | :---
drop_when | Yes | String | Accepts a Data Prepper Expression string following the [Data Prepper Expression Syntax](https://github.com/opensearch-project/data-prepper/blob/main/docs/expression_syntax.md). Configuring `drop_events` with `drop_when: true` drops all the events received.
drop_when | Yes | String | Accepts a Data Prepper Expression string following the [Data Prepper Expression Syntax]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/). Configuring `drop_events` with `drop_when: true` drops all the events received.
handle_failed_events | No | Enum | Specifies how exceptions are handled when an exception occurs while evaluating an event. Default value is `drop`, which drops the event so it doesn't get sent to OpenSearch. Available options are `drop`, `drop_silently`, `skip`, `skip_silently`. For more information, see [handle_failed_events](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/drop-events-processor#handle_failed_events).
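A minimal sketch of a pipeline that uses `drop_events`, assuming incoming events carry a hypothetical `loglevel` field. Events for which the expression evaluates to true are dropped, and `handle_failed_events` is set explicitly to its default, `drop`. The pipeline name and the `http` source are illustrative choices.

```yaml
# Hypothetical pipeline: drop DEBUG-level events before they reach the sink
drop-debug-pipeline:
  source:
    http:
  processor:
    - drop_events:
        drop_when: '/loglevel == "DEBUG"'   # expression on the hypothetical loglevel field
        handle_failed_events: drop          # default behavior, shown for clarity
  sink:
    - stdout:
```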

<!---## Configuration
@@ -2,7 +2,7 @@
layout: default
title: grok
parent: Processors
grand_parent: Configuring Data Prepper
grand_parent: Pipelines
nav_order: 45
---

@@ -2,7 +2,7 @@
layout: default
title: json
parent: Processors
grand_parent: Configuring Data Prepper
grand_parent: Pipelines
nav_order: 45
---

@@ -2,7 +2,7 @@
layout: default
title: key_value
parent: Processors
grand_parent: Configuring Data Prepper
grand_parent: Pipelines
nav_order: 45
---

@@ -2,7 +2,7 @@
layout: default
title: lowercase_string
parent: Processors
grand_parent: Configuring Data Prepper
grand_parent: Pipelines
nav_order: 45
---

@@ -2,7 +2,7 @@
layout: default
title: otel_trace_raw
parent: Processors
grand_parent: Configuring Data Prepper
grand_parent: Pipelines
nav_order: 45
---

@@ -2,8 +2,8 @@
layout: default
title: Processors
has_children: true
parent: Configuring Data Prepper
nav_order: 100
parent: Pipelines
nav_order: 25
---

# Processors
@@ -13,10 +13,6 @@ Processors perform some action on your data: filter, transform, enrich, etc.
Prior to Data Prepper 1.3, Processors were named Preppers. Starting in Data Prepper 1.3, the term Prepper is deprecated in favor of Processor. Data Prepper will continue to support the term "Prepper" until 2.0, when it will be removed.
{: .note }
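For illustration, in Data Prepper 1.3 and later the pipeline section that lists these plugins is named `processor`; earlier releases used `prepper`. The following sketch assumes a hypothetical pipeline with an `http` source and a `grok` processor that parses a `log` field.

```yaml
# Hypothetical pipeline using the processor section (named prepper before 1.3)
log-pipeline:
  source:
    http:
  processor:
    - grok:
        match:
          log: ["%{COMMONAPACHELOG}"]   # parse the log field as an Apache common log line
  sink:
    - stdout:
```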





## copy_values

Copy values within an event. `copy_values` is part of [mutate event](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/mutate-event-processors#mutate-event-processors) processors.
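As a sketch, a `copy_values` processor that copies the value of a hypothetical `message` key into a new `newMessage` key might be configured as follows. The option names `entries`, `from_key`, `to_key`, and `overwrite_if_to_key_exists` follow the mutate event processors documentation linked above; check that reference for your Data Prepper version.

```yaml
# Hypothetical processor section: copy message into newMessage
processor:
  - copy_values:
      entries:
        - from_key: "message"
          to_key: "newMessage"
          overwrite_if_to_key_exists: true   # replace newMessage if it already exists
```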
@@ -2,7 +2,7 @@
layout: default
title: rename_keys
parent: Processors
grand_parent: Configuring Data Prepper
grand_parent: Pipelines
nav_order: 44
---
