_data-prepper/pipelines/configuration/sources/s3.md
Lines changed: 34 additions & 26 deletions
@@ -83,9 +83,9 @@ You can use both `bucket_owners` and `default_bucket_owner` together.
## Configuration
-You can use the following options to configure the `s3` source.
+You can use the following parameters to configure the `s3` source.

-Option | Required | Type | Description
+Parameter | Required | Type | Description
:--- | :--- | :--- | :---
`notification_type` | Yes | String | Must be `sqs`.
`notification_source` | No | String | Determines how notifications are received by SQS. Must be `s3` or `eventbridge`. `s3` represents notifications that are directly sent from Amazon S3 to Amazon SQS or fanout notifications from Amazon S3 to Amazon Simple Notification Service (Amazon SNS) to Amazon SQS. `eventbridge` represents notifications from [Amazon EventBridge](https://aws.amazon.com/eventbridge/) and [Amazon Security Lake](https://aws.amazon.com/security-lake/). Default is `s3`.
`region` | No | String | The AWS Region to use for credentials. Defaults to [standard SDK behavior to determine the Region](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/region-selection.html).
`sts_role_arn` | No | String | The AWS Security Token Service (AWS STS) role to assume for requests to Amazon SQS and Amazon S3. Defaults to `null`, which will use the [standard SDK behavior for credentials](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html).
@@ -140,22 +140,30 @@ The `codec` determines how the `s3` source parses each Amazon S3 object. For inc
The `newline` codec parses each line as a single log event. This is ideal for most application logs because each event occupies a single line. It is also suitable for S3 objects that contain one JSON object per line, which works well with the [parse_json]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/parse-json/) processor to parse each line.
-Use the following options to configure the `newline` codec.
+Use the following parameters to configure the `newline` codec.

-Option | Required | Type | Description
+Parameter | Required | Type | Description
:--- | :--- |:--------| :---
`skip_lines` | No | Integer | The number of lines to skip before creating events. You can use this configuration to skip common header rows. Default is `0`.
-`header_destination` | No | String | A key value to assign to the header line of the S3 object. If this option is specified, then each event will contain a `header_destination` field.
+`header_destination` | No | String | A key value to assign to the header line of the S3 object. If this parameter is specified, then each event will contain a `header_destination` field.
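
The `newline` codec parameters above could be combined as in the following fragment. This is a minimal sketch, not a complete pipeline: it assumes the codec is nested under a `codec` key of the `s3` source, and the key name `header` is a hypothetical value chosen only for illustration.

```yaml
# Fragment of an `s3` source definition (nesting under `codec` is assumed).
codec:
  newline:
    # Default value; increase it to skip leading rows before events are created.
    skip_lines: 0
    # Hypothetical key name under which the object's header line is stored.
    header_destination: header
```

With a setting like this, the table above suggests that each event would carry the header line of its S3 object under the configured key.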
### json codec
-The `json` codec parses each S3 object as a single JSON object from a JSON array and then creates a Data Prepper log event for each object in the array.
+The `json` codec parses each S3 object as a single JSON object from a JSON array and then creates a Data Prepper log event for each object in the array.
+
+Use the following parameters to configure the `json` codec.
+
+Parameter | Required | Type | Description
+:--- | :--- |:--------| :---
+`key_name` | No | String | The name of the input field from which to extract the JSON array and create events.
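
As an illustration of `key_name`, the fragment below is a sketch that assumes the `json` codec sits under the source's `codec` key. The field name `records` is hypothetical.

```yaml
# Fragment of an `s3` source definition (nesting under `codec` is assumed).
codec:
  json:
    # Hypothetical field name: the JSON array is read from this field
    # of the S3 object, and an event is created for each array element.
    key_name: records
```

Under this assumption, an object shaped like `{"records": [{"a": 1}, {"a": 2}]}` would be expected to produce one event per element of the `records` array.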
### csv codec
-The `csv` codec parses objects in comma-separated value (CSV) format, with each row producing a Data Prepper log event. Use the following options to configure the `csv` codec.
+The `csv` codec parses objects in comma-separated value (CSV) format, with each row producing a Data Prepper log event.
+
+Use the following parameters to configure the `csv` codec.

-Option | Required | Type | Description
+Parameter | Required | Type | Description
:--- |:---------|:------------| :---
`delimiter` | Yes | String | The delimiter separating columns. Default is `,`.
`quote_character` | Yes | String | The character used as a text qualifier for CSV data. Default is `"`.
`expression` | Yes, when using `s3_select` | String | The expression used to query the object. Maps directly to the [expression](https://docs.aws.amazon.com/AmazonS3/latest/API/API_SelectObjectContent.html#AmazonS3-SelectObjectContent-request-Expression) property.
`expression_type` | No | String | The type of the provided expression. Default value is `SQL`. Maps directly to the [ExpressionType](https://docs.aws.amazon.com/AmazonS3/latest/API/API_SelectObjectContent.html#AmazonS3-SelectObjectContent-request-ExpressionType).
-Use the following options in conjunction with the `csv` configuration for `s3_select` to determine how your parsed CSV file should be formatted.
+Use the following parameters in conjunction with the `csv` configuration for `s3_select` to determine how your parsed CSV file should be formatted.

-These options map directly to options available in the S3 Select [CSVInput](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CSVInput.html) data type.
+These parameters map directly to inputs available in the S3 Select [CSVInput](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CSVInput.html) data type.

-Option | Required | Type | Description
+Parameter | Required | Type | Description
:--- |:---------|:------------| :---
`file_header_info` | No | String | Describes the first line of input. Maps directly to the [FileHeaderInfo](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CSVInput.html#AmazonS3-Type-CSVInput-FileHeaderInfo) property.
`quote_escape` | No | String | A single character used for escaping the quotation mark character inside an already escaped value. Maps directly to the [QuoteEscapeCharacter](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CSVInput.html#AmazonS3-Type-CSVInput-QuoteEscapeCharacter) property.
`comments` | No | String | A single character used to indicate that a row should be ignored when the character is present at the start of that row. Maps directly to the [Comments](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CSVInput.html#AmazonS3-Type-CSVInput-Comments) property.
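
The `s3_select` parameters listed above might fit together as in the following sketch. It assumes that `csv` is nested directly under `s3_select`, and it omits any other settings that a working configuration may require. The S3 API documents `USE`, `IGNORE`, and `NONE` as values for FileHeaderInfo; the lowercase spelling used here is an assumption.

```yaml
# Fragment of an `s3` source definition (nesting is assumed, not confirmed).
s3_select:
  # Query run against each object; `s3object` is the S3 Select table name.
  expression: "select * from s3object s"
  expression_type: SQL
  csv:
    # Treat the first line of input as a header row (assumed spelling).
    file_header_info: use
    # Single character that escapes a quotation mark inside a quoted value.
    quote_escape: '"'
    # Rows starting with this character are ignored.
    comments: "#"
```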
#### json <a name="s3_select_json"></a>
-Use the following option in conjunction with `json` for `s3_select` to determine how S3 Select processes the JSON file.
+Use the following parameters in conjunction with `json` for `s3_select` to determine how S3 Select processes the JSON file.

-Option | Required | Type | Description
+Parameter | Required | Type | Description
:--- | :--- | :--- | :---
`type` | No | String | The type of JSON array. May be either `DOCUMENT` or `LINES`. Maps directly to the [Type](https://docs.aws.amazon.com/AmazonS3/latest/API/API_JSONInput.html#AmazonS3-Type-JSONInput-Type) property.
## Using `scan` with the `s3` source <a name="scan"></a>
-The following parameters allow you to scan S3 objects. All options can be configured at the bucket level.
+The following parameters allow you to scan S3 objects. All parameters can be configured at the bucket level.

-Option | Required | Type | Description
+Parameter | Required | Type | Description
:--- | :--- | :--- | :---
`start_time` | No | String | The time from which to start scanning. Only objects modified after the given `start_time` are scanned. This should follow [ISO LocalDateTime](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html#ISO_LOCAL_DATE_TIME) format, for example, `2023-01-23T10:00:00`. If `end_time` is configured along with `start_time`, all objects after `start_time` and before `end_time` will be processed. `start_time` and `range` cannot be used together.
`end_time` | No | String | The time after which no objects will be scanned. This should follow [ISO LocalDateTime](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html#ISO_LOCAL_DATE_TIME) format, for example, `2023-01-23T10:00:00`. If `start_time` is configured along with `end_time`, all objects after `start_time` and before `end_time` will be processed. `end_time` and `range` cannot be used together.
-Use the following options inside the `filter` configuration.
+Use the following parameters in the `filter` configuration.

-Option | Required | Type | Description
+Parameter | Required | Type | Description
:--- | :--- | :--- | :---
`include_prefix` | No | List | A list of S3 key prefix strings included in the scan. By default, all the objects in a bucket are included.
`exclude_suffix` | No | List | A list of S3 key suffix strings excluded from the scan. By default, no objects in a bucket are excluded.
### scheduling
-Option | Required | Type | Description
+Parameter | Required | Type | Description
:--- | :--- | :--- | :---
`interval` | Yes | String | Indicates the minimum interval between scans. The next scan starts only after the interval duration has elapsed since the previous scan ended and all objects from the previous scan have been processed. Supports ISO 8601 notation strings, such as `PT20.345S` or `PT15M`, and notation strings for seconds (`60s`) and milliseconds (`1600ms`).
`count` | No | Integer | Specifies how many times a bucket will be scanned. Defaults to `Integer.MAX_VALUE`.
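
The two `scheduling` parameters can be sketched as follows, assuming that `scheduling` is nested under `scan`. The values are illustrative only.

```yaml
# Fragment of an `s3` source definition (nesting under `scan` is assumed).
scan:
  scheduling:
    # Wait at least 15 minutes between scans (ISO 8601 duration).
    interval: PT15M
    # Stop after five scans; omit to use the default.
    count: 5
```

The same interval could presumably also be written in the seconds notation, for example `900s`.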