Commit 4230a48

tkykenmt and natebower authored
add explain pf key_name parameter for json codec of s3 source (#10559)
* add explain pf key_name parameter for json codec of s3 source

  PR for issue #10558

  Signed-off-by: tkykenmt <[email protected]>

* add explain pf key_name parameter for json codec of s3 source - fix some terms to reflect review

  Signed-off-by: tkykenmt <[email protected]>

* Apply suggestions from code review

  Signed-off-by: Nathan Bower <[email protected]>

---------

Signed-off-by: tkykenmt <[email protected]>
Signed-off-by: Nathan Bower <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
1 parent 0ad53ab commit 4230a48

File tree

1 file changed

+34
-26
lines changed
  • _data-prepper/pipelines/configuration/sources

_data-prepper/pipelines/configuration/sources/s3.md

Lines changed: 34 additions & 26 deletions
@@ -83,9 +83,9 @@ You can use both `bucket_owners` and `default_bucket_owner` together.
 
 ## Configuration
 
-You can use the following options to configure the `s3` source.
+You can use the following parameters to configure the `s3` source.
 
-Option | Required | Type | Description
+Parameter | Required | Type | Description
 :--- | :--- | :--- | :---
 `notification_type` | Yes | String | Must be `sqs`.
 `notification_source` | No | String | Determines how notifications are received by SQS. Must be `s3` or `eventbridge`. `s3` represents notifications that are directly sent from Amazon S3 to Amazon SQS or fanout notifications from Amazon S3 to Amazon Simple Notification Service (Amazon SNS) to Amazon SQS. `eventbridge` represents notifications from [Amazon EventBridge](https://aws.amazon.com/eventbridge/) and [Amazon Security Lake](https://aws.amazon.com/security-lake/). Default is `s3`.
@@ -112,7 +112,7 @@ Option | Required | Type | Description
 
 The following parameters allow you to configure usage for Amazon SQS in the `s3` source plugin.
 
-Option | Required | Type | Description
+Parameter | Required | Type | Description
 :--- | :--- | :--- | :---
 `queue_url` | Yes | String | The URL of the Amazon SQS queue from which messages are received.
 `maximum_messages` | No | Integer | The maximum number of messages to receive from the Amazon SQS queue in any single request. Default is `10`.
@@ -125,7 +125,7 @@ Option | Required | Type | Description
 
 ## aws
 
-Option | Required | Type | Description
+Parameter | Required | Type | Description
 :--- | :--- | :--- | :---
 `region` | No | String | The AWS Region to use for credentials. Defaults to [standard SDK behavior to determine the Region](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/region-selection.html).
 `sts_role_arn` | No | String | The AWS Security Token Service (AWS STS) role to assume for requests to Amazon SQS and Amazon S3. Defaults to `null`, which will use the [standard SDK behavior for credentials](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html).
@@ -140,22 +140,30 @@ The `codec` determines how the `s3` source parses each Amazon S3 object. For inc
 
 The `newline` codec parses each single line as a single log event. This is ideal for most application logs because each event parses per single line. It can also be suitable for S3 objects that have individual JSON objects on each line, which matches well when used with the [parse_json]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/parse-json/) processor to parse each line.
 
-Use the following options to configure the `newline` codec.
+Use the following parameters to configure the `newline` codec.
 
-Option | Required | Type | Description
+Parameter | Required | Type | Description
 :--- | :--- |:--------| :---
 `skip_lines` | No | Integer | The number of lines to skip before creating events. You can use this configuration to skip common header rows. Default is `0`.
-`header_destination` | No | String | A key value to assign to the header line of the S3 object. If this option is specified, then each event will contain a `header_destination` field.
+`header_destination` | No | String | A key value to assign to the header line of the S3 object. If this parameter is specified, then each event will contain a `header_destination` field.
 
 ### json codec
 
-The `json` codec parses each S3 object as a single JSON object from a JSON array and then creates a Data Prepper log event for each object in the array.
+The `json` codec parses each S3 object as a single JSON object from a JSON array and then creates a Data Prepper log event for each object in the array.
+
+Use the following parameters to configure the `json` codec.
+
+Parameter | Required | Type | Description
+:--- | :--- |:--------| :---
+`key_name` | No | String | The name of the input field from which to extract the JSON array and create events.
 
 ### csv codec
 
-The `csv` codec parses objects in comma-separated value (CSV) format, with each row producing a Data Prepper log event. Use the following options to configure the `csv` codec.
+The `csv` codec parses objects in comma-separated value (CSV) format, with each row producing a Data Prepper log event.
+
+Use the following parameters to configure the `csv` codec.
 
-Option | Required | Type | Description
+Parameters | Required | Type | Description
 :--- |:---------|:------------| :---
 `delimiter` | Yes | Integer | The delimiter separating columns. Default is `,`.
 `quote_character` | Yes | String | The character used as a text qualifier for CSV data. Default is `"`.
@@ -164,9 +172,9 @@ Option | Required | Type | Description
 
 ## Using `s3_select` with the `s3` source<a name="s3_select"></a>
 
-When configuring `s3_select` to parse Amazon S3 objects, use the following options:
+When configuring `s3_select` to parse S3 objects, use the following parameters.
 
-Option | Required | Type | Description
+Parameter | Required | Type | Description
 :--- |:-----------------------|:------------| :---
 `expression` | Yes, when using `s3_select` | String | The expression used to query the object. Maps directly to the [expression](https://docs.aws.amazon.com/AmazonS3/latest/API/API_SelectObjectContent.html#AmazonS3-SelectObjectContent-request-Expression) property.
 `expression_type` | No | String | The type of the provided expression. Default value is `SQL`. Maps directly to the [ExpressionType](https://docs.aws.amazon.com/AmazonS3/latest/API/API_SelectObjectContent.html#AmazonS3-SelectObjectContent-request-ExpressionType).
@@ -177,28 +185,28 @@ Option | Required | Type | Description
 
 ### csv<a name="s3_select_csv"></a>
 
-Use the following options in conjunction with the `csv` configuration for `s3_select` to determine how your parsed CSV file should be formatted.
+Use the following parameters in conjunction with the `csv` configuration for `s3_select` to determine how your parsed CSV file should be formatted.
 
-These options map directly to options available in the S3 Select [CSVInput](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CSVInput.html) data type.
+These parameters map directly to inputs available in the S3 Select [CSVInput](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CSVInput.html) data type.
 
-Option | Required | Type | Description
+Parameter | Required | Type | Description
 :--- |:---------|:------------| :---
 `file_header_info` | No | String | Describes the first line of input. Maps directly to the [FileHeaderInfo](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CSVInput.html#AmazonS3-Type-CSVInput-FileHeaderInfo) property.
 `quote_escape` | No | String | A single character used for escaping the quotation mark character inside an already escaped value. Maps directly to the [QuoteEscapeCharacter](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CSVInput.html#AmazonS3-Type-CSVInput-QuoteEscapeCharacter) property.
 `comments` | No | String | A single character used to indicate that a row should be ignored when the character is present at the start of that row. Maps directly to the [Comments](https://docs.aws.amazon.com/AmazonS3/latest/API/API_CSVInput.html#AmazonS3-Type-CSVInput-Comments) property.
 
 #### json<a name="s3_select_json"></a>
 
-Use the following option in conjunction with `json` for `s3_select` to determine how S3 Select processes the JSON file.
+Use the following parameters in conjunction with `json` for `s3_select` to determine how S3 Select processes the JSON file.
 
-Option | Required | Type | Description
+Parameter | Required | Type | Description
 :--- | :--- | :--- | :---
 `type` | No | String | The type of JSON array. May be either `DOCUMENT` or `LINES`. Maps directly to the [Type](https://docs.aws.amazon.com/AmazonS3/latest/API/API_JSONInput.html#AmazonS3-Type-JSONInput-Type) property.
 
 ## Using `scan` with the `s3` source<a name="scan"></a>
-The following parameters allow you to scan S3 objects. All options can be configured at the bucket level.
+The following parameters allow you to scan S3 objects. All parameters can be configured at the bucket level.
 
-Option | Required | Type | Description
+Parameter | Required | Type | Description
 :--- | :--- | :--- | :---
 `start_time` | No | String | The time from which to start scanning objects modified after the given `start_time`. This should follow [ISO LocalDateTime](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html#ISO_LOCAL_DATE_TIME) format, for example, `2023-01-23T10:00:00`. If `end_time` is configured along with `start_time`, all objects after `start_time` and before `end_time` will be processed. `start_time` and `range` cannot be used together.
 `end_time` | No | String | The time after which no objects will be scanned after the given `end_time`. This should follow [ISO LocalDateTime](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html#ISO_LOCAL_DATE_TIME) format, for example, `2023-01-23T10:00:00`. If `start_time` is configured along with `end_time`, all objects after `start_time` and before `end_time` will be processed. `end_time` and `range` cannot be used together.
@@ -210,13 +218,13 @@ Option | Required | Type | Description
 ### scan bucket
 <!-- vale on -->
 
-Option | Required | Type | Description
+Parameter | Required | Type | Description
 :--- | :--- |:-----| :---
-`bucket` | Yes | Map | Provides options for each bucket.
+`bucket` | Yes | Map | Provides parameters for each bucket.
 
-You can configure the following options in the `bucket` setting map.
+You can configure the following parameters in the `bucket` setting map.
 
-Option | Required | Type | Description
+Parameter | Required | Type | Description
 :--- | :--- | :--- | :---
 `name` | Yes | String | The string representing the S3 bucket name to scan.
 `filter` | No | [Filter](#filter) | Provides the filter configuration.
@@ -226,16 +234,16 @@ Option | Required | Type | Description
 
 ### filter
 
-Use the following options inside the `filter` configuration.
+Use the following parameters in the `filter` configuration.
 
-Option | Required | Type | Description
+Parameter | Required | Type | Description
 :--- | :--- | :--- | :---
 `include_prefix` | No | List | A list of S3 key prefix strings included in the scan. By default, all the objects in a bucket are included.
 `exclude_suffix` | No | List | A list of S3 key suffix strings excluded from the scan. By default, no objects in a bucket are excluded.
 
 ### scheduling
 
-Option | Required | Type | Description
+Parameter | Required | Type | Description
 :--- | :--- | :--- | :---
 `interval` | Yes | String | Indicates the minimum interval between each scan. The next scan in the interval will start after the interval duration from the last scan ends and when all the objects from the previous scan are processed. Supports ISO 8601 notation strings, such as `PT20.345S` or `PT15M`, and notation strings for seconds (`60s`) and milliseconds (`1600ms`).
 `count` | No | Integer | Specifies how many times a bucket will be scanned. Defaults to `Integer.MAX_VALUE`.
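
To illustrate the `key_name` parameter documented by this change, the following is a minimal sketch of a pipeline that applies the `json` codec to S3 objects whose JSON array sits under a top-level field. It is not part of the commit. The pipeline name, the `Records` field name, the queue URL, the account ID, and the role ARN are all placeholder assumptions chosen for illustration; only `notification_type`, `codec`, `json`, `key_name`, `sqs.queue_url`, `aws.region`, and `aws.sts_role_arn` come from the tables in the diff.

```yaml
# Hypothetical pipeline name; any valid pipeline name would work.
s3-json-pipeline:
  source:
    s3:
      # Per the Configuration table, this must be `sqs`.
      notification_type: sqs
      codec:
        json:
          # Assumed example: each S3 object looks like {"Records": [ {...}, {...} ]}.
          # With key_name set, one event is created per element of that array.
          key_name: "Records"
      sqs:
        # Placeholder queue URL.
        queue_url: "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"
      aws:
        region: "us-east-1"
        # Placeholder role ARN.
        sts_role_arn: "arn:aws:iam::123456789012:role/example-data-prepper-role"
  sink:
    # Writes events to standard output for inspection.
    - stdout:
```

Under these assumptions, an object containing `{"Records": [{"id": 1}, {"id": 2}]}` would be expected to yield two events, one per array element. If `key_name` is omitted, the codec falls back to the behavior described in the `json` codec paragraph of the diff.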
