---
layout: default
title: rds
parent: Sources
grand_parent: Pipelines
nav_order: 95
---

# rds

The `rds` source enables change data capture (CDC) on [Amazon Relational Database Service (Amazon RDS)](https://aws.amazon.com/rds/) and [Amazon Aurora](https://aws.amazon.com/aurora/) databases. It receives database change events, such as `INSERT`, `UPDATE`, and `DELETE`, from database replication logs and supports an initial load through RDS exports to Amazon Simple Storage Service (Amazon S3).

The source supports the following database engines:
- Aurora MySQL and Aurora PostgreSQL
- RDS MySQL and RDS PostgreSQL

The source provides two options for ingesting data from Aurora or RDS:

1. Export: A full initial export from Aurora or RDS to Amazon S3 loads the current state of the database.
2. Stream: Change events are streamed from database replication logs (MySQL binlog or PostgreSQL WAL).

## Usage

The following example pipeline specifies an `rds` source. It ingests data from an Aurora MySQL cluster:

```yaml
version: "2"
rds-pipeline:
  source:
    rds:
      db_identifier: "my-rds-instance"
      engine: "aurora-mysql"
      database: "mydb"
      authentication:
        username: "myuser"
        password: "mypassword"
      s3_bucket: "my-export-bucket"
      s3_region: "us-west-2"
      s3_prefix: "rds-exports"
      export:
        kms_key_id: "arn:aws:kms:us-west-2:123456789012:key/12345678-1234-1234-1234-123456789012"
        export_role_arn: "arn:aws:iam::123456789012:role/rds-export-role"
      stream: true
      aws:
        region: "us-west-2"
        sts_role_arn: "arn:aws:iam::123456789012:role/my-pipeline-role"
```

## Configuration options

The following tables describe the configuration options for the `rds` source.

Option | Required | Type | Description
:--- | :--- | :--- | :---
`db_identifier` | Yes | String | The identifier for the RDS instance or Aurora cluster.
`cluster` | No | Boolean | Whether the `db_identifier` refers to a cluster (`true`) or an instance (`false`). Default is `false`. For Aurora engines, this option is always `true`.
`engine` | Yes | String | The database engine type. Must be one of `mysql`, `postgresql`, `aurora-mysql`, or `aurora-postgresql`.
`database` | Yes | String | The name of the database to connect to.
`tables` | No | Object | The configuration for specifying which tables to include or exclude. See [tables](#tables) for more information.
`authentication` | Yes | Object | Database authentication credentials. See [authentication](#authentication) for more information.
`aws` | Yes | Object | The AWS configuration. See [aws](#aws) for more information.
`acknowledgments` | No | Boolean | When `true`, enables the source to receive [end-to-end acknowledgments]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines#end-to-end-acknowledgments) when events are received by OpenSearch sinks. Default is `true`.
`s3_data_file_acknowledgment_timeout` | No | Duration | The amount of time after which data read from an RDS export expires when acknowledgments are enabled. Default is 30 minutes.
`stream_acknowledgment_timeout` | No | Duration | The amount of time after which data read from database streams expires when acknowledgments are enabled. Default is 10 minutes.
`s3_bucket` | Yes | String | The name of the S3 bucket in which RDS export data is stored.
`s3_prefix` | No | String | The prefix for S3 objects in the export bucket.
`s3_region` | No | String | The AWS Region for the S3 bucket. If not specified, the Region specified in the [aws](#aws) configuration is used.
`partition_count` | No | Integer | The number of folder partitions in the S3 buffer. Must be between 1 and 1,000. Default is 100.
`export` | No | Object | The configuration for RDS export operations. See [export](#export-options) for more information.
`stream` | No | Boolean | Whether to enable streaming of database change events. Default is `false`.
`tls` | No | Object | The TLS configuration for database connections. See [tls](#tls-options) for more information.
`disable_s3_read_for_leader` | No | Boolean | Whether to disable S3 read operations for the leader node. Default is `false`.
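
As an illustration of how several of these options combine, the following sketch configures the source for a standalone RDS PostgreSQL instance with streaming only. It assumes that omitting the `export` object skips the initial export and that duration options accept ISO-8601 strings such as `PT15M`. The identifiers, database, table, and bucket names are hypothetical, and only the `source` portion of the pipeline is shown.

```yaml
source:
  rds:
    # Hypothetical standalone RDS instance (not an Aurora cluster)
    db_identifier: "my-postgres-instance"
    cluster: false
    engine: "postgresql"
    database: "inventory"
    tables:
      include:
        - "products"
    authentication:
      username: "myuser"
      password: "mypassword"
    # Still required: the table lists s3_bucket as a required option
    s3_bucket: "my-staging-bucket"
    s3_prefix: "rds-staging"
    # Must be between 1 and 1,000 (default 100)
    partition_count: 50
    # No export object: assumed to skip the initial load and rely on streaming only
    stream: true
    acknowledgments: true
    # Assumed ISO-8601 duration; raises the 10-minute default to 15 minutes
    stream_acknowledgment_timeout: "PT15M"
    aws:
      region: "us-east-1"
      sts_role_arn: "arn:aws:iam::123456789012:role/my-pipeline-role"
```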

### aws

Use the following options in the AWS configuration.

Option | Required | Type | Description
:--- | :--- | :--- | :---
`region` | No | String | The AWS Region to use for credentials. Defaults to the [standard SDK behavior for determining the Region](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/region-selection.html).
`sts_role_arn` | No | String | The AWS Security Token Service (AWS STS) role to assume for requests to Amazon RDS and Amazon S3. Defaults to `null`, which will use the [standard SDK behavior for credentials](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html).
`sts_external_id` | No | String | The external ID to use when assuming the STS role. Must be between 2 and 1,224 characters.
`sts_header_overrides` | No | Map | A map of header overrides that the AWS Identity and Access Management (IAM) role assumes for the source plugin. Maximum of 5 headers.
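
For example, when the pipeline assumes a role whose trust policy requires an external ID, the `aws` object might look like the following sketch. The account ID, role name, and external ID are placeholders; `sts_external_id` only needs to be set when the role's trust policy checks for one.

```yaml
aws:
  region: "us-west-2"
  # Placeholder role with access to the RDS resources and the S3 bucket
  sts_role_arn: "arn:aws:iam::123456789012:role/my-pipeline-role"
  # Must match the external ID expected by the role's trust policy (2 to 1,224 characters)
  sts_external_id: "my-external-id"
```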

### authentication

Use the following options for database authentication.

Option | Required | Type | Description
:--- | :--- | :--- | :---
`username` | Yes | String | The database username for authentication.
`password` | Yes | String | The database password for authentication.

### tables

Use the following options to specify which tables to include in or exclude from data capture.

Option | Required | Type | Description
:--- | :--- | :--- | :---
`include` | No | List | A list of table names to include in data capture. Maximum of 1,000 tables. If specified, only these tables are processed.
`exclude` | No | List | A list of table names to exclude from data capture. Maximum of 1,000 tables. These tables are ignored even if they also match the `include` list.
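
The following sketch illustrates the precedence described above: `audit_log` appears in both lists, so only `orders` and `customers` are captured. The table names are hypothetical, and whether names must be qualified with a database or schema name depends on the engine and is not specified here.

```yaml
tables:
  include:
    - "orders"
    - "customers"
    - "audit_log"
  # Exclusion wins over inclusion, so audit_log is skipped
  exclude:
    - "audit_log"
```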

### export options

The following options let you customize the RDS export functionality.

Option | Required | Type | Description
:--- | :--- | :--- | :---
`kms_key_id` | Yes | String | The AWS Key Management Service (AWS KMS) key ID or Amazon Resource Name (ARN) to use for encrypting the export data.
`export_role_arn` | Yes | String | The ARN of the IAM role that RDS will assume to perform the export operation.

### tls options

The following options let you configure TLS for database connections.

Option | Required | Type | Description
:--- | :--- | :--- | :---
`insecure` | No | Boolean | Whether to disable TLS encryption for database connections. Default is `false` (TLS enabled).
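
Because TLS is enabled by default, the `tls` object is usually omitted. The following sketch turns TLS off, which would typically be appropriate only in a test environment; the `tls` object sits directly under `rds`, as listed in the configuration options table.

```yaml
tls:
  # Disables TLS encryption for database connections (testing only)
  insecure: true
```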

## Exposed metadata attributes

The following metadata will be added to each event that is processed by the `rds` source. These metadata attributes can be accessed using the [expression syntax `getMetadata` function]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/get-metadata/).

* `primary_key`: The primary key of the database record. For tables with composite primary keys, values are concatenated with a `|` separator.
* `event_timestamp`: The timestamp, in epoch milliseconds, of when the database change occurred. For export events, this represents the export time. For stream events, this represents the transaction commit time.
* `document_version`: A long integer generated from the event timestamp to use as the document version.
* `opensearch_action`: The bulk action that will be used to send the event to OpenSearch, such as `index` or `delete`.
* `change_event_type`: The stream event type. Can be `insert`, `update`, or `delete`.
* `table_name`: The name of the database table from which the event originated.
* `schema_name`: The name of the schema from which the event originated. For MySQL, `schema_name` is the same as `database_name`.
* `database_name`: The name of the database from which the event originated.
* `ingestion_type`: Indicates whether the event originated from an export or stream. Valid values are `EXPORT` and `STREAM`.
* `s3_partition_key`: The location in the S3 staging bucket where the event is stored before processing.
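
A common use of these attributes is to keep an OpenSearch index in sync with the source tables. The following sink sketch assumes the `opensearch` sink options `index`, `document_id`, `action`, `document_version`, and `document_version_type`, and assumes that the index name can be built from a metadata expression. The host and the one-index-per-table naming scheme are placeholders.

```yaml
sink:
  - opensearch:
      hosts: ["https://search-my-domain.us-west-2.es.amazonaws.com"]
      # Hypothetical naming scheme: one index per source table
      index: "${getMetadata(\"table_name\")}"
      # Reuse the primary key so updates and deletes target the same document
      document_id: "${getMetadata(\"primary_key\")}"
      # index for inserts and updates, delete for deletes
      action: "${getMetadata(\"opensearch_action\")}"
      # External versioning rejects writes older than the stored version
      document_version: "${getMetadata(\"document_version\")}"
      document_version_type: "external"
```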

## Permissions

The following IAM policy grants the permissions required to run the `rds` source. Replace the placeholders `region`, `account-id`, `s3_bucket`, `export-key-id`, and `export-role` with your own values:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "allowReadingFromS3Buckets",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:GetBucketLocation",
        "s3:ListBucket",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::s3_bucket",
        "arn:aws:s3:::s3_bucket/*"
      ]
    },
    {
      "Sid": "AllowDescribeInstances",
      "Effect": "Allow",
      "Action": [
        "rds:DescribeDBInstances"
      ],
      "Resource": [
        "arn:aws:rds:region:account-id:db:*"
      ]
    },
    {
      "Sid": "AllowDescribeClusters",
      "Effect": "Allow",
      "Action": [
        "rds:DescribeDBClusters"
      ],
      "Resource": [
        "arn:aws:rds:region:account-id:cluster:*"
      ]
    },
    {
      "Sid": "AllowSnapshots",
      "Effect": "Allow",
      "Action": [
        "rds:DescribeDBClusterSnapshots",
        "rds:CreateDBClusterSnapshot",
        "rds:DescribeDBSnapshots",
        "rds:CreateDBSnapshot",
        "rds:AddTagsToResource"
      ],
      "Resource": [
        "arn:aws:rds:region:account-id:cluster:*",
        "arn:aws:rds:region:account-id:cluster-snapshot:*",
        "arn:aws:rds:region:account-id:db:*",
        "arn:aws:rds:region:account-id:snapshot:*"
      ]
    },
    {
      "Sid": "AllowExport",
      "Effect": "Allow",
      "Action": [
        "rds:StartExportTask"
      ],
      "Resource": [
        "arn:aws:rds:region:account-id:cluster:*",
        "arn:aws:rds:region:account-id:cluster-snapshot:*",
        "arn:aws:rds:region:account-id:snapshot:*"
      ]
    },
    {
      "Sid": "AllowDescribeExports",
      "Effect": "Allow",
      "Action": [
        "rds:DescribeExportTasks"
      ],
      "Resource": "*"
    },
    {
      "Sid": "AllowAccessToKmsForExport",
      "Effect": "Allow",
      "Action": [
        "kms:Decrypt",
        "kms:Encrypt",
        "kms:DescribeKey",
        "kms:RetireGrant",
        "kms:CreateGrant",
        "kms:ReEncrypt*",
        "kms:GenerateDataKey*"
      ],
      "Resource": [
        "arn:aws:kms:region:account-id:key/export-key-id"
      ]
    },
    {
      "Sid": "AllowPassingExportRole",
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": [
        "arn:aws:iam::account-id:role/export-role"
      ]
    }
  ]
}
```

## Metrics

The `rds` source includes the following metrics:

* `exportJobSuccess`: The number of RDS export tasks that have succeeded.
* `exportJobFailure`: The number of RDS export tasks that have failed.
* `exportS3ObjectsTotal`: The total number of export data files found in S3.
* `exportS3ObjectsProcessed`: The total number of export data files that have been processed successfully from S3.
* `exportS3ObjectsErrors`: The total number of export data files that have failed to be processed from S3.
* `exportRecordsTotal`: The total number of records found in the export.
* `exportRecordsProcessed`: The total number of export records that have been processed successfully.
* `exportRecordsProcessingErrors`: The number of export record processing errors.
* `changeEventsProcessed`: The number of change events processed from database streams.
* `changeEventsProcessingErrors`: The number of processing errors for change events from database streams.
* `bytesReceived`: The total number of bytes received by the source.
* `bytesProcessed`: The total number of bytes processed by the source.
* `positiveAcknowledgementSets`: The number of acknowledgment sets that are positively acknowledged in stream processing.
* `negativeAcknowledgementSets`: The number of acknowledgment sets that are negatively acknowledged in stream processing.
* `checkpointCount`: The total number of checkpoints in stream processing.
* `noDataExtendLeaseCount`: The number of times that the lease is extended on a partition with no new data processed since the last checkpoint.
* `giveupPartitionCount`: The number of times the source gives up ownership of a partition.
* `replicationLogEntryProcessingTime`: The time taken to process a replication log event.
* `replicationLogEntryProcessingErrors`: The number of replication log events that have failed to process.