
Commit 95f18c3

san81, kolchfa-aws, and natebower authored and committed
Atlassian sources added (opensearch-project#10739)
* Atlassian sources added
  Signed-off-by: Santhosh <[email protected]>
* below to following
  Signed-off-by: Santhosh <[email protected]>
* addressing review comments
  Signed-off-by: Santhosh <[email protected]>
* Doc review
  Signed-off-by: Fanit Kolchina <[email protected]>
* Apply suggestions from code review
  Signed-off-by: Nathan Bower <[email protected]>
* addressing review comments
  Signed-off-by: Santhosh <[email protected]>
* Update _data-prepper/pipelines/configuration/sources/atlassian-confluence.md
  Signed-off-by: Nathan Bower <[email protected]>
* Update _data-prepper/pipelines/configuration/sources/atlassian-confluence.md
  Signed-off-by: Nathan Bower <[email protected]>
* fixing build issues
  Signed-off-by: Santhosh <[email protected]>

---------

Signed-off-by: Santhosh <[email protected]>
Signed-off-by: Fanit Kolchina <[email protected]>
Signed-off-by: Nathan Bower <[email protected]>
Co-authored-by: Fanit Kolchina <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
1 parent 0ac06dd commit 95f18c3

2 files changed: +296 -0 lines changed
Lines changed: 143 additions & 0 deletions
@@ -0,0 +1,143 @@
---
layout: default
title: Atlassian Confluence
parent: Sources
grand_parent: Pipelines
nav_order: 5
---

# Atlassian Confluence source

You can use the OpenSearch Data Prepper `confluence` source to ingest records from one or more [Atlassian Confluence](https://www.atlassian.com/software/confluence) spaces.

## Usage

Set up Confluence access credentials by choosing one of the following options:

- **Basic authentication** (API key authentication): Follow [these instructions](https://support.atlassian.com/atlassian-account/docs/manage-api-tokens-for-your-atlassian-account/).
- **OAuth2 authentication**: Follow [these instructions](https://developer.atlassian.com/cloud/confluence/oauth-2-3lo-apps/).

Optionally, store the credentials in AWS Secrets Manager. If you don't store the credentials in AWS Secrets Manager, you must provide them in plain text directly in the pipeline configuration.

The following example pipeline specifies `confluence` as a source. The pipeline ingests data from the Confluence spaces with the space keys `space1` and `space2` and applies a `page_type` filter to select pages from these spaces. To also select blog posts or comments, uncomment the corresponding `page_type` entries:

```yaml
version: "2"
extension:
  aws:
    secrets:
      confluence-account-credentials:
        secret_id: "arn:aws:secretsmanager:us-east-1:123456789012:secret:confluence-credentials-secret"
        region: "us-east-1"
        sts_role_arn: "arn:aws:iam::123456789012:role/Example-Role"
atlassian-confluence-pipeline:
  source:
    confluence:
      hosts: ["https://example.atlassian.net/"]
      acknowledgments: true
      authentication:
        # Provide one of the supported authentication methods: 'basic' or 'oauth2'.
        # For basic authentication, the password is the API key that you generate using your Confluence account.
        basic:
          username: {% raw %} ${{aws_secrets:confluence-account-credentials:username}} {% endraw %}
          password: {% raw %} ${{aws_secrets:confluence-account-credentials:password}} {% endraw %}
        # OAuth2 authentication requires the following four key values to be stored in the secret.
        # To generate these keys, follow the Atlassian instructions at
        # https://developer.atlassian.com/cloud/confluence/oauth-2-3lo-apps/
        # OAuth2 authentication also requires write permission to your AWS secret
        # so that renewed tokens can be written back to the secret.
        # oauth2:
        #   client_id: {% raw %} ${{aws_secrets:confluence-account-credentials:clientId}} {% endraw %}
        #   client_secret: {% raw %} ${{aws_secrets:confluence-account-credentials:clientSecret}} {% endraw %}
        #   access_token: {% raw %} ${{aws_secrets:confluence-account-credentials:accessToken}} {% endraw %}
        #   refresh_token: {% raw %} ${{aws_secrets:confluence-account-credentials:refreshToken}} {% endraw %}
      filter:
        space:
          key:
            include:
              # This is not the space name.
              # It is the alphanumeric space key that you can find under space details in Confluence.
              - "space1"
              - "space2"
            # exclude:
            #   - "<<space key>>"
            #   - "<<space key>>"
        page_type:
          include:
            - "page"
            # - "blogpost"
            # - "comment"
          # exclude:
          #   - "attachment"
```
{% include copy.html %}

## Configuration options

The `confluence` source supports the following configuration options.

| Option | Required | Type | Description |
|:------------------|:---------|:----------------------------------|:----------------------------------------------|
| `hosts` | Yes | List | The Atlassian Confluence host. Currently, only one host is supported, so this list must contain exactly one entry. |
| `acknowledgments` | No | Boolean | When set to `true`, enables the `confluence` source to receive [end-to-end acknowledgments]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines#end-to-end-acknowledgments) when events are received by OpenSearch sinks. |
| `authentication` | Yes | [authentication](#authentication) | Configures the authentication method used to access `confluence` source records from the specified host. |
| `filter` | No | [filter](#filter) | Applies specific filter criteria while extracting Confluence content. |
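
As a minimal sketch of the required options only, the following pipeline sets `hosts` and `authentication` and omits the optional `acknowledgments` and `filter` settings, so all spaces and content visible to the credentials would be extracted. The pipeline name `confluence-minimal-pipeline`, the placeholder credential values, and the `stdout` sink are assumptions added for illustration and are not part of the original example:

```yaml
confluence-minimal-pipeline:
  source:
    confluence:
      # Currently, only one host is supported.
      hosts: ["https://example.atlassian.net/"]
      authentication:
        basic:
          # Placeholder values (assumptions). The password is the API key.
          username: "<username>"
          password: "<api-key>"
  sink:
    # Assumed sink for local inspection; replace with your own sink.
    - stdout:
```

Plain-text credentials are shown here only to keep the sketch short. Referencing a secret in AWS Secrets Manager, as in the usage example, avoids placing credentials in the pipeline file.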

### Authentication

Use one of the following authentication methods to access the Confluence host. You must provide one of these options.

| Option | Required | Type | Description |
|:---------|:---------|:------------------|:-------------------------------------------------------------|
| `basic` | Conditional | [basic](#basic-authentication) | Basic authentication credentials used to access a Confluence host. Required if `oauth2` is not configured. |
| `oauth2` | Conditional | [oauth2](#oauth2-authentication) | OAuth2 authentication credentials used to access a Confluence host. Required if `basic` is not configured. |

#### Basic authentication

Either basic or OAuth2 credentials are required to access the Confluence site. If you use `basic` authentication, the following fields are required.

| Option | Required | Type | Description |
|:-----------|:---------|:-------|:--------------------------------------------------------------------------|
| `username` | Yes | String | A username or reference to the secret key storing the username. |
| `password` | Yes | String | A password (API key) or reference to the secret key storing the password. |

#### OAuth2 authentication

Either basic or OAuth2 credentials are required to access the Confluence site. If you use OAuth2, the following fields are required.

| Option | Required | Type | Description |
|:----------------|:---------|:-------|:------------------------------------------------------------------------------|
| `client_id` | Yes | String | A `client_id` or reference to the secret key storing the `client_id`. |
| `client_secret` | Yes | String | A `client_secret` or reference to the secret key storing the `client_secret`. |
| `access_token` | Yes | String | An `access_token` or reference to the secret key storing the `access_token`. |
| `refresh_token` | Yes | String | A `refresh_token` or reference to the secret key storing the `refresh_token`. |
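
The four fields above can be sketched as the following `authentication` fragment, which mirrors the commented-out `oauth2` lines in the usage example. It assumes that the secret `confluence-account-credentials` is defined in the `aws` extension and contains the keys `clientId`, `clientSecret`, `accessToken`, and `refreshToken`:

```yaml
authentication:
  # Use this block in place of 'basic'.
  oauth2:
    client_id: {% raw %} ${{aws_secrets:confluence-account-credentials:clientId}} {% endraw %}
    client_secret: {% raw %} ${{aws_secrets:confluence-account-credentials:clientSecret}} {% endraw %}
    access_token: {% raw %} ${{aws_secrets:confluence-account-credentials:accessToken}} {% endraw %}
    refresh_token: {% raw %} ${{aws_secrets:confluence-account-credentials:refreshToken}} {% endraw %}
```

With this setup, the source needs write permission to the secret so that renewed tokens can be written back.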

### Filter

Optionally, you can specify filters to select specific content, as shown in the following table. If no filters are specified, all spaces and content visible to the specified credentials are extracted and sent to the sink specified in the pipeline.

| Option | Required | Type | Description |
|:------------|:---------|:-------|:----------------------------------------------|
| `space` | No | Map | The space keys to include or exclude, specified as `include` and `exclude` lists under `key`. |
| `page_type` | No | Map | The page types to include or exclude, specified as `include` and `exclude` lists. |
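
As an illustration of the `exclude` form, which appears only as commented-out lines in the usage example, the following fragment is a sketch that skips one space and leaves out attachments. The space key `ARCHIVE` is a hypothetical value, and how `include` and `exclude` interact when both are set is not described in this document:

```yaml
filter:
  space:
    key:
      exclude:
        # Hypothetical space key (not the space name).
        - "ARCHIVE"
  page_type:
    exclude:
      - "attachment"
```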

### AWS secrets

You can use the following options in the `aws` secrets configuration if you plan to store the credentials in AWS Secrets Manager. Storing secrets in AWS Secrets Manager is optional. If AWS Secrets Manager is not used, credentials must be specified in plain text in the pipeline YAML itself.

If OAuth2 authentication is used in combination with `aws` secrets, this source requires write permissions to AWS Secrets Manager so that it can write back the renewed access token once the current token expires.

| Option | Required | Type | Description |
|:---------------|:---------|:-------|:----------------------------------------------|
| `region` | No | String | The AWS Region to use for credentials. Defaults to the [standard SDK behavior for determining the Region](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/region-selection.html). |
| `sts_role_arn` | No | String | The AWS Security Token Service (AWS STS) role to assume for requests to AWS Secrets Manager. Defaults to `null`, which uses the [standard SDK behavior for credentials](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html). |
| `secret_id` | Yes | String | The Amazon Resource Name (ARN) of the secret where the credentials are stored. |

## Metrics

The `confluence` source includes the following metrics (counters):

* `crawlingTime`: The amount of time taken to crawl through all the new changes in Confluence.
* `pageFetchLatency`: The page fetch API operation latency.
* `searchCallLatency`: The search API operation latency.
* `searchResultsFound`: The number of pages found in a specified search API call.
