This repository was archived by the owner on Dec 30, 2024. It is now read-only.

Commit 18d022f

Merge pull request #121 from aws-solutions/feature/v2.2.0: Update to version v2.2.0
2 parents 132cf6b + 3cc89f6

258 files changed: 50,483 additions and 136,542 deletions


.github/ISSUE_TEMPLATE/bug_report.md (1 addition, 1 deletion)

@@ -24,7 +24,7 @@ To get the version of the solution, you can look at the description of the creat
 - [ ] Region: [e.g. us-east-1]
 - [ ] Was the solution modified from the version published on this repository?
 - [ ] If the answer to the previous question was yes, are the changes available on GitHub?
-- [ ] Have you checked your [service quotas](https://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html) for the sevices this solution uses?
+- [ ] Have you checked your [service quotas](https://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html) for the services this solution uses?
 - [ ] Were there any errors in the CloudWatch Logs?
 
 **Screenshots**

CHANGELOG.md (21 additions)

@@ -5,6 +5,27 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [2.2.0] - 2023-09-21
+
+### Updated
+- Migrated to AWS CDK v2
+- Migrated to AWS SDK v3
+- Updated the Node.js Lambda runtime to Node.js 18
+- Implemented NewsCatcher locally instead of using the deprecated library
+- Reddit comment ingestion - migrated from the npm snoowrap package to the Python praw library; the subreddit comment ingestion Lambda now uses the Python runtime (see the praw sketch after this diff)
+- Security patches for npm packages
+- Updated outdated libraries
+- Operational metrics now include additional deployment attributes for Reddit ingestion and attributes indicating whether a particular ingestion type is enabled
+
+### Fixed
+- YouTube search query when an OR (|) expression is used in the query parameter
+- Reddit comment ingestion issue for highly active subreddits
+- urllib issue in the RssNewsFeed ingestion Lambda
+
+### Removed
+- Python NewsCatcher library
+- npm snoowrap package
+
 ## [2.1.4] - 2023-06-07
 
 ### Updated
NOTICE.txt (91 additions, 40 deletions)

@@ -1,46 +1,97 @@
 Discovering Hot Topics using Machine Learning
-Copyright 2020-2021 Amazon.com, Inc. or its affiliates. All Rights Reserved.
-
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
+Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
+Licensed under the Apache License Version 2.0 (the "License"). You may not use this file except
+in compliance with the License. A copy of the License is located at http://www.apache.org/licenses/
+or in the "license" file accompanying this file. This file is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, express or implied. See the License for the
+specific language governing permissions and limitations under the License.
 
 **********************
 THIRD PARTY COMPONENTS
 **********************
 This software includes third party software subject to the following copyrights:
-AWS CDK - Apache-2.0
-AWS SDK - Apache-2.0
-AWS SDK Mock - Apache-2.0
-AWS Solutions Constructs Library - Apache-2.0
-boto3 - Apache-2.0
-botocore - Apache-2.0
-chai - MIT license
-crhelper - Apache-2.0
-googleapis - Apache-2.0
-jest - MIT license
-jmespath - MIT License
-momentjs - MIT license
-moto - Apache-2.0
-newscatcher - MIT license
-nock - MIT license
-node - MIT license
-openpyx - MIT license
-pytest-cov - MIT license
-pytest - MIT license
-requests - Apache-2.0
-sinon - BSD license
-snoowrap - MIT license
-tenacity - Apache-2.0
-ts-jest - MIT license
-ts-node - MIT license
-twitter-lite - MIT License
-typescript - Apache-2.0
+
+@aws-cdk/aws-glue-alpha under Apache License 2.0
+@aws-cdk/aws-lambda-python-alpha under Apache License 2.0
+@aws-cdk/aws-servicecatalogappregistry-alpha under Apache License 2.0
+@aws-solutions-constructs/aws-eventbridge-lambda under Apache License 2.0
+@aws-solutions-constructs/aws-kinesisfirehose-s3 under Apache License 2.0
+@aws-solutions-constructs/aws-kinesisstreams-lambda under Apache License 2.0
+@aws-solutions-constructs/aws-lambda-dynamodb under Apache License 2.0
+@aws-solutions-constructs/aws-lambda-s3 under Apache License 2.0
+@aws-solutions-constructs/aws-lambda-stepfunctions under Apache License 2.0
+@aws-solutions-constructs/aws-sqs-lambda under Apache License 2.0
+@aws-solutions-constructs/core under Apache License 2.0
+attrs under MIT License
+aws-cdk-lib under Apache License 2.0
+aws-sdk-client-mock under MIT License
+aws-solutions-constructs under Apache License 2.0
+boolean.py under BSD-2-Clause
+boto3 under Apache License 2.0
+botocore under Apache License 2.0
+cachetools under MIT License
+cdk-nag under Apache License 2.0
+cffi under MIT License
+chai under MIT license
+constructs under Apache License 2.0
+coverage under Apache License 2.0
+crhelper under Apache License 2.0
+crypto under ISC
+cryptography under Apache Software License, BSD License (Apache License 2.0 OR BSD-3-Clause)
+et-xmlfile under MIT License
+exceptiongroup under MIT License
+feedparser under BSD License (BSD-2-Clause)
+googleapis under Apache License 2.0
+googleapis-common-protos under Apache License 2.0
+google-api-core under Apache 2.0
+google-api-python-client under Apache 2.0
+google-auth under Apache 2.0
+google-auth-httplib2 under Apache 2.0
+httplib2 under MIT License
+iniconfig under MIT License
+jest under MIT license
+jinja under BSD License (BSD-3-Clause)
+Jinja2 under BSD License (BSD-3-Clause)
+jmespath under MIT License
+license-expression under Apache License 2.0
+MarkupSafe under BSD License (BSD-3-Clause)
+momentjs under MIT license
+moto under Apache License 2.0
+newscatcher under MIT license
+nock under MIT license
+node under MIT license
+openpyx under MIT license
+openpyxl under MIT License
+pluggy under MIT License
+pytest-cov under MIT license
+pytest under MIT license
+praw under BSD License (BSD-2-Clause)
+prawcore under BSD License
+protobuf under BSD License (BSD-3-Clause)
+pyasn1 under BSD License (BSD-2-Clause)
+pyasn1-modules under BSD License
+pycparser under BSD License
+python-dateutil under Apache License 2.0, BSD License
+requests under Apache License 2.0
+requests-file under Apache 2.0
+responses under Apache 2.0
+rsa under Apache License 2.0
+s3transfer under Apache 2.0
+sgmllib3k under BSD License
+sinon under BSD license
+source-map-support under MIT License
+tenacity under Apache License 2.0
+tldextract under BSD License (BSD-3-Clause)
+tomli under MIT License
+types-PyYAML under Apache License 2.0
+ts-jest under MIT license
+ts-node under MIT license
+twitter-lite under MIT License
+@types/uuid under MIT License
+typescript under Apache License 2.0
+update-checker under BSD License
+uritemplate under Apache Software License, BSD License (BSD 3-Clause License or Apache License, Version 2.0)
+websocket-client under Apache License 2.0
+Werkzeug under BSD License
+xmltodict under MIT License
+

README.md (25 additions, 20 deletions)

@@ -10,6 +10,8 @@ This solution deploys an AWS CloudFormation template to automate data ingestion
 - Reddit (comments from subreddits of interest)
 - custom data in JSON or XLSX format
 
+**Note**: Twitter ingestion is temporarily disabled starting release v2.2.0 as Twitter has retired v1 APIs.
+
 This solution uses pre-trained machine learning (ML) models from Amazon Comprehend, Amazon Translate, and Amazon Rekognition to provide these benefits:
 
 - **Detecting dominant topics using topic modeling**-identifies the terms that collectively form a topic.

@@ -19,7 +21,7 @@ This solution uses pre-trained machine learning (ML) models from Amazon Comprehe
 
 The solution can be customized to aggregate other social media platforms and internal enterprise systems. The default CloudFormation deployment sets up custom ingestion configuration with parameters and an Amazon Simple Storage Service (Amazon S3) bucket to allow Amazon Transcribe Call Analytics output to be processed for natural language processing (NLP) analysis.
 
-With minimal configuration changes in the custom ingestion functionality, this solution can ingest data from both internal systems and external data sources, such as transcriptions from call center calls, product reviews, movie reviews, and community chat forums including Twitch and Discord. This is done by exporting the custom data in JSON or XLSX format from the respective platforms and then uploading it to an Amazon Simple Storage Service (Amazon S3) bucket that is created when deploying this solution. More details on how to customize this feature, please refer Customizing Amazon Amazon S3 ingestion.
+With minimal configuration changes in the custom ingestion functionality, this solution can ingest data from both internal systems and external data sources, such as transcriptions from call center calls, product reviews, movie reviews, and community chat forums including Twitch and Discord. This is done by exporting the custom data in JSON or XLSX format from the respective platforms and then uploading it to an Amazon Simple Storage Service (Amazon S3) bucket that is created when deploying this solution. For more details on how to customize this feature, please refer to [Customizing Amazon S3 ingestion](https://docs.aws.amazon.com/solutions/latest/discovering-hot-topics-using-machine-learning/s3-ingestion.html) (an upload sketch follows this hunk).
 
 For a detailed solution deployment guide, refer to [Discovering Hot Topics using Machine Learning](https://aws.amazon.com/solutions/implementations/discovering-hot-topics-using-machine-learning)
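
A minimal sketch of the custom-ingestion upload described in the hunk above, assuming a deployed stack. The bucket name, object key, and record shape below are illustrative placeholders; the expected file format is documented in the linked customization guide.

```
# Hedged sketch: bucket name, key, and record shape are placeholders, not the
# solution's documented schema. The stack's ingestion-custom Lambda reads files
# from the bucket and pushes the data to Amazon Kinesis Data Streams.
import json
import boto3

s3 = boto3.client("s3")

records = [
    {"text": "Great product, fast delivery", "created_at": "2023-09-21T10:00:00Z"},
]

s3.put_object(
    Bucket="my-dht-custom-ingestion-bucket",   # placeholder: use the bucket created by the stack
    Key="reviews/2023-09-21.json",             # placeholder object key
    Body=json.dumps(records).encode("utf-8"),
    ContentType="application/json",
)
```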


@@ -67,7 +69,7 @@ After you deploy the solution, use the included Amazon QuickSight dashboard to v
 - aws-kinesisstreams-lambda
 - aws-lambda-dynamodb
 - aws-lambda-s3
-- aws-lambda-step-function
+- aws-lambda-stepfunctions
 - aws-sqs-lambda
 
 ## Deployment

@@ -91,7 +93,8 @@ The solution is deployed using a CloudFormation template with a lambda backed cu
 │ ├── ingestion-consumer [lambda function that consumes messages from Amazon Kinesis Data Streams]
 │ ├── ingestion-custom [lambda function that reads files from Amazon S3 bucket and pushes data to Amazon Kinesis Data Streams]
 │ ├── ingestion-producer [lambda function that makes Twitter API call and pushes data to Amazon Kinesis Data Stream]
-│ ├── ingestion-reddit [lambda function that makes Reddit API call to retrieve comments from subreddits of interest and pushes data to Amazon Kinesis Data Stream]
+│ ├── ingestion-publish-subreddit [lambda function that publishes Amazon EventBridge (CloudWatch) events for the subreddits to ingest information from. Each event triggers the ingestion_reddit_comments lambda, which retrieves comments from that subreddit (see the handler sketch after this hunk)]
+│ ├── ingestion_reddit_comments [lambda function that makes Reddit API calls to retrieve comments from subreddits of interest and pushes data to Amazon Kinesis Data Stream]
 │ ├── ingestion-youtube [lambda function that ingests comments from YouTube videos and pushes data to Amazon Kinesis Data Streams]
 │ ├── integration [lambda function that publishes inference outputs to Amazon Events Bridge]
 │ ├── layers [lambda layer function library for Node and Python layers]
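
A simplified sketch of how an EventBridge-triggered consumer such as ingestion_reddit_comments could hand records to Kinesis. This is not the repository's actual handler; the event shape, stream name, environment variable, and fetch_comments helper are assumptions for illustration.

```
# Hypothetical handler sketch -- field names, stream name, and the fetch_comments
# helper are assumptions, not code from this repository.
import json
import os
import boto3

kinesis = boto3.client("kinesis")
STREAM_NAME = os.environ.get("INGESTION_STREAM", "dht-ingestion-stream")  # placeholder


def fetch_comments(subreddit_name):
    """Placeholder for the praw-based retrieval sketched earlier; returns a list of dicts."""
    return []


def handler(event, _context):
    # The publisher Lambda is described as emitting one EventBridge event per subreddit.
    subreddit_name = event["detail"]["subreddit"]  # assumed event shape
    records = [
        {"Data": json.dumps(comment).encode("utf-8"), "PartitionKey": subreddit_name}
        for comment in fetch_comments(subreddit_name)
    ]
    if records:
        kinesis.put_records(StreamName=STREAM_NAME, Records=records)
    return {"subreddit": subreddit_name, "records": len(records)}
```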

@@ -143,30 +146,26 @@ chmod +x ./run-all-tests.sh
 ./run-all-tests.sh
 ```
 
-- Configure the bucket name of your target Amazon S3 distribution bucket
+- Configure environment variables for the build
 
+Configure the environment variables below. Note: the values shown are example values only.
 ```
-export DIST_OUTPUT_BUCKET=my-bucket-name
-export VERSION=my-version
+export DIST_OUTPUT_BUCKET=my-bucket-name # The global name of the distribution. The AWS Region is appended to this name (example: 'my-bucket-name-us-east-1') to create a regional bucket. The lambda artifacts should be uploaded to the regional buckets for the CloudFormation template to pick them up for deployment.
+
+export SOLUTION_NAME=discovering-hot-topics-using-machine-learning # The name of this solution
+export VERSION=my-version # Version number for the customized code
+export CF_TEMPLATE_BUCKET_NAME=my-cf-template-bucket-name # The name of the S3 bucket where the CloudFormation templates should be uploaded
+export QS_TEMPLATE_ACCOUNT=aws-account-id # The AWS account ID from which the Amazon QuickSight templates should be sourced for QuickSight analysis and dashboard creation
+export DIST_QUICKSIGHT_NAMESPACE=my-quicksight-namespace # QuickSight namespace
 ```
 
-- Now build the distributable:
+- Run the commands below to build the distributable:
 
 ```
 cd <rootDir>/deployment
 chmod +x ./build-s3-dist.sh
-./build-s3-dist.sh $DIST_OUTPUT_BUCKET $SOLUTION_NAME $VERSION $CF_TEMPLATE_BUCKET_NAME QS_TEMPLATE_ACCOUNT
-
-```
+./build-s3-dist.sh $DIST_OUTPUT_BUCKET $SOLUTION_NAME $VERSION $CF_TEMPLATE_BUCKET_NAME $QS_TEMPLATE_ACCOUNT $DIST_QUICKSIGHT_NAMESPACE
 
-- Parameter details
-
-```
-$DIST_OUTPUT_BUCKET - This is the global name of the distribution. For the bucket name, the AWS Region is added to the global name (example: 'my-bucket-name-us-east-1') to create a regional bucket. The lambda artifact should be uploaded to the regional buckets for the CloudFormation template to pick it up for deployment.
-$SOLUTION_NAME - The name of This solution (example: discovering-hot-topics-using-machine-learning)
-$VERSION - The version number of the change
-$CF_TEMPLATE_BUCKET_NAME - The name of the S3 bucket where the CloudFormation templates should be uploaded
-$QS_TEMPLATE_ACCOUNT - The account from which the Amazon QuickSight templates should be sourced for Amazon QuickSight Analysis and Dashboard creation
 ```
 
 - When creating and using buckets it is recommended to:

@@ -175,13 +174,19 @@ $QS_TEMPLATE_ACCOUNT - The account from which the Amazon QuickSight templates sh
 - Ensure buckets are not public.
 - Verify bucket ownership prior to uploading templates or code artifacts.
 
+### 3. Upload deployment assets to your Amazon S3 buckets
 - Deploy the distributable to an Amazon S3 bucket in your account. _Note:_ you must have the AWS Command Line Interface installed.
 
 ```
-aws s3 cp ./global-s3-assets/ s3://my-bucket-name-<aws_region>/discovering-hot-topics-using-machine-learning/<my-version>/ --recursive --acl bucket-owner-full-control --profile aws-cred-profile-name
-aws s3 cp ./regional-s3-assets/ s3://my-bucket-name-<aws_region>/discovering-hot-topics-using-machine-learning/<my-version>/ --recursive --acl bucket-owner-full-control --profile aws-cred-profile-name
+aws s3 cp ./global-s3-assets/ s3://$CF_TEMPLATE_BUCKET_NAME/discovering-hot-topics-using-machine-learning/$VERSION/ --recursive --acl bucket-owner-full-control --profile aws-cred-profile-name
+aws s3 cp ./regional-s3-assets/ s3://$DIST_OUTPUT_BUCKET-<aws_region>/discovering-hot-topics-using-machine-learning/$VERSION/ --recursive --acl bucket-owner-full-control --profile aws-cred-profile-name
 ```
 
+### 4. Launch the CloudFormation template
+- Get the link of the template uploaded to Amazon S3 bucket ($CF_TEMPLATE_BUCKET_NAME bucket from previous step)
+- Deploy the solution to your account by launching a new AWS CloudFormation stack
+
+
 ## Collection of operational metrics
 
 This solution collects anonymous operational metrics to help AWS improve the quality and features of the solution. For more information, including how to disable this capability, please see the [implementation guide](https://docs.aws.amazon.com/solutions/latest/discovering-hot-topics-using-machine-learning/operational-metrics.html).
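
Step 4 in the hunk above is normally performed from the AWS console, but the same launch can be scripted with boto3. In this hedged sketch the stack name, template URL, and capability list are placeholders; the solution's real template may require additional parameters or capabilities described in the implementation guide.

```
# Hedged sketch: stack name, template URL, and capabilities are placeholders.
import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")

cfn.create_stack(
    StackName="discovering-hot-topics",
    TemplateURL=(
        "https://my-cf-template-bucket-name.s3.amazonaws.com/"
        "discovering-hot-topics-using-machine-learning/my-version/"
        "discovering-hot-topics-using-machine-learning.template"  # placeholder object key
    ),
    Capabilities=["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM"],
)

# Block until the deployment finishes before opening the Amazon QuickSight dashboard.
cfn.get_waiter("stack_create_complete").wait(StackName="discovering-hot-topics")
```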
