ChineseNLP

NLP engine that supports keyword extraction, topic modelling, and sentiment analysis.

1. Installation

pip install -r requirements.txt

2a. run.py usage (NLP CLI)

python run.py [-l|-n|-s|-t|-u] [input file] [outputPath]

After execution, find the output files in data/output/.

2b. main.py usage (NLP topic extraction server)

python main.py

Get key topics and tags at http://127.0.0.1/api/nlp?url=[Enter your article url here]
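The article URL is passed as a query parameter, so it should be percent-encoded before it is appended to the endpoint. The sketch below only builds the request URL with the standard library; the helper name `build_nlp_request_url` is hypothetical, the base address is copied from the line above, and the response format is not documented here, so no response parsing is shown.

```python
from urllib.parse import urlencode

# Base address copied from the README; adjust host/port if main.py binds elsewhere.
API_BASE = "http://127.0.0.1/api/nlp"


def build_nlp_request_url(article_url, base=API_BASE):
    """Return the endpoint URL with the article URL percent-encoded as `url`."""
    return base + "?" + urlencode({"url": article_url})


request_url = build_nlp_request_url(
    "https://theinitium.com/article/20160309-dailynews-alphago/"
)
print(request_url)
```

With the server running, the resulting string can be opened in a browser or fetched with `urllib.request.urlopen(request_url)`.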

3. Example

a. Extract key topics and most common words from a document:

python run.py -lnt data/input/essay data/output/essay/

b. Extract key topics and most common words from a URL:

python run.py -lnstu https://theinitium.com/article/20160309-dailynews-alphago/ data/output/essay/

c. Build key topics and sentiment model from RTHK local and international news:

python sql.py -c local,international rthk_data 2016/01/01 2016/04/01 data/output/news

d. Build key topics and sentiment model from the FSO blog:

python sql.py fso_blog_data 2016/01/01 2016/04/01 data/output/fso_blog

e. Build key topics and sentiment model from the CEO blog:

python sql.py ceo_blog_data 2016/01/01 2016/04/01 data/output/ceo_blog

4. Option Description

ngrams and frequency count

option: -n
output files: [freq bigram trigram]
file format:
[word]: [count]
...
[word]: [count]

sorted by count in descending order
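As a rough illustration of consuming these files, the sketch below parses `[word]: [count]` lines and checks the descending order. It assumes one entry per line with a colon separator, which is inferred from the format description rather than confirmed against real output; the function names and the sample entries are made up for the example.

```python
def parse_counts(text):
    """Parse '[word]: [count]' lines into a list of (word, count) pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last colon so a word that itself contains ':' stays intact.
        word, _, count = line.rpartition(":")
        pairs.append((word.strip(), int(count)))
    return pairs


def is_sorted_descending(pairs):
    """Return True if counts never increase from one entry to the next."""
    return all(a[1] >= b[1] for a, b in zip(pairs, pairs[1:]))


# Made-up sample in the documented format, not real output.
sample = "香港: 12\n政府: 7\n市民: 7\n"
pairs = parse_counts(sample)
print(pairs, is_sorted_descending(pairs))
```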

tfidf

option: -t
output file: score
file format:
[key word]: [score]
...
[key word]: [score]

latent semantic indexing (LSI)

option: -l
output file: topics
file format:
[key word 1]: [score1, score2, score3, score4, score5, score6]
[key word 2]: [score1, score2, score3, score4, score5, score6]
...
[key word n]: [score1, score2, score3, score4, score5, score6]

sorted by score1, followed by score2, score3 ... and score6 in descending order
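The ordering rule above is a lexicographic comparison of the six scores. The sketch below shows one way to parse a topics file and reproduce that order; it assumes each entry sits on its own line in the form `keyword: [s1, ..., s6]`, and the function names and sample values are hypothetical.

```python
def parse_topics(text):
    """Parse 'keyword: [s1, s2, ...]' lines into (keyword, [scores]) pairs."""
    topics = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        keyword, _, raw = line.partition(": [")
        scores = [float(s) for s in raw.rstrip("]").split(",")]
        topics.append((keyword, scores))
    return topics


def sort_topics(topics):
    """Order by score1, then score2, ... descending.

    Python compares lists element by element, which matches the documented rule.
    """
    return sorted(topics, key=lambda t: t[1], reverse=True)


# Made-up sample values in the documented format.
sample = (
    "a: [0.5, 0.1, 0.0, 0.0, 0.0, 0.0]\n"
    "b: [0.9, 0.0, 0.0, 0.0, 0.0, 0.0]\n"
    "c: [0.5, 0.3, 0.0, 0.0, 0.0, 0.0]\n"
)
ordered = [keyword for keyword, _ in sort_topics(parse_topics(sample))]
print(ordered)
```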

sentiment format

file format:
[article_1]: [url1]: [s1,s2,s3,s4,s5,s6,s7,s8]
[article_2]: [url2]: [s1,s2,s3,s4,s5,s6,s7,s8]
[article_3]: [url3]: [s1,s2,s3,s4,s5,s6,s7,s8]
...
Overall: [s1,s2,s3,s4,s5,s6,s7,s8]

sentiment analysis using doc2vec and mlr

option: -s
output file: sentiment
file format:
[sentence1]: [s1,s2,s3,s4,s5,s6,s7,s8]
[sentence2]: [s1,s2,s3,s4,s5,s6,s7,s8]
[sentence3]: [s1,s2,s3,s4,s5,s6,s7,s8]
...
Overall: [s1,s2,s3,s4,s5,s6,s7,s8]

sentiment meaning:
s1: practical (實用)
s2: touching (感人)
s3: happy (開心)
s4: interesting (有趣)
s5: boring (無聊)
s6: afraid (害怕)
s7: sad (難過)
s8: angry (憤怒)

score range: [0,1]
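To make the eight-score vector easier to read, the sketch below maps one line of a sentiment file to named scores and picks the strongest one. The English label names are translations of the labels listed above, the line layout (`key: [s1,...,s8]`) is assumed from the format description, and the function names and sample numbers are invented for illustration.

```python
# English translations of s1..s8, in order.
LABELS = [
    "practical", "touching", "happy", "interesting",
    "boring", "afraid", "sad", "angry",
]


def parse_sentiment_line(line):
    """Split 'key: [s1,...,s8]' into (key, {label: score})."""
    # Use the last ': [' so keys that contain a URL are kept whole.
    key, _, raw = line.strip().rpartition(": [")
    scores = [float(s) for s in raw.rstrip("]").split(",")]
    if len(scores) != len(LABELS):
        raise ValueError("expected %d scores, got %d" % (len(LABELS), len(scores)))
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("scores must lie in [0, 1]")
    return key, dict(zip(LABELS, scores))


def dominant_sentiment(scores):
    """Return the label with the highest score."""
    return max(scores, key=scores.get)


# Made-up numbers in the documented format.
key, scores = parse_sentiment_line("Overall: [0.1,0.2,0.7,0.3,0.0,0.1,0.05,0.02]")
print(key, dominant_sentiment(scores))
```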

5. sql.py usage

python sql.py [-c category] [table] [start yyyy/mm/dd] [end yyyy/mm/dd] [output path]
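When sql.py is driven from another script, the argument order above can be assembled programmatically. The following is a minimal sketch under that assumption: `build_sql_command` is a hypothetical helper, not part of the repository, and it only validates the `yyyy/mm/dd` date format and the date order before building the argument list.

```python
import sys
from datetime import datetime

DATE_FORMAT = "%Y/%m/%d"  # yyyy/mm/dd, as in the usage line


def build_sql_command(table, start, end, output_path, categories=None):
    """Return the argument list for sql.py, validating both dates first."""
    start_date = datetime.strptime(start, DATE_FORMAT)  # raises ValueError if malformed
    end_date = datetime.strptime(end, DATE_FORMAT)
    if start_date > end_date:
        raise ValueError("start date must not be after end date")
    cmd = [sys.executable, "sql.py"]
    if categories:
        # Categories are comma-separated, as in the rthk_data example.
        cmd += ["-c", ",".join(categories)]
    cmd += [table, start, end, output_path]
    return cmd


cmd = build_sql_command(
    "rthk_data", "2016/01/01", "2016/04/01", "data/output/news",
    categories=["local", "international"],
)
print(" ".join(cmd[1:]))
```

The list can then be passed to `subprocess.run(cmd, check=True)` from the repository root.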

6. query.sh usage

./query.sh [start date] [end date] [grep interval] [shift interval] [table] [category (only for rthk_data)]

Example:

./query.sh 2016/02/08 2016/04/04 1days 1days rthk_data local
./query.sh 2016/02/08 2016/04/04 1days 1days fso_blog_data
./query.sh 2016/02/08 2016/04/04 1months 1days ceo_blog_data
