Skip to content

LeungMingTakMilton/ChineseNLP

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

48 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ChineseNLP

NLP Engine that supports keyword extraction, topic modelling and sentiment analysis

1. Installation

pip install -r requirements.txt

2a. run.py usage (NLP CLI)

python run.py [-l|-n|-s|-t|-u] [input file] [outputPath]

Find files in data/output/ after execution

2b. main.py usage (NLP topic extraction server)

python main.py

Get key topics and tags in http://127.0.0.1/api/nlp?url=[Enter your article url here]

3. Example

a. Extract key topics and most common words from a document:

python run.py -lnt data/input/essay data/output/essay/

b. Extract key topics and most common words from url:

python run.py -lnstu https://theinitium.com/article/20160309-dailynews-alphago/ data/output/essay/

c. Building key topics and sentiment model from rthk local and international news

python sql.py -c local,international rthk_data 2016/01/01 2016/04/01 data/output/news

d. Building key topics and sentiment model from fso blog

python sql.py fso_blog_data 2016/01/01 2016/04/01 data/output/fso_blog

e. Building key topics and sentiment model from ceo blog

python sql.py ceo_blog_data 2016/01/01 2016/04/01 data/output/ceo_blog

4. Option Description

ngrams and frequency count

option: -n output files: [freq bigram trigram] file format: [word]: [count] ... [word]: [count]

sorted by count in descending order

tfidf

option: -t output file: score file format: [key word]: [score] ... [key word]: [score]

latent semantic indexing (LSI)

option: -l output file: topics file format: _[key word 1]: [score1, score2, score3, score4, score5, score6] [key word 2]: [score1, score2, score3, score4, score5, score6] ... [key word n]: [score1, score2, score3, score4, score5, score6]

sorted by score1, followed by score2, score3 ... and score6 in desending order

sentiment format

file format [article_1]: [url1]: [s1,s2,s3,s4,s5,s6,s7,s8] [article_2]: [url2]: [s1,s2,s3,s4,s5,s6,s7,s8] [article_3]: [url3]: [s1,s2,s3,s4,s5,s6,s7,s8] ... Overall: [s1,s2,s3,s4,s5,s6,s7,s8]

sentiment analysis using doc2vec and mlr

option: -s output file: sentiment file format: [sentence1]: [s1,s2,s3,s4,s5,s6,s7,s8] [sentence2]: [s1,s2,s3,s4,s5,s6,s7,s8] [sentence3]: [s1,s2,s3,s4,s5,s6,s7,s8] ... Overall: [s1,s2,s3,s4,s5,s6,s7,s8]

sentiment meaning: s1:實用 s5:無聊 s2:感人 s6:害怕 s3:開心 s7:難過 s4:有趣 s8:憤怒

score range: [0,1]

5. sql.py usage

python sql.py [-c category] [table] [start yyyy/mm/dd] [end yyyy/mm/dd] [output path]

6. qurey.sh usage

./query.sh [start date] [end date] [grep interval] [shift inerval] [table] [category (only for rthk_data)] Example:

./query.sh 2016/02/08 2016/04/04 1days 1days rthk_data local
./query.sh 2016/02/08 2016/04/04 1days 1days fso_blog_data
./query.sh 2016/02/08 2016/04/04 1months 1days ceo_blog_data

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages