ChineseNLP

In a nutshell

Extract key topics and most common words from a document:

python run.py -lnst data/input/essay data/output/essay/

Extract key topics and most common words from a URL:

python run.py -lnstu https://theinitium.com/article/20160309-dailynews-alphago/ data/output/essay/

run.py usage

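python run.py [-l|-n|-s|-t|-u] [input file or URL] [outputPath]

Options can be combined into a single flag group, e.g. -lnst. With -u, the input argument is a URL instead of a local file path.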
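The example commands combine several flags into one group (-lnst, -lnstu), which is how ordinary short options behave. The following is a minimal sketch, assuming Python's argparse, of how such a command line could be parsed; build_parser and the output_path attribute are hypothetical names, not taken from run.py itself.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the documented flags; run.py's real
    # argument handling may differ.
    p = argparse.ArgumentParser(prog="run.py")
    p.add_argument("-l", action="store_true", help="LSI topics")
    p.add_argument("-n", action="store_true", help="n-gram frequency counts")
    p.add_argument("-s", action="store_true", help="sentiment analysis")
    p.add_argument("-t", action="store_true", help="tf-idf key words")
    p.add_argument("-u", action="store_true", help="treat input as a URL")
    p.add_argument("input", help="input file, or a URL when -u is given")
    p.add_argument("output_path", help="directory for the output files")
    return p


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Because every flag is a store_true short option, argparse accepts them either separately (-l -n) or grouped (-lnst).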

Dependencies

gensim, jieba, sklearn, numpy, readability, BeautifulSoup

ngrams and frequency count

option: -n
output files: freq, bigram, trigram
file format:
[word]: [count]
...
[word]: [count]

Entries are sorted by count in descending order.
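A minimal sketch of the -n counting step, assuming the text has already been segmented into words (jieba is listed as a dependency; jieba.lcut(text) returns such a list) and assuming n-grams are written by joining adjacent words without a separator, since Chinese text has no spaces. ngram_counts and format_counts are hypothetical helper names, not functions from this repository.

```python
from collections import Counter


def ngram_counts(tokens, n, sep=""):
    """Count n-grams in a list of already-segmented words.

    Assumption: adjacent words are joined with `sep` (empty by default)
    to form the n-gram string written to the output file.
    """
    grams = (sep.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(grams)


def format_counts(counter):
    # One "[word]: [count]" line per entry, highest count first.
    return "\n".join(f"{w}: {c}" for w, c in counter.most_common())


if __name__ == "__main__":
    tokens = ["人工", "智能", "圍棋", "人工", "智能"]
    for name, n in [("freq", 1), ("bigram", 2), ("trigram", 3)]:
        print(name)
        print(format_counts(ngram_counts(tokens, n)))
```

Counter.most_common() already returns entries in descending count order, which matches the documented sorting.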

tfidf

option: -t
output file: score
file format:
[key word]: [score]
...
[key word]: [score]

Entries are sorted by score in descending order.
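The README does not say which corpus the idf part of tf-idf is computed against. The sketch below assumes a small reference corpus of segmented documents, uses relative term frequency for tf, and uses the smoothed idf formula ln((1 + N) / (1 + df)) + 1 that scikit-learn's TfidfVectorizer applies by default (without its L2 normalisation). tfidf_scores is a hypothetical name.

```python
import math
from collections import Counter


def tfidf_scores(target, corpus):
    """Score the words of `target` (a token list) against `corpus`.

    Assumptions: tf is the relative frequency in `target`; idf is the
    smoothed ln((1 + N) / (1 + df)) + 1, so words missing from the
    corpus never cause a division by zero.
    """
    n_docs = len(corpus)
    # Document frequency: number of corpus documents containing each word.
    df = Counter(w for doc in corpus for w in set(doc))
    tf = Counter(target)
    total = len(target)
    scores = {
        w: (c / total) * (math.log((1 + n_docs) / (1 + df[w])) + 1)
        for w, c in tf.items()
    }
    # "[key word]: [score]" pairs, highest score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    corpus = [["圍棋", "AlphaGo", "圍棋"], ["圍棋", "比賽"], ["天氣", "比賽"]]
    for word, score in tfidf_scores(corpus[0], corpus):
        print(f"{word}: {score:.4f}")
```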

latent semantic indexing (LSI)

option: -l
output file: topics
file format:
[key word]: [score1, score2, score3]
...
[key word]: [score1, score2, score3]

Entries are sorted by score1, then score2, then score3, in descending order. Score range: [0, 1]
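A minimal LSI sketch using NumPy's SVD. It assumes three topics (one per score column), sentences as the documents, and absolute topic loadings rescaled per topic so every score lands in [0, 1]. The repository lists gensim, whose LsiModel is a likely alternative; how run.py actually derives and normalises its scores is not documented here, and lsi_keyword_scores is a hypothetical name.

```python
import numpy as np


def lsi_keyword_scores(docs, k=3):
    """docs: list of token lists (e.g. jieba-segmented sentences)."""
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    # Term-document count matrix: rows are words, columns are documents.
    X = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for w in doc:
            X[index[w], j] += 1
    # Truncated SVD: X ~ U_k S_k V_k^T keeps the top-k latent topics.
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    k = min(k, len(S))
    # Word weight in each topic; absolute value is an assumption.
    loadings = np.abs(U[:, :k] * S[:k])
    # Rescale each topic column so its largest weight is 1.
    col_max = loadings.max(axis=0)
    col_max[col_max == 0] = 1.0
    scores = loadings / col_max
    rows = [(w, [float(s) for s in scores[i]]) for i, w in enumerate(vocab)]
    # Lists compare element by element: score1, then score2, then score3.
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows


if __name__ == "__main__":
    docs = [["圍棋", "AlphaGo", "圍棋"], ["AlphaGo", "比賽"], ["圍棋", "比賽", "天氣"]]
    for word, s in lsi_keyword_scores(docs):
        print(f"{word}: {[round(x, 3) for x in s]}")
```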

sentiment analysis using doc2vec and mlr

option: -s
output file: sentiment
file format:
[sentence1]: [s1, s2, s3, s4, s5, s6, s7, s8]
[sentence2]: [s1, s2, s3, s4, s5, s6, s7, s8]
[sentence3]: [s1, s2, s3, s4, s5, s6, s7, s8]
...
Overall: [s1, s2, s3, s4, s5, s6, s7, s8]

sentiment meaning:
s1: 實用 (useful)
s2: 感人 (touching)
s3: 開心 (happy)
s4: 有趣 (interesting)
s5: 無聊 (boring)
s6: 害怕 (scared)
s7: 難過 (sad)
s8: 憤怒 (angry)

score range: [0, 1]
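The README does not expand "mlr". The sketch below assumes it means multinomial logistic regression, so each sentence's eight scores are softmax probabilities in [0, 1], and it assumes Overall is the mean over sentences. Sentence vectors could come from gensim's Doc2Vec.infer_vector; here the vectors, the weight matrix W and the bias b are placeholders rather than a trained model, and sentiment_report is a hypothetical name.

```python
import numpy as np

# s1..s8, in the order documented above.
LABELS = ["實用", "感人", "開心", "有趣", "無聊", "害怕", "難過", "憤怒"]


def softmax(z):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def sentiment_report(sentences, vectors, W, b):
    """sentences: list of str; vectors: (n, d) sentence vectors;
    W: (d, 8) weights; b: (8,) bias (all hypothetical inputs)."""
    probs = softmax(vectors @ W + b)  # one row of 8 scores per sentence
    lines = [f"{s}: {[round(float(p), 3) for p in row]}"
             for s, row in zip(sentences, probs)]
    overall = probs.mean(axis=0)  # assumption: Overall = mean over sentences
    lines.append(f"Overall: {[round(float(p), 3) for p in overall]}")
    return probs, overall, "\n".join(lines)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vecs = rng.normal(size=(2, 5))  # stand-in for doc2vec vectors
    _, _, text = sentiment_report(["句子一", "句子二"], vecs,
                                  rng.normal(size=(5, 8)), np.zeros(8))
    print(text)
```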

Repository: stchau4work/ChineseNLP