NewsClassification

What is it about?

This project trains and evaluates various classifiers on news articles collected from NU.nl in order to predict each article's popularity, measured by the number of comments it receives.
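The classifiers predict a class, not a raw count, so comment counts presumably have to be grouped into popularity classes first. The README does not say how the project does this. The sketch below shows one possible approach. The labels `low`/`medium`/`high`, the thresholds and the function name `popularity_class` are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical class boundaries: illustrative only, not taken from the project.
LABELS = ("low", "medium", "high")
THRESHOLDS = (10, 50)  # <10 -> low, 10..49 -> medium, >=50 -> high


def popularity_class(n_comments: int,
                     thresholds=THRESHOLDS,
                     labels=LABELS) -> str:
    """Map a raw comment count to a discrete popularity label."""
    if n_comments < 0:
        raise ValueError("comment count cannot be negative")
    # bisect_right counts how many thresholds are <= n_comments,
    # which is exactly the index of the matching label.
    return labels[bisect_right(thresholds, n_comments)]


print([popularity_class(n) for n in (0, 9, 10, 49, 50, 300)])
# → ['low', 'low', 'medium', 'medium', 'high', 'high']
```

The number of classes and the boundaries largely determine how hard the task is, so results such as those below only make sense relative to whatever bucketing was actually used.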

What are its components?

crawling contains a script to collect articles from the news site, save them to a database, and update them with the number of comments they have received.
preprocessing contains a script for preprocessing all text in the collected articles.
learning contains scripts to transform the collected data into input for the classifiers, and a script to train and evaluate classifiers on the data.
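The crawling step saves articles to a database and later updates their comment counts. As a rough illustration, here is a minimal storage layer. It assumes SQLite through Python's standard `sqlite3` module, although the README only says "a database". The table, the columns, the helper names `save_article` and `update_comment_count`, and the example URL are hypothetical.

```python
import sqlite3

# Hypothetical schema: the README only says articles are stored in "a
# database". SQLite, the table name and the column names are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    url        TEXT PRIMARY KEY,
    title      TEXT NOT NULL,
    body       TEXT NOT NULL,
    n_comments INTEGER NOT NULL DEFAULT 0
)
"""


def save_article(conn: sqlite3.Connection, url: str, title: str, body: str) -> None:
    """Store a newly crawled article; skip it if its URL is already stored."""
    conn.execute(
        "INSERT OR IGNORE INTO articles (url, title, body) VALUES (?, ?, ?)",
        (url, title, body),
    )


def update_comment_count(conn: sqlite3.Connection, url: str, n_comments: int) -> None:
    """Overwrite the stored comment count with the latest observed value."""
    conn.execute(
        "UPDATE articles SET n_comments = ? WHERE url = ?",
        (n_comments, url),
    )


URL = "https://example.com/article/1"  # placeholder, not a real NU.nl URL
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
with conn:  # commits both statements together
    save_article(conn, URL, "Example title", "Example body text")
    update_comment_count(conn, URL, 42)
print(conn.execute("SELECT url, n_comments FROM articles").fetchall())
# → [('https://example.com/article/1', 42)]
```

Using the URL as the primary key makes repeated crawls idempotent. Re-saving an article leaves its stored comment count untouched, and only the update call changes it.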
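The README does not list the preprocessing steps. The sketch below assumes three common ones: lowercasing, tokenization and stop-word removal. It also drops pure numbers. NU.nl publishes in Dutch, so the stop words are Dutch. The short stop-word list and the function name `preprocess` are illustrative assumptions, and a real run would likely use a fuller list.

```python
import re
import unicodedata

# A tiny, illustrative Dutch stop-word list (an assumption, not the project's list).
STOPWORDS = {"de", "het", "een", "en", "van", "in", "is", "op", "dat", "die"}

# \w is Unicode-aware for str patterns, so accented Dutch letters are kept.
TOKEN_RE = re.compile(r"\w+")


def preprocess(text: str) -> list[str]:
    """Lowercase and tokenize text, dropping stop words and pure numbers."""
    # NFC normalization makes composed and decomposed accents compare equal.
    text = unicodedata.normalize("NFC", text.lower())
    tokens = TOKEN_RE.findall(text)
    return [t for t in tokens if t not in STOPWORDS and not t.isdigit()]


print(preprocess("Het kabinet presenteert de begroting van 2024."))
# → ['kabinet', 'presenteert', 'begroting']
```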
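The README names a multinomial Naive Bayes classifier but not the library that implements it. The project may use a library such as scikit-learn, but that is an assumption. To show what the model computes, here is a from-scratch sketch in plain Python. Each class is scored by its log prior plus the smoothed log probability of every token in the article. The class name, the `alpha` smoothing parameter and the toy data are illustrative.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial Naive Bayes over bag-of-words token lists, with
    additive (Laplace) smoothing controlled by alpha."""

    def __init__(self, alpha: float = 1.0):
        if alpha <= 0:
            raise ValueError("alpha must be positive to avoid log(0)")
        self.alpha = alpha

    def fit(self, docs, labels):
        # docs: list of token lists; labels: one class label per document.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # class -> token -> count
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {t for counts in self.word_counts.values() for t in counts}
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def _log_posterior(self, tokens, c):
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[c] / n_docs)  # log prior P(c)
        denom = self.total_words[c] + self.alpha * len(self.vocab)
        for t in tokens:
            if t in self.vocab:  # tokens never seen in training are ignored
                score += math.log((self.word_counts[c][t] + self.alpha) / denom)
        return score

    def predict(self, tokens):
        """Return the class with the highest (unnormalized) log posterior."""
        return max(self.class_counts, key=lambda c: self._log_posterior(tokens, c))


docs = [["belasting", "kabinet"], ["voetbal", "ajax"],
        ["kabinet", "minister"], ["ajax", "doelpunt"]]
labels = ["high", "low", "high", "low"]
model = MultinomialNB().fit(docs, labels)
print(model.predict(["kabinet", "begroting"]))  # → high
```

Working in log space avoids numerical underflow when an article contains many tokens. Smoothing keeps a token that never appeared with a class from forcing that class's probability to zero.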

What about results?

Currently, when trained on 1,000 articles, the multinomial Naive Bayes classifier classifies 50% of the articles correctly, while the linear Support Vector Machine scores around 48%.
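The percentages above are accuracies: the fraction of articles whose predicted class matches the true class. A minimal sketch of that metric follows. It also computes a majority-class baseline, which always predicts the most frequent training class. That baseline is a common reference point for judging a score like 50%, but the README does not report one. The function names and example labels are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of articles whose predicted class equals the true class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_train, y_test):
    """Accuracy from always predicting the most frequent training class."""
    most_common = Counter(y_train).most_common(1)[0][0]
    return accuracy(y_test, [most_common] * len(y_test))


y_true = ["low", "high", "medium", "low"]
y_pred = ["low", "medium", "medium", "high"]
print(accuracy(y_true, y_pred))                              # → 0.5
print(majority_baseline(["low", "low", "high"], y_true))     # → 0.5
```

In this toy case, the classifier does no better than always guessing the most common class. This is why a raw accuracy figure is easier to interpret next to a baseline.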

What next?

Ideas for improving classification performance include:

Collecting more data
Applying feature selection
Investigating the effects of training the classifiers with different parameters
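Feature selection can take many forms, and the README does not say which one is planned. One simple option is document-frequency thresholding, which keeps only the `k` tokens that appear in the most articles. It is sketched below under that assumption. The function names and the value of `k` are hypothetical. Label-aware methods such as chi-square scoring are a common alternative.

```python
from collections import Counter


def select_by_document_frequency(docs, k):
    """Return the k tokens that occur in the most documents.

    Ties are broken alphabetically so the result is deterministic."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each token at most once per document
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return {token for token, _ in ranked[:k]}


def filter_tokens(docs, keep):
    """Drop every token that is not in the selected feature set."""
    return [[t for t in tokens if t in keep] for tokens in docs]


docs = [["kabinet", "ajax", "weer"], ["kabinet", "weer"], ["kabinet", "voetbal"]]
keep = select_by_document_frequency(docs, k=2)
print(sorted(keep))               # → ['kabinet', 'weer']
print(filter_tokens(docs, keep))  # → [['kabinet', 'weer'], ['kabinet', 'weer'], ['kabinet']]
```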
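Investigating different training parameters usually means scoring every combination on held-out data and keeping the best one. The generic grid search below is a minimal sketch of that procedure. The parameter names `alpha` and `min_df` and the `toy_score` function are hypothetical stand-ins. A real `score_fn` would train a classifier with the given parameters and return its validation accuracy.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Try every combination in param_grid and return (best_params, best_score).

    param_grid: dict mapping a parameter name to a list of candidate values.
    score_fn:   callable taking the parameters as keyword arguments and
                returning a validation score where higher is better."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(alpha, min_df):
    # Hypothetical stand-in for "train with these parameters, return accuracy".
    return -(alpha - 0.5) ** 2 - 0.01 * min_df


best, score = grid_search({"alpha": [0.1, 0.5, 1.0], "min_df": [1, 2]}, toy_score)
print(best)  # → {'alpha': 0.5, 'min_df': 1}
```

The score should come from data the classifier was not trained on, for example a held-out split or cross-validation. Otherwise the search tends to favour settings that overfit the training articles.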

Further details?

See the wiki.

ercanse/NewsClassification
