Text-Mining

This code assigns keywords to documents and finds association rules between words in a database of documents. With minor modifications, it can also serve as the basis for a document suggestion system driven by search keywords.

Getting Started

Clone this repository
Execute TextMining.py
You will be asked for a support value and a confidence value. Once you enter them, you'll get the association rules as output.
That's pretty much it. Good job!

Prerequisites

Python 3.6 must be installed on your machine.

Running the tests

When you execute TextMining.py, it looks for the folder named documentDatabase and reads all the .txt files in it. Each text file acts as a separate document. Since the input to the code is a database of documents, the documentDatabase folder contains multiple documents.
Once all the documents are read, they are cleaned by removing stop words. Each remaining word is then reduced further using stemming. The list of stop words can be found in listOfStopWords.txt.

Example of stemming: fill, filled, and filling are all reduced to fill.
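
The cleaning step can be pictured with a minimal sketch. It is not the code in TextMining.py: the stop-word set below is a hypothetical stand-in for listOfStopWords.txt, and naive_stem is an assumed, deliberately simple suffix stripper, since the stemmer the project actually uses is not described here.

```python
import re

# Hypothetical stop-word set; the project reads its list from listOfStopWords.txt.
STOP_WORDS = {"the", "a", "is", "and", "of", "was", "with"}

def naive_stem(word):
    """Strip the suffixes "ing" and "ed" (illustrative only, not a real stemmer)."""
    for suffix in ("ing", "ed"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def clean(text):
    """Lowercase, tokenize, drop stop words, then stem each remaining word."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]

print(clean("The tank was filled and the pump is filling"))
# ['tank', 'fill', 'pump', 'fill']
```

A suffix stripper this simple mis-handles many English words, so a real pipeline would more likely rely on an established stemming algorithm.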

Next, each document is assigned a few keywords using the tf-idf algorithm. The keywords are written to a file named aprioriInput.txt. Finally, the Apriori algorithm takes over: it reads aprioriInput.txt and generates association rules based on Minimum Support and Minimum Confidence.
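
As a rough illustration of the tf-idf keyword step, here is a minimal sketch. It assumes the common weighting tf = count / document length and idf = log(N / document frequency); the exact variant used in TextMining.py may differ. The function name tfidf_keywords, the top_k parameter, and the sample documents are all hypothetical.

```python
import math
from collections import Counter

def tfidf_keywords(docs, top_k=2):
    """Return the top_k highest-scoring tf-idf terms for each tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    keywords = []
    for doc in docs:
        counts = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
        # Sort by score descending, then alphabetically for a stable result.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords.append(ranked[:top_k])
    return keywords

# Hypothetical, already cleaned documents.
docs = [
    ["data", "mining", "rule", "rule"],
    ["data", "text", "text", "keyword"],
    ["data", "rule", "support"],
]
for kws in tfidf_keywords(docs):
    print(kws)
# ['mining', 'rule']
# ['text', 'keyword']
# ['support', 'rule']
```

Note that "data" appears in every document, so its idf is zero and it is never chosen as a keyword.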
Minimum Support: The threshold an itemset's support must reach to count as frequent. It is applied to find all frequent itemsets in the database.
Minimum Confidence: The threshold a rule's confidence must reach to be kept. It is applied to the frequent itemsets in order to form rules.
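
To make the two thresholds concrete, the sketch below computes support and confidence for rules between pairs of keywords. It is a brute-force enumeration of pairs, not the level-wise Apriori candidate generation, and it does not read aprioriInput.txt. The names support and pair_rules and the sample keyword sets are assumptions made for illustration.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def pair_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for qualifying pairs."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # the pair is not a frequent itemset
        for lhs, rhs in ((a, b), (b, a)):
            # confidence(lhs -> rhs) = support(lhs and rhs) / support(lhs)
            confidence = pair_support / support({lhs}, transactions)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules

# Hypothetical keyword sets, one per document.
transactions = [
    {"data", "mining"},
    {"data", "mining", "rule"},
    {"data", "rule"},
    {"text", "rule"},
]
for lhs, rhs, s, c in pair_rules(transactions, min_support=0.5, min_confidence=0.7):
    print(f"{lhs} -> {rhs} (support={s:.2f}, confidence={c:.2f})")
# mining -> data (support=0.50, confidence=1.00)
```

Here the pair {data, rule} is also frequent (support 0.50), but both of its rules have confidence 0.67 and are dropped by the confidence threshold.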

Built With

Python 3.6

Fork the repo and try to come up with an optimized version of the algorithm.

Author

Jeet Patel

Social

It is crucial to stay social ;)
