This project is a simple information retrieval system that lets users search a document collection by keyword. It uses a vector space model: documents and the query are represented as term vectors, and documents are ranked by the cosine similarity between the query vector and each document vector. A simple inverted index speeds up the search, and a simple linear search is included as a baseline for comparison.
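The ranking idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the code in the repository: the function names (`tokenize`, `build_inverted_index`, `cosine_similarity`, `search`), the whitespace tokenizer, the raw term-frequency weighting, and the three sample documents are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase and split on whitespace; a real system may do more.
    return text.lower().split()


def build_inverted_index(docs):
    # Map each term to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def cosine_similarity(vec_a, vec_b):
    # Vectors are dicts of term -> weight (raw term frequency here).
    dot = sum(weight * vec_b.get(term, 0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, docs, index=None):
    query_vec = Counter(tokenize(query))
    if index is None:
        # Linear search: score every document in the collection.
        candidates = set(docs)
    else:
        # Inverted index: score only documents sharing a term with the query.
        candidates = set().union(*(index.get(term, set()) for term in query_vec))
    scored = (
        (doc_id, cosine_similarity(query_vec, Counter(tokenize(docs[doc_id]))))
        for doc_id in candidates
    )
    # Keep matching documents only, best score first (ties broken by id).
    return sorted(((d, s) for d, s in scored if s > 0), key=lambda p: (-p[1], p[0]))


# Hypothetical sample collection, used only for this example.
docs = {
    "fox": "the fox and the grapes",
    "crow": "the fox and the crow",
    "hare": "the hare and the tortoise",
}
index = build_inverted_index(docs)

for doc_id, score in search("fox grapes", docs, index):
    print(doc_id, round(score, 3))
```

Both search modes return the same ranking in this sketch. The inverted index only reduces the number of documents that have to be scored, which is why a linear scan works as a baseline for comparison.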
This project requires no additional Python libraries. Clone the repository and run the simple_information_retrival.py file with the command-line arguments described in its usage.
The system can be run from the command line. A few example commands are listed below; a slash-separated value such as "vector/bool" stands for one of the listed alternatives:
- python simple_information_retrival.py --extract-collection aesopa10.txt
- python simple_information_retrival.py --query "somesearchterm" --model "vector/bool" --search-mode "inverted/linear" --documents "original/no_stopwords" --stemming
- python simple_information_retrival.py --query "somesearchterm" --model "vector/bool" --search-mode "inverted/linear" --documents "original/no_stopwords"
- python simple_information_retrival.py --query "somesearchterm" --model "vector" --documents "original/no_stopwords"
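
For readers who want to see how options like these fit together, the following is a hypothetical sketch of an argparse parser that accepts the flags shown above. It is not taken from simple_information_retrival.py: the `build_parser` name, the help texts, the allowed values (read off the slash-separated examples), and the absence of defaults are all assumptions.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the flags in the example commands;
    # the actual script may define them differently.
    parser = argparse.ArgumentParser(
        description="Simple information retrieval (illustrative sketch)"
    )
    parser.add_argument("--extract-collection", metavar="FILE",
                        help="text file to split into a document collection")
    parser.add_argument("--query", help="search term")
    parser.add_argument("--model", choices=["vector", "bool"])
    parser.add_argument("--search-mode", choices=["inverted", "linear"])
    parser.add_argument("--documents", choices=["original", "no_stopwords"])
    parser.add_argument("--stemming", action="store_true",
                        help="apply stemming to the terms")
    return parser


# Parse an explicit argument list so the sketch runs without a shell.
args = build_parser().parse_args(
    ["--query", "fox", "--model", "vector", "--documents", "original"]
)
print(args.query, args.model, args.documents, args.stemming)
```

Using `choices` makes argparse reject a value outside the listed alternatives, and `action="store_true"` makes `--stemming` a switch that is off unless it is given.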