Academic-Search-Engine

Created a search engine that helps users find academic reference resources.
Applied a hidden Markov model, an inverted index, and TF-IDF to process the corpus.
Constructed a convolutional neural network (CNN) with TensorFlow and Python to classify academic references; trained on a labeled corpus, it achieved an accuracy of 95.3%.
Ranked A+, placing in the top 1% of course final projects.
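The TF-IDF weighting mentioned above can be sketched as follows. This is a minimal illustration, not the project's code: it assumes documents are already segmented into token lists, takes tf as the term count divided by document length and idf as log(N / df), and scores a query by summing the weights of its terms. The exact TF-IDF variant used in the project is not stated, and the function names `tf_idf` and `score` are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs: list of token lists (already segmented, stop words removed).
    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # df[t] = number of documents that contain term t at least once
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        length = len(tokens)
        weights.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def score(query_tokens, doc_weights):
    """Score one document for a query by summing the TF-IDF weights of the query terms."""
    return sum(doc_weights.get(t, 0.0) for t in query_tokens)
```

A term that occurs in every document gets idf = log(1) = 0, so it contributes nothing to ranking; this is why frequent function words matter little even before stop-word removal.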

A search engine is a system that collects information from the Internet according to a given strategy using dedicated computer programs, then organizes and processes that information, provides retrieval services to users, and displays the relevant results to them. In short, it is a retrieval technology that operates on the Internet.
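The "organize, then retrieve" step is commonly built on an inverted index, one of the techniques this project lists. The sketch below is an assumption-laden illustration rather than the project's implementation: it builds a positional index mapping each term to `{doc_id: [positions]}` from pre-segmented documents and answers a query with a simple boolean AND. The names `build_inverted_index` and `search_all` are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to {doc_id: [positions]} (a positional inverted index).

    docs: list of token lists; a document's id is its index in the list.
    """
    index = defaultdict(dict)
    for doc_id, tokens in enumerate(docs):
        for pos, term in enumerate(tokens):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def search_all(index, query_tokens):
    """Return sorted ids of documents that contain every query term (boolean AND)."""
    # index.get does not insert missing keys into the defaultdict
    postings = [set(index.get(t, {})) for t in query_tokens]
    if not postings:
        return []
    return sorted(set.intersection(*postings))
```

Keeping positions, not just document ids, allows phrase or proximity queries later; ranking the matched documents (for example with TF-IDF) is a separate step.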

This project focuses on processing the corpus. The main work can be summarized as follows:
1. A hidden Markov model (HMM) is built, and the Viterbi algorithm is used to segment the news corpus into words.
2. An inverted index is built for full-text search. It maps each word to its positions in a document or a set of documents, so that the list of related documents can be retrieved quickly from the query words.
3. A TF-IDF model is built to evaluate how important a word is to a single document or to the whole document set in the corpus, and each document is given a score.
4. SnowNLP is called to extract keywords and key sentences.
5. A convolutional neural network (CNN) model is built and trained on the existing corpus; the trained model is then used to classify the given text.
6. PyQt5 is used to build the user interface, and multithreading is used when displaying it.
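The HMM segmentation step (item 1) can be illustrated with the common BMES character-tagging formulation, where each character is tagged B, M, or E (begin, middle, end of a multi-character word) or S (single-character word), and Viterbi decoding picks the most likely tag sequence. This is a minimal sketch under stated assumptions, not the code in hmo_plus.py: the BMES scheme, add-one smoothing, and the hard-coded allowed transitions are all assumptions, and `train` and `viterbi_segment` are hypothetical names.

```python
import math
from collections import defaultdict

STATES = "BMES"  # B/M/E: begin/middle/end of a multi-char word; S: single-char word
NEG_INF = float("-inf")
# Assumed constraint: only these transitions produce well-formed words.
ALLOWED_NEXT = {"B": "ME", "M": "ME", "E": "BS", "S": "BS"}


def tag_word(word):
    """BMES tags for one word, e.g. a 3-character word -> 'BME'."""
    return "S" if len(word) == 1 else "B" + "M" * (len(word) - 2) + "E"


def log_normalize(counts, keys, extra=0):
    """Add-one smoothed log-probs over keys, plus the log-prob of an unseen key."""
    total = sum(counts[k] for k in keys) + len(keys) + extra
    return {k: math.log((counts[k] + 1) / total) for k in keys}, math.log(1 / total)


def train(sentences):
    """Estimate HMM parameters from pre-segmented sentences (lists of words)."""
    start = defaultdict(int)
    trans = {s: defaultdict(int) for s in STATES}
    emit = {s: defaultdict(int) for s in STATES}
    vocab = set()
    for words in sentences:
        words = [w for w in words if w]  # skip empty tokens
        chars = "".join(words)
        tags = "".join(tag_word(w) for w in words)
        for i, (ch, tag) in enumerate(zip(chars, tags)):
            emit[tag][ch] += 1
            vocab.add(ch)
            if i == 0:
                start[tag] += 1
            else:
                trans[tags[i - 1]][tag] += 1
    log_start, _ = log_normalize(start, "BS")  # a sentence starts with B or S
    log_trans = {s: log_normalize(trans[s], ALLOWED_NEXT[s])[0] for s in STATES}
    # Per state: (log-probs of seen characters, log-prob of an unseen character)
    log_emit = {s: log_normalize(emit[s], vocab, extra=1) for s in STATES}
    return log_start, log_trans, log_emit


def viterbi_segment(text, model):
    """Segment text into words by Viterbi decoding over BMES states."""
    if not text:
        return []
    log_start, log_trans, log_emit = model

    def emission(state, ch):
        seen, unseen = log_emit[state]
        return seen.get(ch, unseen)

    # score[s]: best log-prob of a tag path for the prefix so far ending in state s
    score = {s: log_start.get(s, NEG_INF) + emission(s, text[0]) for s in STATES}
    backptrs = []
    for ch in text[1:]:
        new_score, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + log_trans[p].get(s, NEG_INF))
            new_score[s] = score[prev] + log_trans[prev].get(s, NEG_INF) + emission(s, ch)
            ptr[s] = prev
        score = new_score
        backptrs.append(ptr)
    # The last tag must be E or S; follow back-pointers to recover all tags.
    state = max("ES", key=lambda s: score[s])
    tags = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        tags.append(state)
    tags.reverse()
    # Close a word after every E or S tag.
    words, current = [], ""
    for ch, tag in zip(text, tags):
        current += ch
        if tag in "ES":
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words
```

Assuming pku_training.utf8 holds one sentence per line with words separated by spaces, a model could be trained with `train(line.split() for line in open("pku_training.utf8", encoding="utf-8"))` and applied with `viterbi_segment(sentence, model)`.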

Building on this work, future efforts will further optimize the model and the neural network to achieve more accurate information retrieval and text classification.

Repository contents:
.idea
__pycache__
README.md
classification_corpus_800_new.txt
classification_corpus_800_new_result.txt
hmo_plus.py
ir_corpus_1000_shuffled_new.txt
pku_training.utf8
result.py
result.ui
search.py
search.ui
stoplist_utf8.txt
test.py

Hoaru/Academic-Search-Engine