Hoaru/Academic-Search-Engine

Academic-Search-Engine

  • Created a search engine that helps users find academic reference resources.
  • Applied a hidden Markov model (HMM), an inverted index, and TF-IDF to process the corpus.
  • Constructed a convolutional neural network (CNN) with TensorFlow and Python to classify academic references using a labeled corpus, achieving 95.3% accuracy.
  • Ranked A+, in the top 1% of course final projects.

A search engine is a system that collects information from the Internet according to a given strategy using dedicated programs, organizes and processes that information, and provides a retrieval service that displays relevant results to users. In short, it is a retrieval technology that operates over the Internet.

This work focuses on corpus processing. The main components are:

  1. A hidden Markov model (HMM) is built, and the Viterbi algorithm is used to segment the news corpus into words.
  2. An inverted index is built that maps each word to its positions in a document or set of documents, so that full-text search can quickly retrieve the list of documents containing a given word.
  3. A TF-IDF model is built to measure how important a word is to a document in the corpus, and each document is scored accordingly.
  4. SnowNLP is used for keyword and key-sentence extraction.
  5. A convolutional neural network (CNN) is built and trained on the existing corpus, then used to classify a given text.
  6. PyQt5, together with threading, is used to display the interface.
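As an illustration of item 1, here is a minimal sketch of Viterbi decoding for word segmentation. It assumes the common B/M/E/S tagging scheme (Begin, Middle, End of a multi-character word, or Single-character word), which this README does not confirm. The probability tables (`START_P`, `TRANS_P`, `EMIT_P`) are hypothetical toy values chosen only to make the example run; in practice they would be estimated from a tagged corpus.

```python
import math

STATES = "BMES"  # Begin / Middle / End of a word, or Single-character word
NEG_INF = float("-inf")


def log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi(obs, start_p, trans_p, emit_p, floor=1e-8):
    """Return the most probable tag sequence for a non-empty obs."""
    # score[t][s]: best log-probability of any path ending in state s at step t.
    # Unseen characters get a small floor probability (an assumption).
    score = [{s: log(start_p.get(s, 0)) + log(emit_p[s].get(obs[0], floor))
              for s in STATES}]
    back = [{}]
    for t in range(1, len(obs)):
        score.append({})
        back.append({})
        for s in STATES:
            prev, best = max(
                ((p, score[t - 1][p] + log(trans_p[p].get(s, 0))) for p in STATES),
                key=lambda pair: pair[1],
            )
            score[t][s] = best + log(emit_p[s].get(obs[t], floor))
            back[t][s] = prev
    # A sentence can only end on E or S.
    last = max("ES", key=lambda s: score[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


def segment(text, start_p, trans_p, emit_p):
    """Split text into words by cutting after every E or S tag."""
    if not text:
        return []
    words, current = [], ""
    for ch, tag in zip(text, viterbi(text, start_p, trans_p, emit_p)):
        current += ch
        if tag in "ES":
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


# Hypothetical toy parameters, NOT taken from the project.
START_P = {"B": 0.6, "S": 0.4}
TRANS_P = {
    "B": {"M": 0.3, "E": 0.7},
    "M": {"M": 0.3, "E": 0.7},
    "E": {"B": 0.6, "S": 0.4},
    "S": {"B": 0.6, "S": 0.4},
}
EMIT_P = {
    "B": {"北": 0.9},
    "M": {},
    "E": {"京": 0.9},
    "S": {"我": 0.5, "爱": 0.5},
}

print(segment("我爱北京", START_P, TRANS_P, EMIT_P))
```

Working in log space avoids numeric underflow on long sentences, and missing transitions (such as B followed directly by S) are treated as impossible.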
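Items 2 and 3 can be sketched together: a positional inverted index, and a TF-IDF score computed from it to rank documents for a query. This is a minimal sketch, not the project's implementation. The function names (`build_inverted_index`, `tfidf_rank`), the sample documents, and the choice of `tf = count / document length` with `idf = log(N / df)` are all assumptions; several TF-IDF variants exist and the README does not say which one is used.

```python
import math
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to {doc_id: [positions]} for already-tokenized documents."""
    index = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for pos, term in enumerate(tokens):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def tfidf_rank(query_terms, index, docs):
    """Return (doc_id, score) pairs sorted by summed TF-IDF, highest first."""
    n_docs = len(docs)
    scores = defaultdict(float)
    for term in query_terms:
        postings = index.get(term)
        if not postings:
            continue  # term does not occur anywhere in the corpus
        # Assumed variant: idf = log(N / document frequency)
        idf = math.log(n_docs / len(postings))
        for doc_id, positions in postings.items():
            # Assumed variant: tf = occurrences / document length
            tf = len(positions) / len(docs[doc_id])
            scores[doc_id] += tf * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical tokenized documents, for illustration only.
docs = {
    "d1": ["neural", "network", "text", "classification"],
    "d2": ["inverted", "index", "text", "search"],
    "d3": ["neural", "search", "engine", "index", "search"],
}

index = build_inverted_index(docs)
print(index["search"])  # postings with word positions per document
print(tfidf_rank(["neural", "search"], index, docs))
```

Because the index stores only the documents that contain each term, a query touches just the postings for its own terms rather than scanning every document.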

Building on this, future work will further optimize the model and the neural network to achieve more accurate information retrieval and text classification.