Allen_AI_Science_Challenge_JunweiPan

1. Information Retrieval Based Method

We build a local search engine and use each question as the query to retrieve the top results. We then count how often the words of each candidate answer occur in those results, and the answer with the highest word count is chosen as the prediction.

Data Preparation

We build the index from the Wikipedia pages of the ck-12 keywords. The keywords are scraped from https://www.ck12.org using the scrape.py script, and the Wikipedia page content for each keyword is then fetched using get_wikipedia_data.py.
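The "one file per keyword" layout implied by the index directory name can be sketched as follows. This is only an illustration and not the code in get_wikipedia_data.py: the helper names keyword_to_filename and save_pages, the file naming scheme, and the fetch_page callable (which stands in for whatever function actually downloads a Wikipedia page) are all assumptions.

```python
import re
from pathlib import Path
from typing import Callable, Iterable, List


def keyword_to_filename(keyword: str) -> str:
    """Turn a keyword into a safe file name (hypothetical naming scheme)."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", keyword.strip()).strip("_")
    return f"{slug.lower()}.txt"


def save_pages(keywords: Iterable[str],
               fetch_page: Callable[[str], str],
               out_dir: str) -> List[Path]:
    """Fetch one page per keyword and write each page to its own text file.

    fetch_page is a placeholder for the real download function; it takes a
    keyword and returns the page text, or an empty string if nothing is found.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for keyword in keywords:
        text = fetch_page(keyword)
        if not text:
            continue  # skip keywords that have no page content
        path = out / keyword_to_filename(keyword)
        path.write_text(text, encoding="utf-8")
        written.append(path)
    return written
```

Passing the download function in as an argument keeps the file layout logic testable without network access.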

Indexing and Searching

We use Lucene for indexing and searching. The index is built with IndexFiles.java (java org.apache.lucene.demo.IndexFiles -docs data/wikipedia_content_based_on_ck_12_keyword_one_file_per_keyword/), and searching is done by Get_Top_Documents_Based_on_Lucene.java, which is a modified version of the SearchFiles.java demo.
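To make the retrieval step concrete without a Lucene installation, here is a toy stand-in in pure Python. It is not Lucene and does not reproduce Lucene's scoring: it ranks in-memory documents by a simple TF-IDF sum, and the names tokenize and top_documents are hypothetical. It only illustrates the idea of "question in, top N document names out".

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def top_documents(query, docs, n=3):
    """Rank docs (a dict of name -> text) against the query.

    Each document is scored by summing term frequency * IDF over the
    distinct query terms it contains. Returns up to n document names,
    best first; documents that share no term with the query are omitted.
    """
    doc_tokens = {name: Counter(tokenize(text)) for name, text in docs.items()}
    num_docs = len(doc_tokens)
    scores = {}
    for term in set(tokenize(query)):
        df = sum(1 for counts in doc_tokens.values() if term in counts)
        if df == 0:
            continue  # term appears in no document
        idf = math.log(1 + num_docs / df)
        for name, counts in doc_tokens.items():
            if counts[term]:
                scores[name] = scores.get(name, 0.0) + counts[term] * idf
    # Sort by descending score, breaking ties by name for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:n]]
```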

Prediction

For each answer, we count the occurrences of its words (excluding stop words) in the top N search results. The answer with the highest word count is chosen as the prediction.
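The scoring rule can be sketched in a few lines. This is a minimal illustration, not the code in predict_based_on_lucene_search_result.py: the stop word list is a small made-up sample, each distinct answer word is counted once per occurrence in the results, and ties are broken by answer label. All of these are assumptions.

```python
import re
from collections import Counter

# Small illustrative stop word list; the real list is not given in the text.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "and", "or", "is", "are"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def answer_score(answer, documents):
    """Sum the occurrences of the answer's non-stop words in the documents."""
    words = {w for w in tokenize(answer) if w not in STOP_WORDS}
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return sum(counts[w] for w in words)


def predict(answers, documents):
    """Return (best_label, scores) for answers given as a dict label -> text."""
    scores = {label: answer_score(text, documents)
              for label, text in answers.items()}
    # Iterate labels in sorted order so ties go to the earliest label.
    best = max(sorted(scores), key=lambda label: scores[label])
    return best, scores


top_results = ["Plants make sugar by photosynthesis.",
               "Photosynthesis needs light."]
candidates = {"A": "photosynthesis", "B": "the respiration", "C": "light of the sun"}
print(predict(candidates, top_results))
```

Here documents would be the text of the top N search results for the question.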

Performance

This method gets a score of 0.36. Please refer to https://www.kaggle.com/c/the-allen-ai-science-challenge/leaderboard for the current leaderboard.
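For local evaluation, a score of this kind can be computed as plain accuracy, assuming the score is the fraction of questions answered correctly. The function below is a hypothetical sketch and is not taken from cal_metrics.py; both arguments are assumed to be dicts mapping a question id to an answer label.

```python
def accuracy(predictions, gold):
    """Fraction of gold questions whose predicted label matches the gold label.

    Questions missing from predictions count as wrong.
    """
    if not gold:
        raise ValueError("gold must contain at least one question")
    correct = sum(1 for qid, label in gold.items()
                  if predictions.get(qid) == label)
    return correct / len(gold)
```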
