This repository contains the code used for the paper. To reproduce the results, proceed as follows:
- Download the data files. We plan to make these available on a webserver in the future; for now, please ask us for them. Save the following files in data/arxiv/keywords-backend/ (a quick check that everything is in place is sketched after the list):
papers
paper_topics
all_lengths.json
broadness_lda
and the following in data/arxiv/thomsonreuters/:
JournalHomeGrid-2001.csv
...
JournalHomeGrid-2009.csv
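As a quick sanity check, the following sketch verifies that all required files are present. It assumes the paths above, one JournalHomeGrid CSV per year from 2001 through 2009, and that it is run from the repository root.
import os

# Expected data files as listed above; adjust if your layout differs.
required = [
    'data/arxiv/keywords-backend/papers',
    'data/arxiv/keywords-backend/paper_topics',
    'data/arxiv/keywords-backend/all_lengths.json',
    'data/arxiv/keywords-backend/broadness_lda',
] + ['data/arxiv/thomsonreuters/JournalHomeGrid-%d.csv' % year
     for year in range(2001, 2010)]

missing = [path for path in required if not os.path.exists(path)]
if missing:
    print('Missing files:', missing)
else:
    print('All data files found.')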
- Set up a MySQL database and save the connection data in settings_private.py (a connection check is sketched below the settings).
DB_PASS = '...'
DB_USER = '...'
DB_HOST = '...'
DB_NAME = '...'
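To verify that the connection settings work, here is a minimal sketch assuming the PyMySQL driver; the project's own code may use a different MySQL client library.
import pymysql
import settings_private as s

# Connect with the credentials from settings_private.py and run a trivial query.
conn = pymysql.connect(host=s.DB_HOST, user=s.DB_USER,
                       password=s.DB_PASS, database=s.DB_NAME)
with conn.cursor() as cur:
    cur.execute('SELECT VERSION()')
    print('Connected to MySQL', cur.fetchone()[0])
conn.close()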
- Create the database structure and import the arXiv, Paperscape, and JIF data.
mysql < database_structure.sql
python arxiv_importer.py
python paperscape_importer.py
python jif_importer.py
- Run the pre-processing steps.
python analysis.py
python net.py
- Run the following SQL command against the database to initialize the train_real column from the train column (a Python alternative is sketched below).
UPDATE analysissingle512_authors SET train_real = train;
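If you prefer not to use the mysql shell, the same statement can be issued from Python (again a sketch assuming PyMySQL):
import pymysql
import settings_private as s

conn = pymysql.connect(host=s.DB_HOST, user=s.DB_USER,
                       password=s.DB_PASS, database=s.DB_NAME)
with conn.cursor() as cur:
    # Initialize the train_real column from the original train assignment.
    cur.execute('UPDATE analysissingle512_authors SET train_real = train')
conn.commit()
conn.close()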
- Generate the cross-validation groups and prepare the x and y data.
python run_local.py prepare
- Train the random forest and neural network models for each cross-validation round $i (0 to 19); a loop over all rounds is sketched after the commands.
python run_cluster.py train-rf $i
python run_cluster.py train-net $i
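To run all 20 rounds locally in one go, a minimal driver sketch (on an actual cluster you would typically submit one job per round instead):
import subprocess

# Train the random forest and the neural network for every cross-validation round.
for i in range(20):
    subprocess.run(['python', 'run_cluster.py', 'train-rf', str(i)], check=True)
    subprocess.run(['python', 'run_cluster.py', 'train-net', str(i)], check=True)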
- Evaluate the trained models as well as some naive baseline models for each $i and summarize the results (see the loop sketch after the commands).
python run_local.py evaluate-rf --i $i
python run_local.py evaluate-net --i $i
python run_local.py evaluate-linear-naive --i $i
python run_local.py summarize
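The evaluation commands can be looped in the same way (a sketch; summarize runs once after all rounds have been evaluated):
import subprocess

# Evaluate each model type for every cross-validation round, then summarize.
for i in range(20):
    for task in ('evaluate-rf', 'evaluate-net', 'evaluate-linear-naive'):
        subprocess.run(['python', 'run_local.py', task, '--i', str(i)], check=True)
subprocess.run(['python', 'run_local.py', 'summarize'], check=True)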
The summary files will be placed in data/analysissingle512/evaluate/no-max-hindex; the results for each individual trained model will be placed in data/analysissingle512/evaluate/no-max-hindex/task-results.