├── README.md
├── email
│   └── user_email.list
├── flask
├── papers
│   └── 2016-01-05
│       └── cs.cv
└── spider
- The email folder contains the scripts that send emails to users.
- The flask folder contains the scripts for our web interface.
- The papers folder contains the papers we fetch from arxiv.org. Each subfolder is named by date (for example 2016-01-05), and inside each date folder there is one subfolder per research area, such as cs.cv.
- The spider folder contains the scripts that crawl the papers from arXiv.
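
The date/area layout of the papers folder can be sketched as a small path helper. This is only an illustration: the function name `paper_dir` and the choice to lower-case the area name (so that `cs.CV` in the URL maps to the `cs.cv` folder) are assumptions, not something the project code is known to do.

```python
from datetime import date
from pathlib import Path


def paper_dir(root: Path, day: date, area: str) -> Path:
    """Return <root>/<YYYY-MM-DD>/<area> without touching the file system.

    Hypothetical helper: lower-casing the area is an assumption made so that
    the result matches the cs.cv folder shown in the tree.
    """
    return root / day.isoformat() / area.lower()
```

Calling `paper_dir(Path("papers"), date(2016, 1, 5), "cs.CV")` yields the path `papers/2016-01-05/cs.cv`; a caller would still need `mkdir(parents=True, exist_ok=True)` before writing files there.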
https://arxiv.org/list/cs.CV/pastweek?skip=0&show=1000
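
A listing URL of this shape can be assembled from a research area and the two query parameters visible above. The sketch below is a minimal illustration; the names `ARXIV_LIST_URL` and `build_list_url` are hypothetical, and the defaults simply mirror the example URL.

```python
from urllib.parse import urlencode

# URL pattern taken from the example above; the constant name is an assumption.
ARXIV_LIST_URL = "https://arxiv.org/list/{area}/pastweek"


def build_list_url(area: str, skip: int = 0, show: int = 1000) -> str:
    """Build the past-week listing URL for one research area (hypothetical helper)."""
    query = urlencode({"skip": skip, "show": show})
    return f"{ARXIV_LIST_URL.format(area=area)}?{query}"
```

With the defaults, `build_list_url("cs.CV")` reproduces the URL shown above, so switching areas only means changing the argument.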
user_id | user_nickname | user_email | subject |
---|---|---|---|
1 | hello | [email protected] | cs_cv |
2 | hello | [email protected] | cs_kl |
3 | hello2 | [email protected] | cs_cv |
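
Since each row ties one email address to one subject, a mailer would likely want the addresses grouped by subject before sending. The sketch below shows one way to do that; the `User` type, the function name, and the `example.com` addresses in the usage note are placeholders invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class User(NamedTuple):
    """One row of the user table (field names follow the table header)."""
    user_id: int
    user_nickname: str
    user_email: str
    subject: str


def group_emails_by_subject(users: Iterable[User]) -> Dict[str, List[str]]:
    """Map each subject to the list of distinct email addresses subscribed to it."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for user in users:
        # Skip duplicates so nobody receives the same digest twice.
        if user.user_email not in grouped[user.subject]:
            grouped[user.subject].append(user.user_email)
    return dict(grouped)
```

For example, three rows with subjects `cs_cv`, `cs_kl`, `cs_cv` and placeholder addresses `a@example.com`, `b@example.com`, `c@example.com` would be grouped into two recipient lists, one per subject.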
- Extract information from the PDFs
- Add multi-threading support for downloading PDFs
- Add configuration for the URL, including the research area
- Add a module that writes all paper info to a file, 'summary.csv', in the PDF folder
- [] Add support for filtering out files that failed to download from summary.csv
- Add an email module that formats the per-area email sent to users
- Add the flask module, including adding user emails
- Add a module in which Python reads the PDF files, detailed in "Reading PDF Content with Python"
- [] Replace file writing with SQLite storage
- [] Replace write_file with write_sqite_file
- [] Replace the run() in deploy_email and deploy_download_pdfs.py with a SQLite version
- [] Add a paper recommendation module
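
The planned move from file writing to a database could look roughly like the sketch below, which uses Python's standard `sqlite3` module. Everything specific here is an assumption: the function name `save_papers_sqlite`, the `papers` table, and its three columns are invented for illustration and do not describe the project's actual schema.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical schema: one row per paper, keyed by its identifier.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS papers (
    paper_id TEXT PRIMARY KEY,
    title    TEXT NOT NULL,
    area     TEXT NOT NULL
)
"""
INSERT_SQL = "INSERT OR REPLACE INTO papers (paper_id, title, area) VALUES (?, ?, ?)"


def save_papers_sqlite(db_path: str, papers: Iterable[Tuple[str, str, str]]) -> int:
    """Store (paper_id, title, area) rows and return the number of rows in the table."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(CREATE_SQL)
            conn.executemany(INSERT_SQL, papers)
        return conn.execute("SELECT COUNT(*) FROM papers").fetchone()[0]
    finally:
        conn.close()
```

Using `INSERT OR REPLACE` with a primary key means re-running a weekly job does not duplicate papers that were already stored, which a plain append to a CSV file would do.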
deploy_download_pdfs.py
: Scrape the PDFs each week according to user_info.csv
deploy_email.py
: Send the emails
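
The download step, combined with the multi-threading and failed-download items from the task list, might be sketched with a thread pool as follows. This is not the code in deploy_download_pdfs.py; `fetch_bytes`, `download_all`, and the `max_workers` default are assumed names and values, and the fetch function is injectable so the logic can be tried without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional
from urllib.request import urlopen


def fetch_bytes(url: str, timeout: float = 30.0) -> bytes:
    """Download one URL and return its body (default fetcher, uses the network)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(
    urls: Iterable[str],
    fetch: Callable[[str], bytes] = fetch_bytes,
    max_workers: int = 4,
) -> Dict[str, Optional[bytes]]:
    """Fetch all URLs concurrently; a failed download is recorded as None."""

    def safe_fetch(url: str) -> Optional[bytes]:
        try:
            return fetch(url)
        except Exception:
            # Keep going so one bad URL does not abort the whole batch.
            return None

    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(safe_fetch, url_list))
    return dict(zip(url_list, results))
```

Entries whose value is `None` mark the failed downloads, so they can be filtered out before anything is written to summary.csv.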