gpt-daily-arxiv is a program that fetches papers from arXiv RSS feeds and uses GPT to summarize them.
+---------------------+
|   Arxiv RSS Feed    |
+---------------------+
           |
           v
+---------------------+
|    Retrieve PDF     |
|    from RSS Feed    |
+---------------------+
           |
           v
+---------------------+
| Convert PDF to Text |
+---------------------+
           |
           v
+---------------------+
| Ask GPT to Summarize|
|      the Paper      |
+---------------------+
           |
           v
+---------------------+
|  Write Data Record  |
|     in MongoDB      |
+---------------------+
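The first two stages of the pipeline above can be sketched roughly as follows. This is a minimal illustration, not the code in main.py: it assumes the feed is plain RSS 2.0 with un-namespaced `<item>` elements whose `<link>` points at an arXiv `/abs/` page, and the helper names `parse_feed` and `abs_to_pdf_url` are hypothetical. The sample feed uses a placeholder paper ID.

```python
import xml.etree.ElementTree as ET


def abs_to_pdf_url(abs_url):
    """Turn an arXiv abstract URL into the matching PDF URL (assumed layout)."""
    return abs_url.replace("/abs/", "/pdf/", 1)


def parse_feed(xml_text):
    """Return one dict per <item> in an RSS 2.0 document given as text."""
    root = ET.fromstring(xml_text)
    papers = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if not link:
            continue  # skip entries that carry no link
        papers.append(
            {"title": title, "abs_url": link, "pdf_url": abs_to_pdf_url(link)}
        )
    return papers


# Placeholder feed for illustration only; the ID and title are made up.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>cs.CV updates</title>
    <item>
      <title>Example Paper Title</title>
      <link>https://arxiv.org/abs/0000.00000</link>
    </item>
  </channel>
</rss>"""

papers = parse_feed(SAMPLE_FEED)
```

Downloading the PDF, extracting its text, and calling GPT would follow from each `pdf_url`; those steps depend on the libraries listed in requirements.txt and are left out here.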
Ensure `OPENAPI_API_KEY` is set. If you are behind a proxy, set the environment variable `OPENAI_PROXY_URL` to your proxy server. Check out this link:
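A quick way to check the environment before running is sketched below. The variable names are taken exactly as written in this README; the function `load_openai_settings` is a hypothetical helper, not something main.py is known to define.

```python
import os


def load_openai_settings(env=None):
    """Read the API key (required) and proxy URL (optional) from the environment."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAPI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAPI_API_KEY is not set")
    # An empty or missing proxy variable means "no proxy".
    proxy_url = env.get("OPENAI_PROXY_URL") or None
    return {"api_key": api_key, "proxy_url": proxy_url}
```

Calling `load_openai_settings()` with no argument reads the real process environment; passing a plain dict makes the check easy to try out without touching your shell.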
To choose which feeds are fetched, modify the following code in main.py:
arxiv_url_dict = {
    "Computer Vision": "https://arxiv.org/rss/cs.CV",
    "Computer Science": "https://arxiv.org/rss/cs",
    "Artificial Intelligence": "https://arxiv.org/rss/cs.AI",
    "Robotics": "https://arxiv.org/rss/cs.RO",
    "Software Engineering": "https://rss.arxiv.org/rss/cs.SE",
}
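As an illustration of extending this dictionary, the sketch below adds one more category and derives the arXiv category code from each feed URL. The "Machine Learning" entry is a hypothetical addition that follows the same URL pattern, and `category_code` is a helper invented for this example.

```python
from urllib.parse import urlparse

arxiv_url_dict = {
    "Computer Vision": "https://arxiv.org/rss/cs.CV",
    # Hypothetical extra entry, following the same URL pattern.
    "Machine Learning": "https://arxiv.org/rss/cs.LG",
}


def category_code(feed_url):
    """Return the last path segment of a feed URL, e.g. 'cs.CV'."""
    return urlparse(feed_url).path.rstrip("/").rsplit("/", 1)[-1]


codes = {name: category_code(url) for name, url in arxiv_url_dict.items()}
```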
Papers are downloaded to the `db` folder.
pip install -r requirements.txt
python3 main.py
# Also make sure MongoDB is installed and running
Since paper notes are stored in MongoDB, I recommend using mongo-gui for visualization.
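For orientation, a stored note might look something like the sketch below. The field names, the database name `arxiv`, the collection name `papers`, and the connection URI are all assumptions made for illustration; the actual schema used by main.py may differ. `save_record` needs pymongo installed and a reachable MongoDB server.

```python
from datetime import datetime, timezone


def build_record(category, title, pdf_url, summary):
    """Assemble one paper note as a plain dict (assumed schema)."""
    return {
        "_id": pdf_url,  # using the PDF URL as the key avoids duplicate notes
        "category": category,
        "title": title,
        "pdf_url": pdf_url,
        "summary": summary,
        "created_at": datetime.now(timezone.utc),
    }


def save_record(record, uri="mongodb://localhost:27017",
                db_name="arxiv", coll_name="papers"):
    """Insert or overwrite the record in MongoDB (names are hypothetical)."""
    # Imported here so build_record stays usable without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(uri)
    try:
        client[db_name][coll_name].replace_one(
            {"_id": record["_id"]}, record, upsert=True
        )
    finally:
        client.close()
```

With `upsert=True`, running the program twice on the same paper overwrites the earlier note instead of raising a duplicate-key error.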
- [ ] dockerize this project
- [ ] build frontend
- [ ] support custom LLMs