This repository contains the code for arxiv-sanity-bot, a system that:
- takes the most recent AI/ML papers from arXiv
- ranks them by Altmetric score
- selects papers above a threshold
- sends them to GPT-4o for summarization using the OpenAI API
- extracts the first image of each paper
- posts the results to X/Twitter using tweepy
Clicking the shortened URL in the tweet takes you to the paper's arXiv page.
The code runs periodically as a GitHub Action, so it runs on free GitHub compute. It fetches the latest papers from arXiv and parses each one to extract its title, abstract, and arXiv number, then sends the arXiv numbers to Altmetric to collect each paper's Altmetric score. After combining everything in a pandas DataFrame, it sorts the papers by score and sends the top results to GPT-4o for summarization using the OpenAI API. It then extracts the first image of each paper, if one exists. Each summary is concatenated with a shortened version of the paper's arXiv link and posted on X/Twitter with that image attached.
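The ranking and post-formatting steps can be sketched roughly as follows. This is a minimal illustration, not the bot's actual code: it uses plain Python lists in place of the pandas DataFrame to stay dependency-free, and the names `Paper`, `select_papers`, `format_post`, `threshold`, `max_papers`, and the 280-character `limit` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    # Hypothetical record type; the bot itself keeps these fields in a DataFrame.
    arxiv_id: str
    title: str
    abstract: str
    score: float  # Altmetric score; 0.0 when none was found


def select_papers(papers, threshold, max_papers):
    """Return the highest-scoring papers above `threshold`, best first."""
    above = [p for p in papers if p.score > threshold]
    ranked = sorted(above, key=lambda p: p.score, reverse=True)
    return ranked[:max_papers]


def format_post(summary, short_url, limit=280):
    """Join a summary and a shortened link, trimming the summary to fit `limit`."""
    room = limit - len(short_url) - 1  # one character for the separating space
    if len(summary) > room:
        # Leave three characters for the trailing ellipsis.
        summary = summary[: room - 3].rstrip() + "..."
    return f"{summary} {short_url}"
```

With pandas, the selection step would likely reduce to a boolean filter followed by `sort_values("score", ascending=False).head(max_papers)`; the plain-Python version above only shows the logic.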
- The icon for the bot was generated using Stable Diffusion
- In order to accumulate enough signal for the Altmetric score, the bot considers papers within a window going back a few days
- The bot avoids reposting the same paper by keeping track of posted tweets in a Firebase database (free quota).
- All parameters governing the functioning of the bot are contained in the config.py module.
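
The look-back window and the duplicate check described in the notes above could be combined along these lines. This is only a sketch under stated assumptions: a plain Python set stands in for the Firebase-backed record of earlier posts, and `within_window`, `filter_candidates`, and `window_days` are hypothetical names, not necessarily those used in `config.py` or elsewhere in the repository.

```python
from datetime import datetime, timedelta


def within_window(published, now, window_days):
    """True if `published` falls within the last `window_days` days."""
    return now - timedelta(days=window_days) <= published <= now


def filter_candidates(papers, posted_ids, now, window_days):
    """Keep papers inside the time window that have not been posted yet.

    `papers` is a list of (arxiv_id, published) tuples; `posted_ids` is a set
    standing in for the database of already-posted papers (an assumption).
    """
    return [
        arxiv_id
        for arxiv_id, published in papers
        if within_window(published, now, window_days)
        and arxiv_id not in posted_ids
    ]
```

Passing `now` in explicitly, rather than calling `datetime.now()` inside the function, keeps the filter deterministic and easy to test.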