This is a Python-based web crawler for IEEE papers. It starts at https://ieeexplore.ieee.org/document/8301529, recursively traverses the referenced papers, and extracts abstract data from each one until it has collected a prespecified number of papers.
The program depends on a few Python packages, such as bs4 and requests. Install them with:
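The traversal described above can be sketched roughly as follows. This is a minimal illustration, not the code in main.py: it uses an iterative breadth-first queue instead of recursion, and the names `crawl`, `get_references`, and `FAKE_REFERENCES` are hypothetical. A small in-memory graph stands in for real IEEE Xplore pages, so no network access is needed.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl(start: str, limit: int,
          get_references: Callable[[str], Iterable[str]]) -> List[str]:
    """Visit documents breadth-first from `start` until `limit` are collected."""
    visited: List[str] = []   # documents in the order they were visited
    seen = {start}            # guards against revisiting cyclic references
    queue = deque([start])
    while queue and len(visited) < limit:
        link = queue.popleft()
        visited.append(link)
        for ref in get_references(link):
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return visited


# Hypothetical toy graph: each key "cites" the documents in its list.
FAKE_REFERENCES = {
    "A": ["B", "C"],
    "B": ["C", "D"],
    "C": ["A"],
    "D": [],
}


def fake_get_references(link: str) -> List[str]:
    """Stand-in for fetching a page and parsing its reference list."""
    return FAKE_REFERENCES.get(link, [])


if __name__ == "__main__":
    print(crawl("A", 3, fake_get_references))
```

In a real crawler, `get_references` would download the page (for example with requests) and parse the reference links (for example with bs4). The `seen` set matters because papers can cite each other, which would otherwise cause an endless loop.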
pip install -r requirements.txt
- Run main.py with Python:
python main.py
- Enter the number of papers to scrape.
- Wait for the scraping process to finish. It can take a couple of minutes.
The program prints the document links of the papers it traverses.
The details of these papers are also saved to a CSV file, "papers.csv".
For each paper, the file contains the document link, title, and abstract.
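As a rough sketch of how a file with these three fields could be written and read back, the example below uses Python's standard csv module. The column names `link`, `title`, and `abstract` are assumptions (check the header row of your own papers.csv), the helper names are hypothetical, and the sample row holds placeholder text rather than real scraped data. An in-memory buffer is used so that no existing file is overwritten.

```python
import csv
import io

# Assumed column names; the real papers.csv header may differ.
FIELDS = ["link", "title", "abstract"]


def write_papers(rows, handle):
    """Write paper records (dicts keyed by FIELDS) as CSV with a header row."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def read_papers(handle):
    """Read CSV records back as a list of dicts keyed by the header row."""
    return list(csv.DictReader(handle))


# Placeholder record; title and abstract are not real scraped values.
sample = [{
    "link": "https://ieeexplore.ieee.org/document/8301529",
    "title": "Placeholder title",
    "abstract": "Placeholder abstract, with a comma.",
}]

buffer = io.StringIO()   # stands in for open("papers.csv", "w", newline="")
write_papers(sample, buffer)
buffer.seek(0)
rows = read_papers(buffer)
print(rows[0]["title"])
```

Using `csv.DictWriter` rather than joining strings by hand keeps abstracts that contain commas or quotes intact, since the module quotes such fields automatically.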
A sample run is shown below:
Here is a sample screenshot of the generated CSV file.