
SarthakRajJindal/Crawler


Crawler

This is a Python-based web crawler for IEEE papers. It starts from https://ieeexplore.ieee.org/document/8301529, recursively follows the papers in each document's reference list, and extracts each paper's abstract until it has collected a user-specified number of papers.
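The traversal described above is a search over the graph formed by papers and their references. The following is a minimal sketch, not the program's actual code. It assumes a hypothetical `get_references(link)` callable that returns the reference links of a paper. In the real crawler, that step would fetch and parse the IEEE page. The sketch also uses breadth-first order with a set of already-seen links, whereas main.py may recurse depth-first.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl(start: str, limit: int,
          get_references: Callable[[str], Iterable[str]]) -> List[str]:
    """Visit papers breadth-first from `start`, stopping after `limit` papers.

    `get_references` is a hypothetical helper supplied by the caller; it
    returns the document links listed in a paper's references.
    """
    visited: List[str] = []
    seen = {start}              # links already queued, so cycles are not revisited
    queue = deque([start])
    while queue and len(visited) < limit:
        link = queue.popleft()
        visited.append(link)    # the real crawler would extract the abstract here
        for ref in get_references(link):
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return visited


if __name__ == "__main__":
    # Toy reference graph standing in for real IEEE documents (hypothetical data).
    toy_graph = {"A": ["B", "C"], "B": ["C", "D"], "C": ["A"], "D": []}
    print(crawl("A", 3, lambda link: toy_graph.get(link, [])))  # ['A', 'B', 'C']
```

Tracking seen links matters because reference graphs contain cycles: paper C citing A would otherwise send the crawler back to its starting point.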

Requirements

The program depends on Python packages such as bs4 (BeautifulSoup) and requests. Install them with:

pip install -r requirements.txt

Running Instructions

  1. Run main.py with Python: python main.py
  2. Enter the number of papers to scrape.
  3. Wait for the scraping to finish; this can take a couple of minutes.
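Step 2 relies on the user typing a sensible number. The following is a minimal sketch of reading and validating that count, assuming the program expects a positive whole number. The function name `ask_paper_count` and the prompt text are hypothetical, not taken from main.py.

```python
from typing import Callable


def ask_paper_count(read: Callable[[str], str] = input) -> int:
    """Prompt until the user enters a positive whole number of papers.

    `read` defaults to the built-in input(); passing another callable
    makes the function easy to test without a keyboard.
    """
    while True:
        answer = read("Number of papers to scrape: ")
        try:
            count = int(answer)   # int() ignores surrounding whitespace
        except ValueError:
            count = 0             # treat non-numeric input as invalid
        if count > 0:
            return count
        print("Please enter a positive whole number.")
```

Called with no arguments, the function reads from the keyboard and keeps asking until it gets a valid answer. Without this check, a typo would only surface as an error after the crawl had started.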

The program prints the document link of each paper it traverses.
It also saves the details of these papers to a CSV file, "papers.csv",
which contains the document link, title and abstract of each paper.
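The following is a minimal sketch of producing a file shaped like papers.csv with Python's standard csv module. The column names `link`, `title` and `abstract` and the helper names `papers_to_csv` and `write_papers` are hypothetical. The real file's header and code may differ.

```python
import csv
import io
from typing import Dict, Iterable

# Hypothetical column names mirroring the fields described above.
FIELDS = ["link", "title", "abstract"]


def papers_to_csv(rows: Iterable[Dict[str, str]]) -> str:
    """Render paper records as CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)   # quotes fields containing commas or quotes
    return buffer.getvalue()


def write_papers(rows: Iterable[Dict[str, str]], path: str = "papers.csv") -> None:
    """Save paper records to `path`; newline="" avoids blank lines on Windows."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(papers_to_csv(rows))
```

Using csv.DictWriter instead of joining strings by hand means abstracts containing commas or quotation marks are quoted automatically, so each abstract stays in a single cell when the file is opened in a spreadsheet.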

A sample run is shown below:

[Screenshot: sample run of main.py]

Here is a sample screenshot of the generated CSV file.

[Screenshot: sample papers.csv]
