Webcrawl

Webcrawl is a web crawling tool built with TypeScript, Next.js, React Hooks, Node.js, Express, D3, Puppeteer, and Sass. Enter the URL of the website you'd like to crawl, and Webcrawl visualizes the discovered links as both JSON and an interactive D3 tree.

Features

  • Web Crawling: Enter the URL of the website you want to crawl, and Webcrawl fetches the pages and organizes the links it discovers.

  • Visualization: Switch between viewing the crawled links in a structured JSON format or explore an interactive D3 tree, offering a graphical representation of the website structure.
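At its core, a crawl step amounts to pulling anchor hrefs out of a page's HTML, resolving them against the page URL, and keeping only same-origin links to follow next. The sketch below illustrates that idea in plain TypeScript; it is an assumption for illustration only, since the project itself drives a headless browser with Puppeteer rather than parsing raw HTML with a regex.

```typescript
// Illustrative sketch of one crawl step (not the project's actual code):
// extract href values from raw HTML, resolve them against the page URL,
// and keep only same-origin links, deduplicated and with fragments stripped.
export function extractLinks(html: string, baseUrl: string): string[] {
  const hrefPattern = /<a\b[^>]*\bhref\s*=\s*["']([^"']+)["']/gi;
  const origin = new URL(baseUrl).origin;
  const links = new Set<string>();

  for (const match of html.matchAll(hrefPattern)) {
    try {
      const resolved = new URL(match[1], baseUrl);
      if (resolved.origin === origin) {
        resolved.hash = ""; // in-page fragments point at the same document
        links.add(resolved.href);
      }
    } catch {
      // skip malformed hrefs rather than failing the whole crawl
    }
  }
  return [...links];
}
```

Repeating this step breadth-first from the starting URL, while tracking visited pages, yields the link structure that the visualizations render.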

Technologies Used

  • Frontend:

    • React with Next.js
    • D3 for data visualization
    • TypeScript for static typing and maintainability
  • Backend:

    • Node.js with Express
    • Puppeteer for headless-browser crawling
  • Styling:

    • Sass for styling the user interface

How to Use

  1. Installation:

    • Clone the repository: git clone https://github.com/GarimaB06/Webcrawl.git
    • Install dependencies: npm install
  2. Run the Application:

    • Start the backend server: npm run server-start (the API listens on http://localhost:3001).
    • Start the frontend application: npm run dev, then open http://localhost:3000.
  3. Access the Application:

    • Open your browser and navigate to http://localhost:3000.
    • Input the URL of the website you want to crawl.
  4. Visualize Results:

    • Explore the crawled links in the provided JSON format or the D3 tree visualization.
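D3's hierarchy layouts typically consume a nested object of the form { name, children }. As a rough sketch of how crawl results could feed the tree view, the helper below turns a flat page-to-links map into that nested shape; the function name, field names, and cycle-handling here are assumptions for illustration, not necessarily what Webcrawl emits.

```typescript
// Hypothetical sketch: convert a flat map of page -> discovered links into the
// nested { name, children } shape that D3 hierarchy layouts expect.
interface TreeNode {
  name: string;
  children: TreeNode[];
}

export function toTree(
  root: string,
  linkMap: Map<string, string[]>,
  seen: Set<string> = new Set()
): TreeNode {
  seen.add(root); // track visited pages so link cycles don't recurse forever
  const children: TreeNode[] = [];
  for (const url of linkMap.get(root) ?? []) {
    if (!seen.has(url)) {
      children.push(toTree(url, linkMap, seen));
    }
  }
  return { name: root, children };
}
```

Because each URL is attached only the first time it is seen, a page linked from several places appears once in the tree, which keeps the D3 rendering acyclic.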

Dependencies

  • Frontend:

    • React
    • Next.js
    • D3
    • TypeScript
  • Backend:

    • Node.js
    • Express
    • Puppeteer

Contributing

Contributions are welcome! If you find any issues or have suggestions for improvements, please open an issue or create a pull request.

License

This project is licensed under the MIT License.

Authors

Garima Bhatia