This is a simple web crawler demonstration project built with the Laravel framework. It fetches the content of a specified URL and saves information about the page, such as its title, description, and body text.
Features:
- Enter a URL to crawl
- Capture a screenshot of the web page
- Extract the page's title, description, and body
- View a history of crawled records
- Search and filter by title, description, or creation date
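The Laravel application does the actual crawling. The shell sketch below only illustrates what "extract the title and description" means, using curl, grep, sed, and tr from the command line. The function names (`extract_title`, `extract_description`, `crawl_meta`) are hypothetical and not part of the project. Regular expressions are not an HTML parser, so this will miss fields on many real pages (for example, single-quoted attributes or an uppercase `CONTENT=`).

```shell
#!/bin/sh
# Hypothetical illustration of title/description extraction.
# Not the project's implementation; regex-based HTML parsing is fragile.

# Print the text of the first <title> element read from stdin.
extract_title() {
  # Join lines so a title split across lines can still match.
  tr '\n' ' ' |
    grep -io '<title[^>]*>[^<]*</title>' |
    head -n 1 |
    sed 's/<[^>]*>//g'
}

# Print the content attribute of the first <meta name="description"> tag.
extract_description() {
  tr '\n' ' ' |
    grep -io '<meta[^>]*name="description"[^>]*>' |
    head -n 1 |
    sed 's/.*content="\([^"]*\)".*/\1/'
}

# Fetch a page and print both fields (requires network access).
crawl_meta() {
  html=$(curl -fsSL "$1") || return 1
  printf 'Title: %s\n' "$(printf '%s' "$html" | extract_title)"
  printf 'Description: %s\n' "$(printf '%s' "$html" | extract_description)"
}

# Usage:
#   crawl_meta https://example.com
```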
Requirements:
- PHP >= 7.4
- Composer
- MariaDB or MySQL database
- Node.js and npm (for the frontend)
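Before installing, it can help to confirm the prerequisites above are present. The script below is a hypothetical helper, not something shipped with the repository; it assumes a POSIX shell and `sort -V` (available in GNU coreutils and recent BSD/macOS `sort`). The client check only looks for a `mariadb` or `mysql` binary, so it reports a miss even when the database server runs on another host.

```shell
#!/bin/sh
# Hypothetical prerequisite check for this project (not part of the repo).

# Succeed if the named command is on PATH.
has_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Succeed if version $1 is greater than or equal to version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Print one line per requirement; return non-zero if anything is missing.
check_requirements() {
  status=0
  if has_cmd php; then
    php_version=$(php -r 'echo PHP_VERSION;')
    if version_ge "$php_version" 7.4; then
      echo "PHP $php_version: OK"
    else
      echo "PHP $php_version: need >= 7.4"; status=1
    fi
  else
    echo "php: not found"; status=1
  fi
  for cmd in composer node npm; do
    if has_cmd "$cmd"; then echo "$cmd: OK"; else echo "$cmd: not found"; status=1; fi
  done
  # MariaDB ships a `mariadb` client; many installs also provide `mysql`.
  if has_cmd mariadb || has_cmd mysql; then
    echo "MariaDB/MySQL client: OK"
  else
    echo "MariaDB/MySQL client: not found"; status=1
  fi
  return "$status"
}

# Usage (after sourcing this file):
#   check_requirements
```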
Installation:
- Clone the project repository to your local machine:
git clone https://github.com/JerryR7/Crawler-Laravel.git
- Navigate to the project directory:
cd Crawler-Laravel
- Install PHP dependencies:
composer install
- Copy the .env.example file to .env, then configure the database connection and other environment variables:
cp .env.example .env
Edit the following section in the .env file to configure the database connection:
DB_CONNECTION=mysql
DB_HOST=127.0.0.1
DB_PORT=3306
DB_DATABASE=your_database_name
DB_USERNAME=your_database_username
DB_PASSWORD=your_database_password
- Generate the application key:
php artisan key:generate
- Run the database migrations:
php artisan migrate
- Install frontend dependencies (if not already installed):
npm install
- Create the symbolic link from public/storage to storage/app/public so that files stored there are publicly accessible:
php artisan storage:link
- Start the local development server:
php artisan serve
- Visit http://localhost:8000/crawler in your browser to view the project.
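Once the steps above are done, two quick checks can catch common setup problems: whether `php artisan storage:link` created `public/storage`, and whether the development server answers at the crawler route. The helpers below are hypothetical (not part of the project); they assume you run them from the project root, that curl is installed, and that `php artisan serve` is listening on its default http://localhost:8000.

```shell
#!/bin/sh
# Hypothetical post-install checks (not part of the project).

# Succeed if <root>/public/storage is a symlink that resolves to a
# directory, which is what `php artisan storage:link` should create.
check_storage_link() {
  root=${1:-.}
  [ -L "$root/public/storage" ] && [ -d "$root/public/storage" ]
}

# Succeed for 2xx and 3xx HTTP status codes.
is_ok_status() {
  case $1 in
    2??|3??) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the HTTP status of a URL and succeed if it looks healthy.
# curl prints 000 when it cannot connect at all.
check_url() {
  code=$(curl -s -o /dev/null -w '%{http_code}' "$1")
  echo "$1 -> HTTP $code"
  is_ok_status "$code"
}

# Usage:
#   check_storage_link && echo "storage link OK"
#   check_url http://localhost:8000/crawler
```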
Usage:
- Open the application; the page shows an input box where you can enter the URL to crawl.
- Enter a URL and click the "Crawl" button to start crawling and save the page information.
- You can view previous crawl records on the "Crawled Records" page.
- On the "Crawled Records" page, you can also use the search and filter functionality to find records with specific titles, descriptions, or creation dates.
If you'd like to contribute to this project, please feel free to submit issues, suggestions, or pull requests. Please follow our contribution guidelines.
This project is licensed under the MIT License. For details, please refer to the LICENSE file.