SearchEngine for scraped pages from walmart.com

themorlock/SearchMart

SearchMart

This repository contains the web scraper used to collect data from walmart.com, along with the scraped dataset. Each '_products.txt' file lists the unique products in one category, and the corresponding '_links.txt' file holds the links between products. The 'all_products.txt' and 'all_links.txt' files contain every unique product and every unique link across all categories. In total, the dataset contains 21,532 unique products.
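Conceptually, the 'all_*' files are the deduplicated union of the per-category files. The sketch below shows that merge step. It assumes (this is not documented in the repository) that each file stores one product or link per line; the function name is hypothetical and not part of the scraper.

```python
from pathlib import Path


def merge_unique_lines(paths):
    """Combine several text files into one list of unique, non-empty lines.

    Hypothetical helper: assumes one product (or link) per line. The first
    occurrence of each line is kept, so the original order is preserved.
    """
    seen = set()
    merged = []
    for path in paths:
        for line in Path(path).read_text(encoding="utf-8").splitlines():
            entry = line.strip()
            if entry and entry not in seen:
                seen.add(entry)
                merged.append(entry)
    return merged


if __name__ == "__main__":
    # Example: rebuild an all_products.txt-style list from the per-category
    # files in the current directory, skipping the combined file itself.
    category_files = [
        p for p in sorted(Path(".").glob("*_products.txt"))
        if p.name != "all_products.txt"
    ]
    products = merge_unique_lines(category_files)
    print(f"{len(products)} unique products")
```

A set gives constant-time duplicate checks, while the separate list keeps the output order stable from run to run.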

To run the web scraper, provide three command-line arguments: the URL where scraping should start, the maximum number of products to crawl, and the name of the department being scraped.
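A minimal sketch of how those three arguments could be parsed with the standard library's argparse. The argument names, their positional order, and the positivity check are assumptions for illustration; the actual scraper may read its arguments differently.

```python
import argparse


def parse_args(argv=None):
    """Parse the scraper's three command-line arguments.

    Hypothetical names: only the meaning of each argument (start URL,
    maximum product count, department name) comes from the README.
    """
    parser = argparse.ArgumentParser(
        description="Scrape product pages from walmart.com")
    parser.add_argument("start_url", help="URL where the crawl should begin")
    parser.add_argument("max_products", type=int,
                        help="maximum number of products to crawl")
    parser.add_argument("department", help="name of the department being scraped")
    args = parser.parse_args(argv)
    if args.max_products <= 0:
        parser.error("max_products must be a positive integer")
    return args
```

With a hypothetical script name, an invocation would look like `python scraper.py <start_url> 500 toys`, and `parse_args()` with no argument reads those values from `sys.argv`.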

All other required modules can be installed at their standard versions, except scikit-learn, which must be version 0.22.2.post1 (pip install -Iv scikit-learn==0.22.2.post1).
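Because the scikit-learn pin is exact, a quick startup check can catch a mismatched install before anything else runs. A minimal sketch using the standard library's importlib.metadata (Python 3.8+); the function name is hypothetical and not part of the repository.

```python
from importlib.metadata import PackageNotFoundError, version

REQUIRED_SKLEARN = "0.22.2.post1"  # exact version pinned in the README


def sklearn_version_ok(installed=None):
    """Return True if scikit-learn matches the pinned version.

    Pass `installed` to check a version string directly instead of
    looking up the package installed in the current environment.
    """
    if installed is None:
        try:
            installed = version("scikit-learn")
        except PackageNotFoundError:
            return False  # scikit-learn is not installed at all
    return installed == REQUIRED_SKLEARN


if __name__ == "__main__":
    if not sklearn_version_ok():
        print(f"Expected scikit-learn {REQUIRED_SKLEARN}; run: "
              f"pip install -Iv scikit-learn=={REQUIRED_SKLEARN}")
```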

To run the backend, serve index.html from ServerPython/ with an Apache server, and also run the apy.py REST API in RestAPI/.

The link to our machine learning models is here. To load them into the server, place them in a folder named 'MachineLearningModels' located one directory above the directory that contains the server.
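The expected layout can be expressed with pathlib: the models folder is a sibling of the server's directory. A minimal sketch, assuming the server knows its own directory; both function names are hypothetical, and the actual loading of each model is left out because the README does not document the file format.

```python
from pathlib import Path

MODELS_DIR_NAME = "MachineLearningModels"  # folder name required by the README


def models_dir(server_dir):
    """Return the expected models folder: one level above the server's directory."""
    return Path(server_dir).resolve().parent / MODELS_DIR_NAME


def list_model_files(server_dir):
    """List the files in the models folder, raising a clear error if it is missing."""
    folder = models_dir(server_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"expected models in {folder}")
    return sorted(p.name for p in folder.iterdir() if p.is_file())
```

Inside a server script, `models_dir(Path(__file__).parent)` would resolve the folder relative to the script's own location rather than the current working directory, so the lookup works regardless of where the server is launched from.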
