This repo contains all the backend code for the Movie Pepper open-source recommendation engine.
This includes the REST API and the IMDb crawler.
Python 3, pip, and virtualenv must be installed.
Create a virtualenv
python3 -m venv venv
source venv/bin/activate
Install dependencies
pip install -r requirements.txt
python -m textblob.download_corpora
python -m nltk.downloader stopwords
A Bash script is provided to simplify executing the Spidy crawler.
cd movie_scrape
START_URL="http://www.imdb.com/search/title?groups=top_1000&sort=user_rating,desc&page=1&ref" ./scrap.sh
After the crawl is complete, calculate the TF-IDF values and build the Doc2Vec models.
python tfidf_lsa.py
python doc2vec.py
This step is required before the server can be started.
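To give a rough idea of what the TF-IDF step computes, here is a minimal standard-library sketch. It is not the code in tfidf_lsa.py, which may tokenize, weight, and reduce dimensions differently. The function names and the three plot summaries are hypothetical and exist only for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (illustrative only)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Term frequency times inverse document frequency.
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical plot summaries, not data from the crawler.
plots = [
    "a hacker discovers reality is a simulation",
    "a hacker fights machines inside a simulation",
    "a family moves to a farm and raises a pig",
]
vectors = tfidf_vectors(plots)
sim_01 = cosine(vectors[0], vectors[1])  # share "hacker" and "simulation"
sim_02 = cosine(vectors[0], vectors[2])  # share only "a", which has zero weight
```

Terms that occur in every document (such as "a" here) receive a weight of zero, so only the distinctive words contribute to the similarity between two movies.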
Start the server
gunicorn --bind 0.0.0.0:5000 server:app
You will probably want to use a reverse proxy such as NGINX and secure it with HTTPS.
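The server:app argument in the gunicorn command follows gunicorn's MODULE:VARIABLE convention: it imports the module server and serves the WSGI callable named app. The sketch below shows what such a callable looks like using only the standard WSGI interface. It is a stand-in for illustration, not the contents of this repo's server.py, and the JSON body is an assumption.

```python
def app(environ, start_response):
    """A minimal WSGI callable (hypothetical stand-in for the real app)."""
    body = b'{"status": "ok"}'
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    # WSGI applications return an iterable of byte strings.
    return [body]
```

Saved under a hypothetical name such as example_server.py, it could be served the same way with gunicorn --bind 0.0.0.0:5000 example_server:app.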
For development, you can use
python server.py