An API for searching movies by synopsis. The movie data was gathered from Metacritic with a web crawler, and a reverse (inverted) index was built from the terms in each movie's synopsis.
Each document is scored against the query with the Jaccard index: the number of terms shared by the query and the synopsis, divided by the number of distinct terms across both. Each query returns a ranking of the 10 most relevant movies by that score.
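The indexing and scoring described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names (`build_index`, `jaccard`, `search`) and the two extra documents are hypothetical, and the sketch assumes that both the query and the synopses have already been cleaned into space-separated terms, like the `description_cleaned` field in the sample response further down.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to the set of movie ids whose cleaned synopsis contains it."""
    index = defaultdict(set)
    for movie_id, text in docs.items():
        for term in text.split():
            index[term].add(movie_id)
    return index


def jaccard(a, b):
    """Jaccard index of two term sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def search(query, docs, index, k=10):
    """Return up to k (movie_id, score) pairs, best score first."""
    query_terms = set(query.split())
    # Only movies sharing at least one term with the query can score above zero,
    # so the index is used to collect candidates instead of scanning every movie.
    candidates = set()
    for term in query_terms:
        candidates |= index.get(term, set())
    scored = [(mid, jaccard(query_terms, set(docs[mid].split()))) for mid in candidates]
    # Sort by descending score; ties are broken by movie id for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


docs = {
    3304: "six urbanit play music bed",  # cleaned synopsis from the sample response
    1: "detect hunt killer citi",        # hypothetical document
    2: "band play final concert",        # hypothetical document
}
index = build_index(docs)
results = search("music play", docs, index)
print(results)
```

For the query `music play`, movie 3304 shares 2 terms with the query out of 5 distinct terms overall, giving 2/5 = 0.4, which agrees with the `score_jaccard` value in the sample response.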
docker pull ghcr.io/vncsmyrnk/movie-search:latest
docker run --rm \
  -p 5000:5000 \
  ghcr.io/vncsmyrnk/movie-search:latest
docker run --rm -it \
  -v "$(pwd)"/src:/var/app \
  -p 5000:5000 \
  --workdir /var/app \
  --cpus 2 \
  --name movie-search \
  python:3.9-slim bash
# Inside container
pip install -r requirements.txt
flask --app server run --host 0.0.0.0
curl -s -X GET "http://localhost:5000/api/query?q=music%20play" | jq .
Returns:
[
  {
    "movie": {
      "avg_score": 70,
      "description": "Six urbanites play musical beds.",
      "description_cleaned": "six urbanit play music bed",
      "movie_uri": "/movie/your-friends-neighbors/",
      "platform": "metacritic",
      "scores": [
        {
          "reviewer_name": "The A.V. Club",
          "score": "100"
        },
        {
          "reviewer_name": "Newsweek",
          "score": "90"
        },
        {
          "reviewer_name": "TV Guide Magazine",
          "score": "80"
        },
        {
          "reviewer_name": "San Francisco Examiner",
          "score": "75"
        },
        {
          "reviewer_name": "The New Republic",
          "score": "70"
        },
        {
          "reviewer_name": "Los Angeles Times",
          "score": "50"
        },
        {
          "reviewer_name": "San Francisco Chronicle",
          "score": "25"
        }
      ],
      "title": "Your Friends & Neighbors",
      "year": "1998"
    },
    "movie_id": 3304,
    "score_jaccard": 0.4
  },
  {...}
]
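The same query can also be issued from Python. The sketch below uses only the standard library and assumes the server is reachable at `http://localhost:5000` as in the commands above; the helper names (`build_query_url`, `search_movies`, `summarize`) are hypothetical and not part of the project.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumption: the API is running locally on port 5000, as in the docker commands.
BASE_URL = "http://localhost:5000"


def build_query_url(query, base_url=BASE_URL):
    """Build the /api/query URL, percent-encoding spaces as %20."""
    return f"{base_url}/api/query?{urlencode({'q': query}, quote_via=quote)}"


def search_movies(query, base_url=BASE_URL, timeout=10):
    """Send the query and return the decoded JSON list of results."""
    with urlopen(build_query_url(query, base_url), timeout=timeout) as response:
        return json.load(response)


def summarize(results):
    """Reduce each result to a (title, score_jaccard) pair."""
    return [(item["movie"]["title"], item["score_jaccard"]) for item in results]
```

With the server running, `summarize(search_movies("music play"))` would list the titles and Jaccard scores of the returned movies. Note that in the sample response the individual reviewer scores are strings (for example `"100"`), so they need converting with `int()` before any arithmetic.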