A personalization-based event recommender built on the TicketMaster API, with user-behavior analysis of the system using the ELK stack on Amazon EC2
- Why: Some events are popular but not to your taste, while others are little known but exactly what you would enjoy. That is why event search needs a personalization-based recommendation system.
- Use Case:
- Search for nearby events
- Set an event as a favorite
- Get recommended events
- 3-tier architecture
- Recommendation algorithm
- Using content-based recommendation for the cold start: find the categories of the user's favorite events, then recommend events with similar categories from the TicketMaster API.
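The cold-start step above can be sketched as a pure function. This is an illustrative sketch, not the project's actual code: the `Event` record, field names, and the candidate list (in practice fetched from the TicketMaster API) are all assumptions.

```java
import java.util.*;
import java.util.stream.*;

// Sketch of content-based cold-start recommendation: recommend candidate
// events that share at least one category with the user's favorites.
public class Recommend {
    record Event(String id, Set<String> categories) {}

    static List<Event> recommend(List<Event> favorites, List<Event> candidates) {
        // 1. Collect the categories of the user's favorite events.
        Set<String> favCategories = favorites.stream()
                .flatMap(e -> e.categories().stream())
                .collect(Collectors.toSet());
        Set<String> favIds = favorites.stream().map(Event::id).collect(Collectors.toSet());
        // 2. Keep candidates that share a category and are not already favorites.
        return candidates.stream()
                .filter(e -> !favIds.contains(e.id()))
                .filter(e -> e.categories().stream().anyMatch(favCategories::contains))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Event> favorites = List.of(new Event("e1", Set.of("Music")));
        List<Event> candidates = List.of(
                new Event("e1", Set.of("Music")),          // already a favorite
                new Event("e2", Set.of("Music", "Rock")),  // shares "Music"
                new Event("e3", Set.of("Sports")));        // no overlap
        // Prints [e2]: the only non-favorite candidate sharing a category.
        System.out.println(recommend(favorites, candidates)
                .stream().map(Event::id).collect(Collectors.toList()));
    }
}
```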
- Benchmark
- Handles about 150 QPS (queries per second), tested with Apache JMeter on EC2.
- search
- favorite
    - get favorite
    - set favorite
    - unset favorite
- recommendation
- MySQL
- MongoDB
- collections (a MongoDB collection corresponds to a MySQL table)
- users - store user information and favorite history.
- items - store item information and item-category relationships.
- logs - used in user-behavior analysis to find the system's peak-time QPS.
- For CRUD operations, please see the doc.
- TicketMaster API
- doc - DISCOVERY API
- example: get music-related events within 50 miles:
https://app.ticketmaster.com/discovery/v2/events.json?apikey=12345&geoPoint=abcd&keyword=music&radius=50
- Geohash Encoding and Decoding Algorithm
- Since the TicketMaster API requires a Geohash instead of raw latitude and longitude in requests, I used the algorithm here to encode/decode (latitude, longitude) pairs to/from a Geohash.
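For reference, standard geohash encoding interleaves one bit of longitude and one of latitude per step (bisecting the range each time) and emits a base-32 character for every 5 bits. A minimal sketch of the encode direction (decode simply reverses the bisection); this is the textbook algorithm, not necessarily the exact code used in this project:

```java
// Minimal sketch of standard geohash encoding.
public class Geohash {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    static String encode(double lat, double lon, int precision) {
        double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
        StringBuilder hash = new StringBuilder();
        boolean evenBit = true;  // even bits encode longitude, odd bits latitude
        int bit = 0, ch = 0;
        while (hash.length() < precision) {
            if (evenBit) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { ch = (ch << 1) | 1; lonMin = mid; }
                else            { ch = ch << 1;       lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { ch = (ch << 1) | 1; latMin = mid; }
                else            { ch = ch << 1;       latMax = mid; }
            }
            evenBit = !evenBit;
            if (++bit == 5) {  // every 5 bits become one base-32 character
                hash.append(BASE32.charAt(ch));
                bit = 0;
                ch = 0;
            }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        // Wikipedia's reference example: (57.64911, 10.40744) -> "u4pruydqqvj"
        System.out.println(encode(57.64911, 10.40744, 11));
    }
}
```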
- CORS issue
- To fix the CORS issue, please take a look at the doc here. Figure out whether it is a simple request or a preflighted request, then add response headers like the following on the server side:
  response.setContentType("application/json");
  response.addHeader("Access-Control-Allow-Origin", "*");
- Use Elasticsearch to store all traffic logs of the system.
- Use Logstash (a data-processing pipeline) to monitor requests and log changes in real time, filter the results, and save them to Elasticsearch.
    - Please check the Logstash pipeline file logstash_pipeline.conf.
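As an illustration only (the real configuration is in logstash_pipeline.conf; the file path below is an assumption), a minimal pipeline of this shape tails the Tomcat access log, parses each line with the standard COMMONAPACHELOG grok pattern, and ships the events to Elasticsearch:

```conf
input {
  file { path => "/var/log/tomcat/localhost_access_log.*.txt" }  # assumed path
}
filter {
  grok { match => { "message" => "%{COMMONAPACHELOG}" } }  # parse access-log fields
}
output {
  elasticsearch { hosts => ["localhost:9200"] }
}
```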
- Use Kibana to visualize the Elasticsearch data.
- Offline log analysis to find peak time, using MongoDB MapReduce.
- One example GET-favorite request from the log to be analyzed:
73.223.210.212 - - [19/Aug/2017:22:00:24 +0000] "GET /EventRecommender/history?user_id=1111 HTTP/1.1" 200 11410
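A hedged sketch of how such a line can be parsed into fields (the actual parsing lives in Purify.java; the regex and field layout here are assumptions based on Tomcat's common access-log format):

```java
import java.util.regex.*;

// Parse a Tomcat access-log line into { ip, timestamp, method, url, status }.
public class LogParse {
    // Groups: 1 = client IP, 2 = timestamp, 3 = HTTP method, 4 = URL, 5 = status code
    private static final Pattern LINE = Pattern.compile(
        "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"(\\S+) (\\S+) [^\"]+\" (\\d{3}) \\d+");

    static String[] parse(String log) {
        Matcher m = LINE.matcher(log);
        if (!m.find()) return null;  // line does not match the expected format
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4), m.group(5) };
    }

    public static void main(String[] args) {
        String line = "73.223.210.212 - - [19/Aug/2017:22:00:24 +0000] "
            + "\"GET /EventRecommender/history?user_id=1111 HTTP/1.1\" 200 11410";
        String[] f = parse(line);
        System.out.println(f[1] + " " + f[3]); // timestamp and request URL
    }
}
```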
- See Purify.java for parsing tomcat_logs and saving them to MongoDB, and FindPeak.java for the MapReduce jobs. Pseudocode of the MapReduce:
function map(String url, String time):
    if request url starts with "/EventRecommender":
        emit(time, 1)

function reduce(Iterable<Integer> values):
    return Array.sum(values)
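The pseudocode can be run as a small in-memory equivalent (illustrative only; FindPeak.java runs the real job inside MongoDB, and the time-bucket format here is an assumption):

```java
import java.util.*;

// In-memory version of the MapReduce job: map each matching request to its
// time bucket with count 1, then reduce by summing per bucket.
public class FindPeak {
    static Map<String, Integer> countPerBucket(List<String[]> records) {
        Map<String, Integer> counts = new HashMap<>();
        for (String[] rec : records) {        // rec[0] = url, rec[1] = time bucket
            if (rec[0].startsWith("/EventRecommender")) {
                counts.merge(rec[1], 1, Integer::sum);  // emit(time, 1) + sum
            }
        }
        return counts;
    }

    static String peak(Map<String, Integer> counts) {
        // The bucket with the highest count is the peak time.
        return Collections.max(counts.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        List<String[]> records = List.of(
            new String[] {"/EventRecommender/search",  "22:00"},
            new String[] {"/EventRecommender/history", "22:00"},
            new String[] {"/other/path",               "22:00"},  // filtered out
            new String[] {"/EventRecommender/search",  "23:00"});
        System.out.println(peak(countPerBucket(records))); // 22:00
    }
}
```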
See a sample tested result: