Team Members:
- Meihao Chen
- Yitong Wang
Advisor:
- Pablo Barbera
- All tweets mentioning "hillary", "hillary clinton" or "clinton" between April 12, 2015 at 17:00 UTC and April 14, 2015 at 17:00 UTC. Tweets are stored in JSON format by hour (each file is a different hour of data) and gzipped, inside a tar file. LINK
- General description of the dataset: total number of tweets, number of tweets over time (time series), counts of important words, and number of retweets
- All tweets mentioning "oscars", "oscar", "red carpet", "oscars2014", "academy", "award", "awards" between March 2nd, 23:00 UTC and March 3rd 06:00 UTC. Tweets are stored in JSON format by hour (each file is a different hour of data) and gzipped, inside a tar file. LINK
- General description of the dataset: hashtag counts and number of tweets over time (time series)
- Named entity recognition research. LINK
- Public opinion analysis and award prediction. LINK
- We intend to use D3, JavaScript, and other tools to build interactive visualizations for the website.
- Pushed code for reading the JSON files and running a preliminary analysis on the Hillary dataset
- Preliminary: basic counts of fields (used for the exploratory data presentation)
- dataForVis: processed data for d3 visualization
- OscarNameCount: data derived from running a named entity tagger on the tweet texts, giving the number of occurrences of each name
- filteredData: Fields extracted from Oscar-related tweets
- Rest: counts for each field data file
- All reference files and project descriptions
- Scripts for running lmr (local map reduce), jq, counting data, and generating data for d3 (hier_bund.sh)
- MapReduce scripts for processing the raw field data extracted using jq
- Scripts for extracting named entities from the tweets
- Scripts for processing data into a format that can be used for the d3 visualization
- These normally take the data processed by jq as input
- Contains all the files needed for constructing the webpage
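
As a rough illustration of the general dataset description mentioned above (total tweets, retweets, tweets per hour, hashtag counts), here is a minimal Python sketch. It is not the jq / lmr pipeline used in this project. It assumes each hourly file is gzipped and holds one tweet JSON object per line, and that tweets carry the standard Twitter fields `created_at`, `retweeted_status` and `entities.hashtags`. The function names `summarize_tweets` and `summarize_file` are hypothetical.

```python
import gzip
import json
from collections import Counter
from datetime import datetime

# Assumed timestamp layout of the Twitter `created_at` field,
# e.g. "Mon Mar 03 01:15:00 +0000 2014".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def summarize_tweets(lines):
    """Compute basic counts from an iterable of JSON-encoded tweet lines."""
    total = 0
    retweets = 0
    per_hour = Counter()
    hashtags = Counter()

    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines
        if not isinstance(tweet, dict) or "text" not in tweet:
            continue  # skip non-tweet records (assumption: tweets have "text")

        total += 1
        if "retweeted_status" in tweet:
            retweets += 1

        created = tweet.get("created_at")
        if created:
            stamp = datetime.strptime(created, CREATED_AT_FORMAT)
            per_hour[stamp.strftime("%Y-%m-%d %H:00")] += 1

        for tag in (tweet.get("entities") or {}).get("hashtags", []):
            hashtags[tag["text"].lower()] += 1

    return {
        "total": total,
        "retweets": retweets,
        "per_hour": per_hour,
        "hashtags": hashtags,
    }


def summarize_file(path):
    """Summarize one gzipped hourly file (one JSON tweet per line assumed)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return summarize_tweets(handle)
```

Calling `summarize_file` on each extracted hourly file and adding up the returned counters would give dataset-wide totals. Hashtags are lowercased here so that `#Oscars` and `#oscars` are counted together, which is a design choice of this sketch rather than something taken from the project scripts.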