talking_trump

This repo contains everything you need to scrape Reddit comments and find the most distinctive set of words that one community uses relative to another.

However, you will have to collect your own sets of usernames: one from the community of interest and one from Reddit more broadly.

By convention, I call the community of interest TD and the rest of Reddit nonTD, since I use this to analyze r/the_donald. But this code can be adapted for any subreddit, or group of subreddits, depending on how you choose your samples.

Using this repo

Start by running code/scraping_functions.R so that its functions are defined. Then use get_comment_history.R, which calls those functions, to scrape the comment history of the users you specify. Note that get_comment_history.R saves a .csv file for each user (files tend to be around 50-100 kB), so you may want to start with a small sample of users.
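For readers who want to see what the scraping step involves, here is a minimal sketch in Python (not the repo's R code) of paging through one user's comment history and writing it to a per-user CSV. It assumes Reddit's public `https://www.reddit.com/user/<name>/comments.json` listing, whose pages hold comments under `data.children[*].data` and a pagination cursor in `data.after`. The function names, the `user`/`subreddit`/`body` columns, and the `max_pages` cap are hypothetical choices for illustration.

```python
import csv
import json
import urllib.request
from pathlib import Path
from typing import Callable, Iterator

# ASSUMPTION: Reddit's public JSON listing of a user's comments, 100 per page.
USER_COMMENTS_URL = "https://www.reddit.com/user/{user}/comments.json?limit=100{after}"


def fetch_json(url: str) -> dict:
    """Download one listing page (requires network access; Reddit expects a User-Agent)."""
    req = urllib.request.Request(url, headers={"User-Agent": "comment-history-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def iter_comments(user: str, fetch: Callable[[str], dict] = fetch_json,
                  max_pages: int = 10) -> Iterator[dict]:
    """Yield one record per comment, following the listing's 'after' cursor page by page."""
    after = ""
    for _ in range(max_pages):  # hypothetical cap on pages requested per user
        page = fetch(USER_COMMENTS_URL.format(user=user, after=after))
        for child in page["data"]["children"]:
            comment = child["data"]
            yield {"user": user,
                   "subreddit": comment.get("subreddit", ""),
                   "body": comment.get("body", "")}
        cursor = page["data"].get("after")
        if not cursor:  # no cursor means this was the last page
            break
        after = f"&after={cursor}"


def save_user_csv(user: str, out_dir: Path,
                  fetch: Callable[[str], dict] = fetch_json) -> Path:
    """Write one CSV per user, mirroring the one-file-per-user layout described above."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{user}.csv"
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["user", "subreddit", "body"])
        writer.writeheader()
        writer.writerows(iter_comments(user, fetch))
    return path
```

Passing a stub `fetch` function lets you exercise the pagination logic offline; the default makes live HTTP requests.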

analyze_comments.R takes the per-user files you created and turns them into two dataframes, one for each subsample. Each dataframe contains every word used by any user in that subsample, the total number of times each word was used, and the number of unique users who used it.
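The counting step can be sketched in Python as follows. This illustrates the idea rather than porting analyze_comments.R: the regex tokenizer is a crude stand-in for a proper tokenizer, and the `user`/`body` CSV columns and the function names are assumptions.

```python
import csv
import re
from collections import Counter
from pathlib import Path

# Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


def load_comments(csv_dir: Path) -> list[tuple[str, str]]:
    """Read every per-user CSV in a directory into (user, body) pairs."""
    rows = []
    for path in sorted(csv_dir.glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as f:
            rows.extend((r["user"], r["body"]) for r in csv.DictReader(f))
    return rows


def count_words(comments: list[tuple[str, str]]) -> dict[str, tuple[int, int]]:
    """Map each word to (total uses, number of unique users who used it)."""
    uses: Counter = Counter()
    users: dict[str, set[str]] = {}
    for user, body in comments:
        for word in tokenize(body):
            uses[word] += 1
            users.setdefault(word, set()).add(user)
    return {word: (uses[word], len(users[word])) for word in uses}
```

It would be called once per subsample, e.g. `count_words(load_comments(Path("user_tables/TD")))`, where that directory layout is hypothetical.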

make_chatterplot.R combines these two dataframes, calculates a usage ratio for each word, and produces a chatterplot of the words most distinctive of one community.

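The usage score for a word within a subsample is calculated as (word uses + 1)/(total uses of all words in the subsample) * (unique users of the word + 1)/(total users in the subsample). Because of the second factor, this measure weights a word more heavily if it is used by many users, rather than by a few users many times; the +1 terms keep the score above zero for words that never appear in a subsample. The ratio of usage scores (TD score divided by nonTD score) is then taken to find the most distinctive words in one subsample (TD) relative to the other (nonTD).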
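A minimal Python sketch of this scoring, assuming each subsample is summarized as a hypothetical `{word: (uses, unique_users)}` mapping and that the total number of users in each subsample is known separately. The two tables are outer-joined, so a word missing from one subsample counts as zero uses by zero users there, which the +1 terms turn into a small positive score.

```python
def usage_score(uses: int, users: int, total_uses: int, total_users: int) -> float:
    """(word uses + 1)/(total uses) * (unique users + 1)/(total users)."""
    return (uses + 1) / total_uses * (users + 1) / total_users


def distinctive_words(td: dict[str, tuple[int, int]],
                      non_td: dict[str, tuple[int, int]],
                      td_total_users: int,
                      nt_total_users: int,
                      top_n: int = 10) -> list[tuple[str, float]]:
    """Rank words by TD usage score / nonTD usage score, highest ratio first."""
    td_total_uses = sum(uses for uses, _ in td.values())
    nt_total_uses = sum(uses for uses, _ in non_td.values())
    ratios = {}
    # Outer join: every word seen in either subsample; absent words count as (0, 0).
    for word in td.keys() | non_td.keys():
        td_uses, td_users = td.get(word, (0, 0))
        nt_uses, nt_users = non_td.get(word, (0, 0))
        td_score = usage_score(td_uses, td_users, td_total_uses, td_total_users)
        nt_score = usage_score(nt_uses, nt_users, nt_total_uses, nt_total_users)
        ratios[word] = td_score / nt_score
    # Sort by descending ratio, breaking ties alphabetically for a stable result.
    return sorted(ratios.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

For example, with 10 TD users and 20 nonTD users, a word used 10 times by 5 TD users (out of 110 TD word uses) and never in nonTD (out of 205 nonTD word uses) scores (11/110)(6/10) = 0.06 in TD and (1/205)(1/20) = 1/4100 in nonTD, a ratio of 246.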

Sources and code I adapted

Scraping Reddit: dmarx's GitHub gist, https://gist.github.com/dmarx/8140428

Text analysis: Text Mining with R by Julia Silge and David Robinson, https://www.tidytextmining.com/tidytext.html

Chatterplots: Daniel McNichols' Towards Data Science blog post, https://towardsdatascience.com/rip-wordclouds-long-live-chatterplots-e76a76896098

klardner/talking_trump

Folders and files

Latest commit

History

Repository files navigation

talking_trump

Using this repo

Sources and code I adapted

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages