This README.md file was generated on 03-04-23 by Madeleine Roberts
Examining the Scope and Sentiment of Local Newspaper Coverage on the 2023 Primary Election's Mayoral Candidates in Chicago
The project aims to analyze press coverage of Chicago's 2023 mayoral primary race and investigate how candidates are covered differently in local newspapers. We scraped, or accessed the APIs of, six Chicago-focused media sites to collect data. We then conducted data analysis to identify which topics come up most often and sentiment analysis to examine the tone of the articles. This analysis was performed for each candidate overall, for each paper overall, and for each candidate within each paper.
- Clone the project repository via SSH:

git clone [email protected]:uchicago-capp122-spring23/databased_project.git
- Install the virtual environment and dependencies:

poetry install
The project must be run inside the Poetry virtual environment. After completing the installation steps above, and at the start of each subsequent session, initialize the virtual environment from the project directory by running:
poetry shell
Execute the project by running:
python -m databased
This command may take a minute to load.
You are then prompted to enter a single-digit command to execute part or all of the project, as shown below.
To execute a desired aspect of the project please enter one of the following commands:
1 - Open Data Visualization
2 - Scrape All Newspapers
3 - Clean Scraped Data
4 - Conduct Data Analysis
5 - Run Entire Project Start to Finish (Scrape -> Clean -> Analyze -> Visualize)
6 - End Program
Please input the number of your desired command:
Example: entering "1" followed by Return runs the data visualization.
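The menu above can be sketched as a simple dispatch table. The function and handler names below are hypothetical illustrations, not the project's actual code:

```python
def dispatch(choice, actions):
    """Run the handler for a single-digit menu choice.

    `actions` maps digits "1"-"5" to zero-argument handlers;
    "6" ends the program by returning False.
    """
    if choice == "6":
        return False  # Command 6 - Close Project
    handler = actions.get(choice)
    if handler is not None:
        handler()
    else:
        print("Invalid command, please enter a digit from 1 to 6.")
    return True
```

A real loop would repeatedly call `dispatch(input("Please input the number of your desired command: ").strip(), actions)` until it returns False.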
Command 1 - Opens Data Visualization
Renders a Dash app to visualize the final results of the dataBASED project.
Notes:
This command will take about 1 minute to render the Dash app.
Dash will display the warning "This is a development server"; this warning is expected and can be safely ignored.
Command 2 - Executes All Scrapers/ProQuest API
Runs all scrapers and the ProQuest API to collect newspaper articles about Chicago's mayoral candidates. The retrieved data is stored in JSON format and written to the databased/data folder.
Note: This command will take about 20 minutes to complete.
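Each scraper is site-specific, but the general shape is the same: parse a fetched page, then store one JSON record per article. A minimal stdlib-only sketch with hypothetical field names (the project's actual scrapers and schema may differ):

```python
import json
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Grab the page <title>; real scrapers extract full article text."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def article_record(html, candidate, paper):
    """Build one JSON-serializable record for a scraped article.

    Field names here are illustrative, not the project's actual schema.
    """
    parser = TitleParser()
    parser.feed(html)
    return {"candidate": candidate, "newspaper": paper, "title": parser.title}
```

A batch of such records would then be written with `json.dump(records, f)` into the databased/data folder.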
Command 3 - Executes All Data Cleaning
Runs data cleaning on all scraped data: strips stop words, normalizes case, and keeps only sentences that refer to the candidate who is the subject of the article. The cleaned data is stored in JSON format and written to the databased/data folder.
Note: This command will take about 1 minute to complete.
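The cleaning steps described above can be sketched as follows. The tiny stop-word set is a stand-in for a full list (such as nltk's), and the regex sentence splitter is deliberately simple; the project's actual cleaning code may differ:

```python
import re

# Toy stop-word list; the real pipeline would use a full set such as nltk's.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "was"}


def clean_article(text, candidate):
    """Keep sentences mentioning the candidate, lowercase them, drop stop words."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    kept = [s for s in sentences if candidate.lower() in s.lower()]
    tokens = []
    for sentence in kept:
        tokens.extend(
            word
            for word in re.findall(r"[a-z']+", sentence.lower())
            if word not in STOP_WORDS
        )
    return tokens
```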
Command 4 - Executes All Data Analysis
Runs data analysis on the cleaned candidate data to calculate word frequency, sentiment, and article counts for each candidate, each newspaper, and each candidate within each paper. The results are written to JSON files in the databased/analysis/data folder.
Note: This command will take about 12 minutes to complete. If you comment out lines 54 and 55 in basic_sentiment.py, it will finish in about 1 minute, but the JSON file for overall newspaper sentiment will not be generated.
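The word-frequency and sentiment computations can be sketched like this. The toy lexicon below is a stand-in for the actual sentiment model used in basic_sentiment.py, which this README does not specify:

```python
from collections import Counter

# Toy sentiment lexicon; illustrative only, not the project's model.
POSITIVE = {"win", "support", "strong"}
NEGATIVE = {"lose", "attack", "weak"}


def analyze(tokens):
    """Return the top-3 most frequent tokens and a naive sentiment score.

    The score is simply positive hits minus negative hits.
    """
    freq = Counter(tokens)
    score = sum(1 for t in tokens if t in POSITIVE) - sum(
        1 for t in tokens if t in NEGATIVE
    )
    return freq.most_common(3), score
```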
Command 5 - Executes Entire Project
Runs the entire project start to finish: runs the scrapers/ProQuest API, then cleans the article data, conducts data analysis, and renders the visualization of results.
Note: this command will take about 45 minutes to complete.
Command 6 - Close Project
Terminates the program.
If you encounter issues with nltk or pyarrow please run the following commands within the poetry shell:
python3 -m pip install nltk
python3 -m pip install pyarrow
During testing we found that a scraper may occasionally be blocked by a site's servers. If this occurs, run the program again and the scrapers should complete.
CAPP 122 Instructor - Professor James Turk
CAPP 122 Project TA - Yifu Hou
Local Chicago News Sources Used for Data Collection: