source: https://twitter.com/donttearmedown_/status/1274252383636381696/photo/1
Compare and analyze the rock bands, Evanescence and Within Temptation, using some of the most important skills of a Data Scientist.
Notebook 1
In this notebook I show how to build a Python web scraper with BeautifulSoup, using some simple Python string methods and other tools to obtain the lyrics of both bands.
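A minimal sketch of the parsing step follows. It assumes the page HTML has already been downloaded (for example with `requests`) and that the lyrics sit in a `<div>` with the hypothetical class `lyrics`. The real songteksten.net markup may differ, so inspect a page first. BeautifulSoup locates the element, and plain string methods clean the text:

```python
from bs4 import BeautifulSoup


def extract_lyrics(html: str, container_class: str = "lyrics") -> str:
    """Return cleaned lyrics text from one song page.

    `container_class` is a hypothetical CSS class; check the real page
    markup to find the element that actually holds the lyrics.
    """
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", class_=container_class)
    if container is None:
        return ""
    # A newline separator keeps <br>-separated lines apart
    raw = container.get_text(separator="\n")
    # Simple string methods: trim each line and drop the blank ones
    lines = [line.strip() for line in raw.splitlines()]
    return "\n".join(line for line in lines if line)


# Placeholder HTML standing in for a downloaded song page
sample = '<div class="lyrics">Line one<br/>  line two<br/><br/>Line three</div>'
print(extract_lyrics(sample))
```

The stripping and blank-line filtering matter because lyrics pages often use extra `<br>` tags and indentation for verse breaks.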
Notebook 2
In this notebook, data about both bands, their albums, and their tracks (metadata and audio features) are retrieved using Spotify's API. I'll be using Spotipy, a lightweight Python library for the Spotify Web API. This notebook also presents some initial EDA (exploratory data analysis) comparing both bands.
To access the Spotify API, you need to request your credentials at https://developer.spotify.com/dashboard/login. Add these credentials to credentials.py.
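The retrieval step could look roughly like the sketch below. It assumes credentials.py defines two variables named `CLIENT_ID` and `CLIENT_SECRET` (hypothetical names; use whatever your file defines) and uses Spotipy's client-credentials flow. `fetch_audio_features` and `batched` are hypothetical helpers, not part of the project, and only the first page of albums and tracks is fetched:

```python
from typing import Iterator, List


def batched(ids: List[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` ids.

    Spotify's audio-features endpoint accepts at most 100 track ids per request.
    """
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_audio_features(artist_name: str) -> List[dict]:
    """Return audio features for the tracks on an artist's albums (first page only)."""
    # Imported lazily so `batched` can be used without Spotipy installed
    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials

    from credentials import CLIENT_ID, CLIENT_SECRET  # hypothetical variable names

    sp = spotipy.Spotify(
        auth_manager=SpotifyClientCredentials(client_id=CLIENT_ID, client_secret=CLIENT_SECRET)
    )
    # Take the best match for the artist name
    artist = sp.search(q=f"artist:{artist_name}", type="artist", limit=1)["artists"]["items"][0]
    albums = sp.artist_albums(artist["id"], album_type="album")["items"]
    track_ids = [
        track["id"]
        for album in albums
        for track in sp.album_tracks(album["id"])["items"]
    ]
    features: List[dict] = []
    for chunk in batched(track_ids):
        # audio_features returns None for tracks without analysis; skip those
        features.extend(f for f in sp.audio_features(chunk) if f)
    return features
```

For example, `fetch_audio_features("Evanescence")` would return a list of dicts with keys such as `danceability`, `energy` and `valence`, which can be loaded into a pandas DataFrame for the EDA.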
Notebook 3
Here, several prediction models are trained on the tracks' audio features to predict whether a track is by Evanescence or Within Temptation.
Models used are:
- KNN (k-nearest neighbors)
- Decision Tree
- Random Forest
- AdaBoost
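A minimal sketch of how these four models can be compared with scikit-learn using 5-fold cross-validation. The project's data is not reproduced here, so `make_classification` generates a synthetic stand-in for the audio-features table (nine numeric columns, binary label). In practice `X` would hold features such as danceability or energy, and `y` the band of each track:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the audio-features table (not real band data)
X, y = make_classification(n_samples=200, n_features=9, random_state=0)

models = {
    # KNN is distance-based, so features are standardised first
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

# Mean accuracy over 5 cross-validation folds for each model
scores = {name: cross_val_score(model, X, y, cv=5).mean() for name, model in models.items()}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Only KNN is wrapped with `StandardScaler`, because audio features such as tempo and loudness live on very different scales; the tree-based models split on one feature at a time and do not need scaling.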
AutoML techniques are also used to identify ML models automatically. Two open source libraries are used:
- TPOT, an open source automated machine learning library developed at the University of Pennsylvania
- H2O.ai AutoML, another open source automated machine learning library developed by researchers at H2O.ai
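As an illustration of the AutoML step, here is a sketch using H2O's Python API. It assumes the audio features have been saved to a CSV file (path passed in) containing only feature columns plus a label column named `artist` (both hypothetical). It also assumes Java is available, since `h2o.init()` starts a local H2O server:

```python
def run_h2o_automl(csv_path: str, target: str = "artist", max_models: int = 10):
    """Train H2O AutoML on a CSV of audio features and return the leaderboard.

    `csv_path` and the `target` column name are hypothetical; adapt them to your data.
    """
    # Imported inside the function so defining it does not require H2O
    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()  # starts (or connects to) a local H2O cluster
    frame = h2o.import_file(csv_path)
    # Treat the label as categorical so AutoML trains classifiers, not regressors
    frame[target] = frame[target].asfactor()
    train, test = frame.split_frame(ratios=[0.8], seed=1)

    aml = H2OAutoML(max_models=max_models, seed=1)
    # With no `x` given, every column except `target` is used as a predictor
    aml.train(y=target, training_frame=train, leaderboard_frame=test)
    return aml.leaderboard
```

Ranking models on a held-out `leaderboard_frame` gives a fairer comparison than scoring them on the data AutoML trained on.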
Notebook 4
Here the data retrieved through web scraping and Spotify's API are used to analyse both bands further. I'll be making use of some NLP and visualization.
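One simple NLP step for comparing the lyrics is counting each band's most frequent words after removing stop words. The following is a minimal standard-library sketch. The tiny stop-word list is illustrative only; a fuller list, such as NLTK's, would be typical:

```python
import re
from collections import Counter
from typing import List, Tuple

# Illustrative stop-word list only; a real analysis would use a fuller one
STOP_WORDS = {"the", "a", "and", "i", "you", "to", "of", "in", "my", "me", "it", "is"}


def top_words(lyrics: str, n: int = 5) -> List[Tuple[str, int]]:
    """Return the `n` most frequent non-stop-words in `lyrics`."""
    # Lowercase and keep runs of letters and apostrophes as tokens
    tokens = re.findall(r"[a-z']+", lyrics.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)


print(top_words("The night is dark and the night is long", 2))
# → [('night', 2), ('dark', 1)]
```

Running `top_words` on the concatenated lyrics of each band gives two frequency lists that can be plotted side by side, for example as bar charts or word clouds.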
- Install the requirements using:
pip install -r requirements.txt
- Make sure you use Python 3.
- You may want to use a virtual environment for this.
The script webscraping_lyrics.py contains the code explained in Notebook 1 to retrieve the lyrics of Evanescence or Within Temptation from songteksten.net.
Go to the folder where webscraping_lyrics.py is located and run one of the following commands, depending on which lyrics you want to retrieve:
- to retrieve lyrics from Evanescence:
python webscraping_lyrics.py -e
- to retrieve lyrics from Within Temptation:
python webscraping_lyrics.py -w
A subdirectory data\ will be created to save the resulting files.
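The internals of webscraping_lyrics.py are not reproduced here. The command-line behaviour described above (exactly one of `-e` or `-w`, output written to a `data` subdirectory) could be sketched with `argparse` and `pathlib` as follows. The function names and band identifiers are hypothetical:

```python
import argparse
from pathlib import Path
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Retrieve band lyrics from songteksten.net")
    # Exactly one band flag must be given
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-e", action="store_true", help="retrieve Evanescence lyrics")
    group.add_argument("-w", action="store_true", help="retrieve Within Temptation lyrics")
    return parser.parse_args(argv)


def band_slug(args: argparse.Namespace) -> str:
    # Hypothetical identifiers, e.g. for naming the output files
    return "evanescence" if args.e else "within_temptation"


def ensure_data_dir(base: Path = Path(".")) -> Path:
    # pathlib uses the OS separator (data\ on Windows, data/ elsewhere)
    out = base / "data"
    out.mkdir(parents=True, exist_ok=True)
    return out
```

A mutually exclusive, required group makes argparse reject both a missing flag and `-e -w` together, so the script never has to guess which band was meant.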