Analyzing red wine quality in R, mostly with the knitr and dplyr libraries

jtsou/Red-Wine-Analysis-with-R

Red Wine Analysis with R

Chemical Properties

  • fixed acidity: most acids involved with wine are fixed or nonvolatile (do not evaporate readily) (tartaric acid - g / dm^3)
  • volatile acidity: the amount of acetic acid in wine, which at high levels can lead to an unpleasant vinegar taste (acetic acid - g / dm^3)
  • citric acid: found in small quantities, citric acid can add 'freshness' and flavor to wines (g / dm^3)
  • residual sugar: the amount of sugar remaining after fermentation stops (g / dm^3)
  • chlorides: the amount of salt in the wine (sodium chloride - g / dm^3)
  • free sulfur dioxide: the free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion (mg / dm^3)
  • total sulfur dioxide: amount of free and bound forms of SO2 (mg / dm^3)
  • density: the density of wine is close to that of water, depending on the percent alcohol and sugar content (g / cm^3)
  • pH: describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic)
  • sulphates: a wine additive which can contribute to sulfur dioxide gas (SO2) levels (potassium sulphate - g / dm^3)
  • alcohol: the percent alcohol content of the wine (% by volume)
Output variable (based on sensory data):
  • quality (score between 0 and 10)

Data Exploration

The report explores a dataset containing quality scores and chemical attributes for 1,599 red wines.

In the first round, I explored all the chemical properties of the wine. I then chose some interesting results for further investigation, and used ggpairs to show the pairwise correlations both numerically and graphically.
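The numerical side of the correlation step can be sketched as follows. This is a minimal illustration, not the code from the report: the data frame `wine` and its values are made-up toy data, and only four of the variables are included.

```r
# Toy data frame standing in for the wine dataset.
# The values are made up for illustration and are NOT from the real data.
wine <- data.frame(
  alcohol = c(9.4, 9.8, 10.5, 11.2, 12.0, 12.8),
  density = c(0.9978, 0.9968, 0.9962, 0.9951, 0.9940, 0.9932),
  pH      = c(3.51, 3.20, 3.26, 3.16, 3.39, 3.45),
  quality = c(5, 5, 6, 6, 7, 7)
)

# Pairwise Pearson correlations, rounded for readability
cor_matrix <- round(cor(wine), 3)
print(cor_matrix)

# Optional graphical version (requires the GGally package):
# GGally::ggpairs(wine)
```

Each cell of `cor_matrix` is the correlation between the row variable and the column variable, so the matrix is symmetric with ones on the diagonal.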

I then created three plots that illustrate how important alcohol percentage is to the quality of red wine. They are shown below.

Final Plots and Summary

Plot One


Description One

I created a new variable, 'label', to show how alcohol percentage and wine quality vary together. Wines with medium alcohol content make up the majority.
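One way to build such a variable is to bin alcohol percentage with `cut()` inside a dplyr `mutate()`. The sketch below is only an assumption about how 'label' might be defined: the cut-offs of 10% and 12% and the toy values are hypothetical and do not come from the report.

```r
library(dplyr)

# Toy values, made up for illustration only (not from the real dataset)
wine <- data.frame(
  alcohol = c(9.0, 10.2, 11.5, 12.6, 13.4),
  quality = c(5, 5, 6, 7, 7)
)

# Bin alcohol into three groups; the 10 and 12 cut-offs are assumptions.
# right = FALSE makes each interval closed on the left: [10, 12)
wine <- wine %>%
  mutate(label = cut(alcohol,
                     breaks = c(-Inf, 10, 12, Inf),
                     labels = c("low", "medium", "high"),
                     right = FALSE))

# Number of wines in each alcohol group
print(count(wine, label))
```

Because `label` is a factor, it can be used directly as a grouping or fill variable in a plot.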

Plot Two


Description Two

The scatter plot shows a negative relationship between density and pH. Lines are drawn to show the ratings. The correlation identified earlier, -0.342, suggests a moderate relationship. The graph also suggests that wines with a lower pH and a density between approximately 0.99 and 1 tend to keep an excellent rating.

Plot Three


Description Three

The relationship between alcohol percentage and wine density is strong: the higher the alcohol percentage, the lower the density. The plot also shows that stronger wines tend to have higher ratings.
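A plot of this kind can be sketched with ggplot2. This is an illustrative sketch rather than the report's actual plotting code; the toy values are made up, and colouring the points by quality is an assumption about how the ratings are shown.

```r
library(ggplot2)

# Toy values, made up for illustration only (not from the real dataset)
wine <- data.frame(
  alcohol = c(9.4, 9.8, 10.5, 11.2, 12.0, 12.8),
  density = c(0.9978, 0.9968, 0.9962, 0.9951, 0.9940, 0.9932),
  quality = c(5, 5, 6, 6, 7, 7)
)

# Scatter plot of density against alcohol, coloured by quality score
p <- ggplot(wine, aes(x = alcohol, y = density, colour = factor(quality))) +
  geom_point() +
  labs(x = "Alcohol (% by volume)",
       y = "Density (g / cm^3)",
       colour = "Quality")

print(p)
```

Treating `quality` as a factor gives each score its own discrete colour, which makes it easier to see whether higher-rated wines cluster at higher alcohol levels.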

Reflection

Based on my analysis of the dataset, I am convinced that alcohol concentration, which is closely tied to density, is the most important factor in deciding the quality of red wine. The lower the density, the higher the alcohol concentration, and the higher the alcohol concentration, the better the quality of the wine. One challenge I encountered is that although I like wine, I do not know the chemistry behind it. It was a little tough for me to wrap my mind around what might be the most important component in making a quality wine. After playing around with the variables and creating the plots, the results eventually made sense to me. I wish I could enhance my analysis by knowing which brands that are highly rated by consumers fit my prediction. This analysis gives a rough idea of what makes a good wine, and readers can take it from there.
