
Data exploration and correlation analysis of the popular Palmer Penguins dataset, using Python, pandas, matplotlib and numpy.


CianGallagher/principles-of-data-analytics


Penguins Data Analysis

Overview

This repository contains data exploration and correlation analysis of the popular Palmer Penguins dataset. The dataset consists of measurements of penguins from three species: Adélie, Chinstrap, and Gentoo.

The dataset includes the following variables:

species: The species of penguin (Adélie, Chinstrap, or Gentoo).
island: The island where the penguin was observed (Biscoe, Dream, or Torgersen).
bill_length_mm: The length of the penguin's bill in millimeters.
bill_depth_mm: The depth of the penguin's bill in millimeters.
flipper_length_mm: The length of the penguin's flipper in millimeters.
body_mass_g: The body mass of the penguin in grams.
sex: The sex of the penguin (male or female; missing values appear as NaN).
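As an illustration, the dataset can be loaded from the seaborn copy listed under Research Materials, with each variable given a pandas dtype that matches its statistical data type (nominal vs. continuous). This is a minimal sketch, not the repository's notebook code: `load_penguins` and the two column groupings are hypothetical names chosen for the example.

```python
import pandas as pd

# Raw CSV listed under Research Materials (seaborn's copy of the dataset).
PENGUINS_URL = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv"

# Hypothetical grouping of the seven columns by statistical data type.
CATEGORICAL_COLUMNS = ["species", "island", "sex"]  # nominal variables
NUMERIC_COLUMNS = [  # continuous measurements
    "bill_length_mm",
    "bill_depth_mm",
    "flipper_length_mm",
    "body_mass_g",
]


def load_penguins(source=PENGUINS_URL):
    """Read the penguins CSV from a URL, local path or file-like object."""
    df = pd.read_csv(source)  # empty cells and "NA" are parsed as NaN by default
    for col in CATEGORICAL_COLUMNS:
        # Categoricals store each distinct label once; NaN stays missing.
        df[col] = df[col].astype("category")
    # Floats can represent NaN, unlike NumPy integer dtypes.
    df[NUMERIC_COLUMNS] = df[NUMERIC_COLUMNS].astype(float)
    return df
```

Calling `load_penguins()` with no argument fetches the URL above (network access is needed); `df.dtypes` then shows the assigned types and `df.isna().sum()` reports the number of missing values per column.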

Analysis

The analysis includes the following steps:

Data Cleaning: Checking for and handling missing values, and ensuring data consistency.
Exploratory Data Analysis: Visualizing the distribution of individual features and exploring relationships between pairs of variables.
Summary Statistics: Calculating basic statistical summaries to gain insight into the dataset.
Correlation Analysis: Calculating and illustrating the Pearson correlation coefficient (r) between two dataset variables.
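The cleaning, summary and correlation steps above can be sketched with pandas and numpy as follows. This is a minimal sketch under stated assumptions, not the repository's notebook: the helper names (`clean`, `summary`, `correlation_matrix`, `pearson_r`, `fit_line`, `plot_with_fit`) are hypothetical, and dropping every incomplete row is an assumed cleaning policy. `pearson_r` spells out the Pearson formula so it can be checked against `DataFrame.corr`, and `fit_line` uses `numpy.polyfit` for the straight-line fit drawn on the scatter plot.

```python
import numpy as np
import pandas as pd

NUMERIC_COLUMNS = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]


def clean(df):
    """Data cleaning: drop every row with a missing value (an assumed policy)."""
    return df.dropna().reset_index(drop=True)


def summary(df):
    """Summary statistics: count, mean, std, min, quartiles and max per measurement."""
    return df[NUMERIC_COLUMNS].describe()


def correlation_matrix(df):
    """Pairwise Pearson r for all measurements."""
    return df[NUMERIC_COLUMNS].corr(method="pearson")


def pearson_r(x, y):
    """Pearson's r computed directly from its definition:

    r = sum((x - mean_x) * (y - mean_y))
        / sqrt(sum((x - mean_x)**2) * sum((y - mean_y)**2))
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2)))


def fit_line(x, y):
    """Least-squares straight line y = m*x + c via numpy.polyfit with degree 1."""
    m, c = np.polyfit(np.asarray(x, dtype=float), np.asarray(y, dtype=float), 1)
    return float(m), float(c)


def plot_with_fit(df, x_col, y_col):
    """Scatter plot of two measurements with the fitted line and r in the title."""
    import matplotlib.pyplot as plt  # imported here so the other helpers run without it

    x, y = df[x_col], df[y_col]
    m, c = fit_line(x, y)
    xs = np.linspace(x.min(), x.max(), 100)
    plt.scatter(x, y, alpha=0.6)
    plt.plot(xs, m * xs + c, color="red")
    plt.xlabel(x_col)
    plt.ylabel(y_col)
    plt.title(f"r = {pearson_r(x, y):.2f}")
    plt.show()
```

On cleaned data, `pearson_r(df["flipper_length_mm"], df["body_mass_g"])` should agree with `df["flipper_length_mm"].corr(df["body_mass_g"])` up to floating-point rounding; the explicit version is only there to make the formula visible.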

Dependencies

Python
Jupyter Notebook
pandas
matplotlib
numpy

Research Materials

https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv

https://www.kaggle.com/datasets/parulpandey/palmer-archipelago-antarctica-penguin-data/data

https://stackoverflow.com/questions/33506372/using-int-with-decimal-numbers

https://www.wikiwand.com/en/Statistical_data_type

https://www.w3schools.com/python/pandas/pandas_plotting.asp

https://www.w3schools.com/python/matplotlib_pyplot.asp

https://www.wikiwand.com/en/Pearson_correlation_coefficient

https://www.geeksforgeeks.org/python-pandas-dataframe-corr/

https://www.statisticssolutions.com/free-resources/directory-of-statistical-analyses/pearsons-correlation-coefficient/

https://python-graph-gallery.com/scatterplot-with-regression-fit-in-matplotlib/

https://data36.com/linear-regression-in-python-numpy-polyfit/
