I gained experience in handling missing data (NaN values), importing data from CSV files, removing unnecessary data, selecting specific columns, identifying maximum values within datasets, and locating corresponding row indices.
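A minimal sketch of those operations, using an invented DataFrame in place of a real CSV (the column names `language` and `posts` are assumptions for illustration):

```python
import pandas as pd

# Stand-in for pd.read_csv("file.csv"); the columns here are invented.
df = pd.DataFrame({
    "language": ["Python", "Java", "C", None],
    "posts": [20_000, 15_000, None, 5_000],
})

df = df.dropna()                        # remove rows containing NaN values
top = df["posts"].max()                 # maximum value in a column
top_idx = df["posts"].idxmax()          # index of the row holding that maximum
top_language = df.loc[top_idx, "language"]
```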
I acquired proficiency in using Pandas and Matplotlib to transform raw data into informative visualizations, producing a chart of the most widely used programming languages from 2008 to 2024.
I learnt how to:
- use HTML markup inside Markdown cells in Notebooks
- combine the groupby() and count() functions to aggregate data
- use the value_counts() function
- slice DataFrames using the square bracket notation
- use the agg() function to run an operation on a particular column
- rename() columns of DataFrames
- create a line chart with two separate axes to visualize data that have different scales
- create a scatter plot and a bar chart in Matplotlib
- work with tables in a relational database by using primary and foreign keys
- merge() DataFrames along a particular column.
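A compact sketch of those Pandas techniques, on invented Stack Overflow-style data (the `TAG`/`POSTS` names and the numbers are made up):

```python
import pandas as pd

# Invented tag counts, for illustration only.
posts = pd.DataFrame({
    "TAG": ["python", "python", "java"],
    "POSTS": [10, 20, 15],
})

tag_counts = posts["TAG"].value_counts()        # occurrences per tag
first_two = posts[0:2]                          # square-bracket slicing

# Aggregate one column per group with agg(), then rename it
per_tag = posts.groupby("TAG").agg({"POSTS": "sum"})
per_tag = per_tag.rename(columns={"POSTS": "total_posts"})

# Merge along a shared column, much like a foreign-key join
meta = pd.DataFrame({"TAG": ["python", "java"], "year": [1991, 1995]})
merged = pd.merge(per_tag.reset_index(), meta, on="TAG")
```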
I learnt how to:
- use .describe() to get a snapshot of the data, such as the average, highest and lowest values
- use .resample() to make one time series comparable to another by changing its periodicity
- work with matplotlib.dates Locators to better style a timeline (e.g., an axis on a chart).
- find the number of NaN values with .isna().values.sum()
- change the resolution of a chart using the figure's dpi
- create dashed '--', dotted ':' and dash-dot '-.' lines using linestyles
- use different kinds of markers (e.g., 'o' or '^') on charts.
- fine-tune the styling of Matplotlib charts by using limits, labels, linewidth and colours
- use .grid() to help visually identify seasonality in a time series.
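The time-series and styling points above can be sketched as follows, on a synthetic daily series (the dates and values are invented):

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend for this sketch
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic daily series; the values are invented.
idx = pd.date_range("2020-01-01", periods=90, freq="D")
series = pd.Series(range(90), index=idx)

monthly = series.resample("M").mean()       # change the periodicity

fig, ax = plt.subplots(dpi=120)             # dpi sets the chart resolution
ax.plot(monthly.index, monthly.values, linestyle="--", marker="o",
        color="crimson", linewidth=2)
ax.xaxis.set_major_locator(mdates.MonthLocator())  # one tick per month
ax.set_xlim(monthly.index.min(), monthly.index.max())
ax.grid(True)                               # a grid helps spot seasonality
```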
I learnt how to:
- pull a random sample from a DataFrame using .sample()
- find duplicated entries with .duplicated() and .drop_duplicates()
- convert string and object data types into numbers with .to_numeric()
- use plotly to generate pie, donut and bar charts as well as box and scatter plots
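A minimal sketch of the cleaning steps, on invented data with a duplicate row and numbers stored as strings (the `show`/`votes` names are assumptions):

```python
import pandas as pd

# Invented data with a duplicate row and numbers stored as strings.
df = pd.DataFrame({
    "show": ["A", "B", "B", "C"],
    "votes": ["10", "20", "20", "7"],
})

sample = df.sample(n=2, random_state=1)         # pull random rows
n_dupes = df.duplicated().sum()                 # count duplicated entries
clean = df.drop_duplicates().copy()
clean["votes"] = pd.to_numeric(clean["votes"])  # strings -> numbers
# For the charts, plotly.express offers px.pie(), px.bar(), px.box(), ...
```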
I learnt how to:
- create arrays with np.array()
- generate arrays using np.arange(), np.random.random() and np.linspace()
- analyse the shape and dimensions of an ndarray
- slice and subset an ndarray based on its indices
- perform linear-algebra operations such as scalar arithmetic and matrix multiplication
- use NumPy's broadcasting to make ndarrays of different shapes compatible
- manipulate images in the form of ndarrays
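A short sketch of those NumPy operations (all the numbers are invented):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])            # 2x2 ndarray
evenly = np.linspace(0, 1, 5)             # 5 evenly spaced values
seq = np.arange(6).reshape(2, 3)          # shape: (2, 3), ndim: 2

product = a @ a                           # matrix multiplication
scaled = a * 10                           # operation with a scalar
shifted = seq + np.array([1, 2, 3])       # broadcasting a row across rows

# An image is just an ndarray, e.g. (height, width, 3) for RGB pixels;
# halving the values darkens it (the pixel values here are invented).
img = np.full((2, 2, 3), 200)
darker = img // 2
```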
I learnt how to:
- use nested loops to remove unwanted characters from multiple columns
- create bubble charts using the Seaborn library
- filter a Pandas DataFrame based on multiple conditions using both .loc[] and .query()
- style Seaborn charts using the pre-built styles and by modifying Matplotlib parameters
- use floor division to convert years to decades
- use Seaborn to superimpose a linear regression over our data
- run regressions with scikit-learn and calculate the coefficients
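A sketch of the filtering, decade and regression steps on invented film data (every name and number is illustrative; np.polyfit stands in for scikit-learn's LinearRegression to keep the example dependency-light):

```python
import numpy as np
import pandas as pd

# Invented film data, for illustration only.
films = pd.DataFrame({
    "year": [1999, 2004, 2012, 2019],
    "budget": [10, 40, 60, 90],
    "revenue": [30, 80, 130, 200],
})

films["decade"] = films["year"] // 10 * 10   # floor division -> decade

# Multi-condition filtering, two equivalent ways
recent_big = films.loc[(films["year"] >= 2010) & (films["budget"] > 50)]
same_rows = films.query("year >= 2010 and budget > 50")

# Regression slope and intercept; the course used scikit-learn's
# LinearRegression, np.polyfit stands in here.
slope, intercept = np.polyfit(films["budget"], films["revenue"], 1)
```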
I learnt how to:
- create a Choropleth to display data on a map
- create bar charts showing different segments of the data with plotly
- create Sunburst charts with plotly.
- use Seaborn's .lmplot() and show best-fit lines across multiple categories using the row, hue, and lowess parameters
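The .lmplot() point can be sketched on invented two-group data (the column names are assumptions; the choropleth and Sunburst charts would use plotly.express, e.g. px.choropleth() and px.sunburst()):

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend for this sketch
import pandas as pd
import seaborn as sns

# Invented two-group data; lmplot fits one line per hue category.
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 1, 2, 3, 4],
    "y": [2, 4, 6, 8, 1, 2, 3, 4],
    "group": ["a"] * 4 + ["b"] * 4,
})

# hue gives one best-fit line per group; row would split the plot into
# subplot rows, and lowess=True would fit a locally weighted curve.
grid = sns.lmplot(data=df, x="x", y="y", hue="group")
```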
I learnt how to:
- use histograms to visualise distributions
- superimpose histograms on top of each other even when the data series have different lengths
- smooth out the kinks in a histogram and visualise a distribution with a Kernel Density Estimate (KDE)
- improve a KDE by specifying boundaries on the estimates
- use scipy and test for statistical significance by looking at p-values
- highlight different parts of a time series chart in Matplotlib
- add and configure a Legend in Matplotlib
- use NumPy's .where() function to process elements depending on a condition
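A minimal sketch of the KDE, p-value and .where() points, on two synthetic samples with deliberately shifted means (all values are invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
a = rng.normal(0.0, 1.0, 500)   # two synthetic samples, purely
b = rng.normal(0.5, 1.0, 500)   # illustrative, with shifted means

# Test for statistical significance by looking at the p-value
t_stat, p_value = stats.ttest_ind(a, b)

# A KDE smooths a histogram's kinks into a continuous density curve
kde = stats.gaussian_kde(a)
density_at_zero = kde(0.0)[0]

# np.where() processes elements depending on a condition
labels = np.where(a > 0, "positive", "non-positive")
```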
I learnt how to:
- quickly spot relationships in a dataset using Seaborn's .pairplot()
- split the data into a training and testing dataset to better evaluate a model's performance
- run a multivariable regression
- evaluate that regression based on the signs of its coefficients
- analyse and look for patterns in a model's residuals
- improve a regression model using a (log) data transformation
- specify your own values for various features and use your model to make a prediction
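A sketch of that whole workflow on synthetic data (the features and target are invented; np.linalg.lstsq stands in for scikit-learn's LinearRegression, and the split is done by index rather than sklearn's train_test_split):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic features and a log-linear target, invented for illustration.
X = rng.uniform(1, 10, size=(100, 2))
y = np.log(X[:, 0]) * 3 + X[:, 1] * 0.5 + rng.normal(0, 0.1, 100)

# Train/test split so the model is evaluated on unseen data
train, test = np.arange(80), np.arange(80, 100)

# Multivariable regression on a log-transformed feature; np.linalg.lstsq
# stands in for scikit-learn's LinearRegression here.
A = np.column_stack([np.log(X[train, 0]), X[train, 1], np.ones(80)])
coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)

# Residuals on the held-out data should look like random noise
A_test = np.column_stack([np.log(X[test, 0]), X[test, 1], np.ones(20)])
residuals = y[test] - A_test @ coef

# Specify your own feature values and use the model to predict
prediction = np.array([np.log(5.0), 2.0, 1.0]) @ coef
```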