Mz-scripter/Data-Analytics

Data Science Learning Experience with Pandas

Day 1

I gained experience in handling missing data (NaN values), importing data from CSV files, removing unnecessary data, selecting specific columns, identifying maximum values within datasets, and locating corresponding row indices.
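
The Day 1 dataset is not included in this README, so the following is only a minimal sketch of those steps on a small made-up CSV. The column names (`major`, `group`, `starting_salary`) and all values are hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for a real file on disk.
csv_text = """major,group,starting_salary
Accounting,Business,46000
Physics,STEM,50300
Drama,HASS,35900
Unknown,,
"""

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Count missing values per column, then drop every row that contains a NaN.
missing_per_column = df.isna().sum()
clean_df = df.dropna()

# Select specific columns with a list of column names.
salaries = clean_df[["major", "starting_salary"]]

# Find the maximum value and the index label of the row that holds it.
top_salary = salaries["starting_salary"].max()
top_row = salaries["starting_salary"].idxmax()
top_major = salaries.loc[top_row, "major"]
```

`idxmax()` returns an index label rather than a position, which is why the lookup uses `.loc` and not `.iloc`.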

Day 2

I learnt to use pandas and Matplotlib to turn raw data into informative visualizations, and used them to chart the most widely used programming languages from 2008 to 2024.
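
A rough sketch of that kind of chart is shown below. The post counts are invented for illustration, and the column names (`date`, `language`, `posts`) are assumptions, not the real dataset.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch runs without a display

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical post counts per language; not real figures.
df = pd.DataFrame({
    "date": pd.to_datetime(["2008-01-01", "2008-01-01", "2024-01-01", "2024-01-01"]),
    "language": ["python", "java", "python", "java"],
    "posts": [120, 300, 900, 450],
})

# Reshape so that each language becomes its own column, indexed by date.
wide = df.pivot(index="date", columns="language", values="posts")

# Draw one line per language on a shared axis.
fig, ax = plt.subplots(figsize=(8, 5))
for language in wide.columns:
    ax.plot(wide.index, wide[language], label=language)
ax.set_xlabel("Date")
ax.set_ylabel("Number of posts")
ax.legend()
```

In a notebook the figure displays automatically; in a script, `fig.savefig(...)` or `plt.show()` would be needed.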

Day 3

I learnt how to:

  • use HTML and Markdown in notebooks
  • combine the groupby() and count() functions to aggregate data
  • use the value_counts() function
  • slice DataFrames using the square bracket notation
  • use the agg() function to run an operation on a particular column
  • rename() columns of DataFrames
  • create a line chart with two separate axes to visualize data that have different scales
  • create a scatter plot and a bar chart in Matplotlib
  • work with tables in a relational database by using primary and foreign keys
  • merge() DataFrames along a particular column.
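
The aggregation and merge steps above can be sketched as follows. The two tables and their columns are hypothetical; `sets.theme_id` plays the role of a foreign key pointing at the primary key `themes.id`.

```python
import pandas as pd

# Hypothetical tables linked by a key: sets.theme_id refers to themes.id.
themes = pd.DataFrame({"id": [1, 2], "name": ["Space", "City"]})
sets = pd.DataFrame({
    "set_num": ["001", "002", "003", "004"],
    "year": [1999, 1999, 2000, 2000],
    "theme_id": [1, 2, 2, 2],
})

# groupby() + count(): number of sets released per year.
sets_per_year = sets.groupby("year")["set_num"].count()

# value_counts(): number of sets per theme, most common first.
sets_per_theme = sets["theme_id"].value_counts()

# agg(): run an operation (here, counting unique values) on one column per group.
themes_per_year = sets.groupby("year").agg({"theme_id": "nunique"})

# rename(): give the aggregated column a clearer name.
themes_per_year = themes_per_year.rename(columns={"theme_id": "nr_themes"})

# Square-bracket slicing: keep only the first two rows.
first_two = sets[:2]

# merge(): join the two tables along the key column.
merged = sets.merge(themes, left_on="theme_id", right_on="id")
```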

Day 4

I learnt how to:

  • use .describe() to get a snapshot of your data, such as the average, highest and lowest values
  • use .resample() to make one time series comparable to another by changing its periodicity
  • work with matplotlib.dates Locators to better style a timeline (e.g., an axis on a chart).
  • find the number of NaN values with .isna().values.sum()
  • change the resolution of a chart using the figure's dpi
  • create dashed '--' and dash-dot '-.' lines using linestyles (a dotted line is ':')
  • use different kinds of markers (e.g., 'o' or '^') on charts.
  • fine-tune the styling of Matplotlib charts by using limits, labels, linewidth and colours
  • use .grid() to help visually identify seasonality in a time series.
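
As an illustration of the pandas items in this list, here is a small sketch on an invented daily series. The `price` column and its values are hypothetical, and the month-start alias `"MS"` is just one possible choice of target periodicity.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series for the first quarter of 2020, with one missing value.
days = pd.date_range("2020-01-01", "2020-03-31", freq="D")
daily = pd.DataFrame({"price": np.arange(len(days), dtype=float)}, index=days)
daily.iloc[10, 0] = np.nan

# describe(): count, mean, std, min, quartiles and max in one call.
summary = daily.describe()

# Total number of NaN values across the whole DataFrame.
nan_count = daily.isna().values.sum()

# resample(): convert daily data to monthly data by averaging within each month.
monthly = daily.resample("MS").mean()
```

After resampling, `monthly` has one row per month and can be lined up against another monthly series.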

Day 5

I learnt how to:

  • pull a random sample from a DataFrame using .sample()
  • find duplicated entries with .duplicated() and remove them with .drop_duplicates()
  • convert string and object data types into numbers with pd.to_numeric()
  • use plotly to generate pie, donut and bar charts as well as box and scatter plots
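
The data-cleaning steps above might look like the sketch below. The app names and prices are made up, and stripping the `$` sign before conversion is an assumption about how the raw strings were formatted.

```python
import pandas as pd

# Hypothetical app data with one repeated row and prices stored as strings.
apps = pd.DataFrame({
    "app": ["Notes", "Maps", "Notes", "Chess"],
    "price": ["$0.99", "0", "$0.99", "$2.49"],
})

# sample(): pull random rows (random_state makes the draw repeatable).
random_rows = apps.sample(2, random_state=0)

# duplicated(): boolean mask marking rows that repeat an earlier row.
dup_mask = apps.duplicated()

# drop_duplicates(): keep only the first occurrence of each row.
unique_apps = apps.drop_duplicates().copy()

# to_numeric(): convert the cleaned strings into floats.
cleaned = unique_apps["price"].str.replace("$", "", regex=False)
unique_apps["price"] = pd.to_numeric(cleaned)
```

Note that `to_numeric` is a top-level pandas function called as `pd.to_numeric(series)`, not a DataFrame method.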

Day 6

I learnt how to:

  • create arrays with np.array()
  • generate arrays using np.arange(), np.random.random() and np.linspace()
  • analyse the shape and dimensions of an ndarray
  • slice and subset an ndarray based on its indices
  • perform linear algebra operations such as scalar arithmetic and matrix multiplication
  • use NumPy's broadcasting to make ndarray shapes compatible
  • manipulate images in the form of ndarrays
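
A compact sketch of these NumPy operations follows. All values are illustrative, and the 2x2 "image" is a stand-in array, not a real picture.

```python
import numpy as np

# Create arrays directly or generate them.
a = np.array([[1, 2, 3], [4, 5, 6]])
evens = np.arange(0, 10, 2)          # start, stop (exclusive), step
points = np.linspace(0, 1, num=5)    # 5 evenly spaced values, endpoints included
noise = np.random.random((2, 3))     # uniform floats in [0, 1)

# Shape and number of dimensions.
shape, ndim = a.shape, a.ndim

# Slice by index: every row, last column.
last_column = a[:, -1]

# Scalar arithmetic applies element-wise; @ is matrix multiplication.
doubled = a * 2
product = a @ a.T

# Broadcasting: the (3,) array is stretched across both rows of the (2, 3) array.
shifted = a + np.array([10, 20, 30])

# A made-up RGB "image" (height 2, width 2, 3 colour channels), then inverted.
image = np.zeros((2, 2, 3), dtype=np.uint8)
inverted = 255 - image
```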

Day 7

I learnt how to:

  • use nested loops to remove unwanted characters from multiple columns
  • create bubble charts using the Seaborn library
  • filter a pandas DataFrame based on multiple conditions using both .loc[] and .query()
  • style Seaborn charts using the pre-built styles and by modifying Matplotlib parameters
  • use floor division to convert years to decades
  • use Seaborn to superimpose a linear regression over our data
  • run regressions with scikit-learn and calculate the coefficients
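
The cleaning and filtering items in this list can be sketched as below. The film table is invented, and the choice of `$` and `,` as the unwanted characters is an assumption.

```python
import pandas as pd

# Hypothetical film data with money columns stored as formatted strings.
films = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "year": [1987, 1994, 2003, 2019],
    "budget": ["$1,000", "$50,000", "$7,500", "$120,000"],
    "revenue": ["$500", "$90,000", "$7,000", "$300,000"],
})

# Nested loops: for each column, strip each unwanted character, then convert.
for column in ["budget", "revenue"]:
    for char in ["$", ","]:
        films[column] = films[column].str.replace(char, "", regex=False)
    films[column] = pd.to_numeric(films[column])

# Floor division drops the last digit of the year; multiplying back gives the decade.
films["decade"] = (films["year"] // 10) * 10

# The same two-condition filter written with .loc[] and with .query().
profitable_loc = films.loc[(films["revenue"] > films["budget"]) & (films["year"] >= 1990)]
profitable_query = films.query("revenue > budget and year >= 1990")
```

With `.loc[]`, each condition needs its own parentheses and the conditions are combined with `&`; `.query()` takes the same logic as a string using `and`.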

Day 8

I learnt how to:

  • create a choropleth to display data on a map
  • create bar charts showing different segments of the data with plotly
  • create Sunburst charts with plotly.
  • use Seaborn's .lmplot() and show best-fit lines across multiple categories using the row, hue, and lowess parameters

Day 9

I learnt how to:

  • use histograms to visualise distributions
  • superimpose histograms on top of each other even when the data series have different lengths
  • smooth out kinks in a histogram and visualise a distribution with a Kernel Density Estimate (KDE)
  • improve a KDE by specifying boundaries on the estimates
  • use scipy and test for statistical significance by looking at p-values
  • highlight different parts of a time series chart in Matplotlib
  • add and configure a legend in Matplotlib
  • use NumPy's .where() function to process elements depending on a condition
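
Two of these items, the significance test and `np.where()`, are sketched below on invented numbers. The independent two-sample t-test (`scipy.stats.ttest_ind`) and the 0.01 threshold are assumptions; the README does not say which test or cut-off was used.

```python
import numpy as np
from scipy import stats

# Hypothetical rates measured before and after some change (different lengths are fine).
before = np.array([10.2, 11.5, 9.8, 12.1, 10.9, 11.3])
after = np.array([2.1, 3.4, 1.9, 2.8, 3.0, 2.5, 2.2])

# Independent two-sample t-test; a small p-value suggests the means really differ.
t_stat, p_value = stats.ttest_ind(before, after)
significant = p_value < 0.01

# np.where(): choose one of two values per element, depending on a condition.
labels = np.where(before > 11, "high", "low")
```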

Day 10

I learnt how to:

  • quickly spot relationships in a dataset using Seaborn's .pairplot()
  • split the data into a training and testing dataset to better evaluate a model's performance
  • run a multivariable regression
  • evaluate that regression based on the sign of its coefficients
  • analyse and look for patterns in a model's residuals
  • improve a regression model using a data transformation (a log transformation)
  • specify your own values for various features and use your model to make a prediction
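
The regression workflow above can be sketched end to end on synthetic data. Everything here is an assumption made for illustration: the feature names (`rooms`, `distance`), the generated prices, and the 80/20 split.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic housing data: price rises with rooms and falls with distance.
rng = np.random.default_rng(0)
n = 200
features = pd.DataFrame({
    "rooms": rng.uniform(3, 9, n),
    "distance": rng.uniform(1, 12, n),
})
noise = rng.normal(0, 0.05, n)
price = np.exp(2.0 + 0.3 * features["rooms"] - 0.05 * features["distance"] + noise)

# Log-transform the target, then hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    features, np.log(price), test_size=0.2, random_state=10
)

# Multivariable regression: one coefficient per feature.
model = LinearRegression().fit(X_train, y_train)
coefficients = pd.Series(model.coef_, index=features.columns)

# Residuals are actual minus predicted values on the training data.
residuals = y_train - model.predict(X_train)

# R-squared on data the model has never seen.
test_r2 = model.score(X_test, y_test)

# Predict for hand-picked feature values and undo the log transformation.
new_home = pd.DataFrame({"rooms": [6.0], "distance": [4.0]})
predicted_price = np.exp(model.predict(new_home))[0]
```

A quick sanity check is whether the coefficient signs match expectations: here `rooms` should be positive and `distance` negative, because the data was generated that way.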

About

My experience learning data science
