Jonnius00/ExploringPandas

📊 Pandas Dataset Explorer

This project is designed to help you systematically explore and analyze datasets using Python and pandas, following the structure of the Real Python tutorial “Using pandas and Python to Explore Your Dataset.” It provides scripts, notebooks, and examples that demonstrate:

  • 📥 Environment setup

    • Install Python 3, pandas, matplotlib (and optionally Jupyter/Anaconda).
    • Sample install commands using pip or conda.
  • 🧰 Data ingestion

    • Use pd.read_csv() (or read_json(), read_html(), etc.) to load data.
    • Example uses real-world data: e.g. NBA results CSV.
  • 🔍 Initial data inspection

    • Use .head(), .tail(), .info(), .shape, and len() to get a quick overview.
    • Adjust display settings (display.max_columns, display.precision) for better visibility.
  • 🗂️ Data structure understanding

    • Explore Series and DataFrame basics.
    • Compare indexing methods: bracket [], .loc, .iloc.
  • Querying & filtering

    • Filter rows via conditions (e.g., df[df["col"] > X]).
    • Use .loc, .iloc to select specific rows/columns.
  • 📊 Grouping & aggregating

    • Summarize data using .groupby(), .sum(), .mean(), .count().
    • Combine datasets (concat, merge) when working with multiple sources.
  • 🧼 Cleaning & casting

    • Detect and handle missing/inconsistent/invalid values.
    • Convert types (df["col"] = df["col"].astype(...)) as needed.
  • 📈 Visualization

    • Use pandas' built-in .plot() (histograms, scatter, bar, etc.) to visualize distributions, trends, and categories.
    • Leverage matplotlib integration within Jupyter or standalone scripts.
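
The ingestion, inspection, indexing, and filtering steps listed above can be sketched on a tiny in-memory table. This is an illustrative sketch only: the column names (`game_id`, `team`, `pts`, `opp_pts`) and every value are made up for the example and are not taken from the repository's datasets.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file on disk; all values are invented.
csv_text = """game_id,team,pts,opp_pts
1,Lakers,102,99
2,Celtics,95,101
3,Lakers,110,104
4,Knicks,88,97
"""

# Ingestion: read_csv accepts a file path, a URL, or a file-like object.
games = pd.read_csv(io.StringIO(csv_text))

# Display settings: show every column and round floats to two decimals.
pd.set_option("display.max_columns", None)
pd.set_option("display.precision", 2)

# Initial inspection.
print(games.shape)    # (number of rows, number of columns)
print(len(games))     # number of rows
print(games.head(2))  # first two rows
print(games.tail(2))  # last two rows
games.info()          # column dtypes and non-null counts

# Indexing: brackets select a column, .iloc is positional, .loc is label-based.
points = games["pts"]                 # a Series
first_row = games.iloc[0]             # row at position 0
indexed = games.set_index("game_id")  # use game_id as the row labels
game_three = indexed.loc[3]           # row whose label is 3

# Filtering: a boolean mask keeps only the rows where the condition holds.
high_scoring = games[games["pts"] > 100]

# .loc can filter rows and pick columns in a single step.
lakers_wins = games.loc[
    (games["team"] == "Lakers") & (games["pts"] > games["opp_pts"]),
    ["game_id", "pts"],
]
print(lakers_wins)
```

To try the same calls on a real file, replace `io.StringIO(csv_text)` with the path to a CSV in `data/`.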

📁 What's Included

  • environment_setup/ – shell scripts and instructions to configure your Python environment.
  • data/ – sample datasets (e.g. NBA ELO, FiveThirtyEight, etc.).
  • notebooks/ – Jupyter notebooks illustrating each key step:
    1. Overview & loading
    2. Inspection & display settings
    3. Indexing & selection
    4. Querying & filtering
    5. GroupBy and aggregation
    6. Cleaning & typing
    7. Merging datasets
    8. Visualizing data
  • scripts/ – Python files that reproduce key tasks outside Jupyter.
  • requirements.txt – minimal dependencies (pandas, matplotlib, Jupyter optional).
  • README.md – (this file).

📝 How to Use

  1. Clone the repo
    git clone <repo-url>
    cd <repo-folder>
  • The virtual environment (pandas_env/) is included for convenience.
  • Outputs
  2. Set up your environment

    pip install -r requirements.txt
    # or
    conda install pandas matplotlib jupyter
  3. Run a notebook

    jupyter notebook
  4. Explore!

  Follow the notebooks step by step. They cover:

  • Inspecting your data with .info(), .head(), .shape, .describe()
  • Subsetting using .loc, .iloc, filtering expressions
  • Grouping and summarizing by category
  • Cleaning missing and inconsistent entries
  • Visualizing distributions and relationships with .plot()
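
As a rough sketch of the cleaning and grouping steps, the snippet below counts missing values, drops incomplete rows, casts a text column to integers, and summarizes by category. The DataFrame, its column names, and its values are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical data: one missing score, and scores stored as text.
games = pd.DataFrame(
    {
        "team": ["Lakers", "Celtics", "Lakers", "Knicks", "Celtics"],
        "pts": ["102", "95", "110", None, "99"],
        "notes": [None, "overtime", None, None, None],
    }
)

# Detect missing values: count the nulls in each column.
missing_counts = games.isna().sum()
print(missing_counts)

# Handle them: drop rows that have no score (one possible strategy).
cleaned = games.dropna(subset=["pts"]).copy()

# Cast: convert the text scores to integers.
cleaned["pts"] = cleaned["pts"].astype("int64")

# Group and aggregate: games played, total points, and mean points per team.
summary = cleaned.groupby("team")["pts"].agg(["count", "sum", "mean"])
print(summary)
```

With matplotlib installed, `summary["mean"].plot(kind="bar")` would chart the per-team averages. Dropping rows is only one way to treat missing data; `fillna()` is an alternative when a sensible default exists.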

🎯 Learning Outcomes

  By the end of this project, you’ll be able to:

  • Load data from multiple formats into pandas
  • Understand core data structures: Series & DataFrame
  • Access and filter data efficiently
  • Aggregate and group information to extract insights
  • Clean data and prepare it for analysis
  • Create visualizations that highlight key patterns
  • Combine multiple datasets for comprehensive analysis
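
Combining datasets, the last outcome above, might look like the following minimal sketch. The three small tables and all of their values are invented for the example; `pd.concat` stacks rows from tables with the same columns, and `merge` joins on a shared key.

```python
import pandas as pd

# Hypothetical tables; the win counts are invented for illustration.
season_a = pd.DataFrame({"team": ["Lakers", "Celtics"], "wins": [52, 48]})
season_b = pd.DataFrame({"team": ["Knicks", "Lakers"], "wins": [41, 47]})
cities = pd.DataFrame(
    {"team": ["Lakers", "Celtics"], "city": ["Los Angeles", "Boston"]}
)

# concat: stack the two tables vertically and renumber the rows.
stacked = pd.concat([season_a, season_b], ignore_index=True)

# merge: a left join keeps every row of `stacked`; teams without a match
# in `cities` get a missing value in the `city` column.
merged = stacked.merge(cities, on="team", how="left")
print(merged)
```

A left join is used here so that no rows are silently lost; `how="inner"` would instead drop teams that are missing from `cities`.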

📚 References

Reka Horvath (2020, January 6). Using pandas and Python to Explore Your Dataset. Real Python. https://realpython.com/pandas-python-explore-dataset/

About

My implementation of the Real Python tutorial on working with datasets using pandas.
