- About the project
- About the dataset
- Project's Main Strategy
- Libraries and Packages
- Project's Phases
- Insights
This project is a Guided Project from DataCamp (the data is already prepared and ready to use). It analyzes data from an online sports retail company. Its main goal is to answer questions about the store's products and their prices, brands, revenue, ratings, and number of reviews.
This project's dataset is divided into 5 CSV files:
- Info contains product id with its name and description.
- Finance contains product id with its listing price, sale price, discount, and revenue.
- Reviews contains product id with its rating and number of reviews.
- Traffic contains product id and the date it was last visited on the website.
- Brands contains product id and its brand.
All files have the product id as a primary key.
First, I created a MySQL database to store the data. Then I established a connection between the MySQL server and Python. After that, I executed all queries from Python and retrieved the required data. Finally, I performed the analysis and visualization using Python's libraries and packages.
- Pandas to deal with dataframes and perform data analysis
- Numpy to get statistical information
- Matplotlib and Seaborn for visualization
- mysql.connector to deal with the database
- load_dotenv and os to deal with environment variables
- HTML and warnings to customize the output plots
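The connection setup described above can be sketched as follows. This is a minimal sketch, not the project's exact code: the environment variable names (`DB_HOST`, `DB_USER`, `DB_PASSWORD`) are assumptions about what the `.env` file contains.

```python
import os

def connection_params(env):
    """Build the keyword arguments for mysql.connector.connect().

    The variable names DB_HOST, DB_USER and DB_PASSWORD are assumptions;
    the project loads its credentials from a .env file via load_dotenv().
    """
    return {
        "host": env.get("DB_HOST", "localhost"),
        "user": env["DB_USER"],
        "password": env["DB_PASSWORD"],
        "database": "sports_store",
    }

# Typical usage (requires a running MySQL server):
# from dotenv import load_dotenv
# import mysql.connector
# load_dotenv()
# conn = mysql.connector.connect(**connection_params(os.environ))
# cursor = conn.cursor()
```

Keeping the credentials in a `.env` file (read with `load_dotenv` and `os`) avoids hard-coding them in the notebook.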
-
Creating a database with the help of MySQL Server
CREATE DATABASE sports_store;
-
Creating the tables (a table for each csv file)
CREATE TABLE info(
    product_name VARCHAR(100),
    product_id   VARCHAR(15) PRIMARY KEY,
    description  VARCHAR(500)
);

CREATE TABLE finance(
    product_id    VARCHAR(15) PRIMARY KEY,
    listing_price DECIMAL(6, 2),
    sale_price    DECIMAL(6, 2),
    discount      DECIMAL(2, 2),
    revenue       DECIMAL(8, 2)
);

CREATE TABLE reviews(
    product_id VARCHAR(15) PRIMARY KEY,
    rating     DECIMAL(2, 1),
    reviews    INT
);

CREATE TABLE traffic(
    product_id   VARCHAR(15),
    last_visited DATETIME
);

CREATE TABLE brands(
    product_id VARCHAR(15),
    brand      VARCHAR(15)
);
-
Importing data into these tables
The same statement template was used for all files:
LOAD DATA LOCAL INFILE 'path\file_name.csv' INTO TABLE table_name FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 1 ROWS;
Example:
LOAD DATA LOCAL INFILE 'path\info.csv' INTO TABLE info FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 1 ROWS;
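Rather than retyping the statement five times, the template can also be filled in from Python. A sketch, assuming a connected `cursor` with `allow_local_infile` enabled; `path` stays a placeholder for the real CSV directory, as above:

```python
def load_statement(table, data_dir="path"):
    """Fill in the LOAD DATA template shown above for one table.

    `data_dir` is a placeholder for the actual CSV directory.
    """
    return (
        "LOAD DATA LOCAL INFILE '{dir}/{table}.csv' "
        "INTO TABLE {table} "
        "FIELDS TERMINATED BY ',' ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\n' IGNORE 1 ROWS;"
    ).format(dir=data_dir, table=table)

# for table in ["info", "finance", "reviews", "traffic", "brands"]:
#     cursor.execute(load_statement(table))
```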
-
Final database structure:
-
This step was done using 2 functions in Python:
-
show_columns:
def show_columns(table):
    '''
    This function simply shows the columns of the specified table.
    :param table: the name of the table whose columns you want to show
    '''
    cursor.execute("""SHOW COLUMNS FROM {};""".format(table))
    columns = cursor.fetchall()
    print("Columns of {} table are:".format(table))
    for number, column in enumerate(columns):
        print(str(number + 1) + ") " + column[0])
-
show_first_ten_results:
def show_first_ten_results(table):
    '''
    This function shows the first 10 products of the specified table.
    :param table: the name of the table whose first 10 results you want to show
    '''
    cursor.execute("""SELECT * FROM {} ORDER BY product_id LIMIT 10;""".format(table))
    columns = cursor.column_names
    results = cursor.fetchall()
    df = create_dataframe(columns, results)  # To return results in the form of a dataframe
    return df
Using these 2 functions, I can dig deeper into the data and understand exactly what each column in each table represents.
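`show_first_ten_results` calls a `create_dataframe` helper that isn't shown here; a minimal version consistent with that call might look like:

```python
import pandas as pd

def create_dataframe(columns, results):
    # Assumed helper: wrap the rows fetched by the cursor in a DataFrame,
    # using the cursor's column names as the headers.
    return pd.DataFrame(results, columns=columns)
```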
-
Data extraction:
I've used the following SQL concepts:
- Simple queries (SELECT ... FROM ... WHERE)
- CASE ... WHEN ... THEN
- GROUP BY & ORDER BY
- JOINs
- CTEs and Subqueries
- Some analytical functions
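As an illustration of the CASE ... WHEN and GROUP BY concepts, the sketch below runs a price-category revenue query. It uses an in-memory SQLite database so it is self-contained (the project runs similar queries against MySQL through the connector), and the category boundaries and sample rows are made-up values, not the project's.

```python
import sqlite3

# Tiny stand-in for the finance table; values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE finance (product_id TEXT, listing_price REAL, revenue REAL);
    INSERT INTO finance VALUES ('A1', 20, 100), ('A2', 60, 400), ('A3', 150, 900);
""")

# Bucket each product into a price category, then sum revenue per category.
rows = conn.execute("""
    SELECT CASE
             WHEN listing_price < 42  THEN 'Budget'
             WHEN listing_price < 74  THEN 'Average'
             WHEN listing_price < 129 THEN 'Expensive'
             ELSE 'Elite'
           END AS price_category,
           SUM(revenue) AS total_revenue
    FROM finance
    GROUP BY price_category
    ORDER BY total_revenue DESC;
""").fetchall()
```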
-
Data transformation:
- Changing data types
- Dealing with datetime formats
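A small pandas sketch of the kind of transformations listed above; the column names mirror the dataset, but the sample values are made up:

```python
import pandas as pd

df = pd.DataFrame({
    "last_visited": ["2019-03-01 10:00:00", "2018-11-20 08:30:00"],
    "revenue": ["1024.50", "380.00"],  # loaded as strings
})
df["last_visited"] = pd.to_datetime(df["last_visited"])  # string -> datetime
df["revenue"] = df["revenue"].astype(float)              # string -> float
df["month"] = df["last_visited"].dt.month                # extract the month
```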
-
Data Analysis:
- Understanding the results and extracting insights
This phase was done using a hybrid strategy: the analysis was done in SQL, with the queries executed through the MySQL/Python connector, while the visualization was done with the Matplotlib and Seaborn libraries in Python.
This phase can be divided into the following steps:
- The price categories that generate the highest revenue for both Adidas and Nike
- How the number of reviews affects revenue
- How the number of reviews changes across the months of the year
- The percentage of footwear versus other clothing products in the store
- The website's number of visits across the quarters of the year
- Female products vs. male products for both Adidas and Nike
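The visualizations follow the same pattern throughout: aggregate in SQL, then plot with Matplotlib/Seaborn. A minimal sketch with made-up monthly review counts (not the project's numbers):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical aggregates: total reviews per month for each brand.
monthly = pd.DataFrame({
    "month": [1, 2, 3],
    "Adidas": [5200, 4800, 5100],
    "Nike": [400, 390, 410],
})

ax = monthly.plot(x="month", y=["Adidas", "Nike"], kind="bar")
ax.set_ylabel("number of reviews")
ax.set_title("Reviews per month by brand")
plt.tight_layout()
```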
This phase builds on the previous one: we visualize the extracted data to reveal more patterns and understand more insights.
The Average and Expensive price categories generate more revenue than the Elite category, which contains the most expensive products.
Revenue increases with the number of reviews, which suggests that people tend to buy products with more reviews, so revenue rises accordingly.
Across the quarters of the year, and for both Adidas and Nike, the first quarter (the first 3 months of the year) sees more product visits than the other quarters.
In addition to the previous insights, the following insights were gathered during the analysis phase (they're explained in the notebook without visuals).
-
The store carries many more Adidas products than Nike products.
-
For all products, Adidas has an average discount of 33.45% while Nike has zero!
-
The top 2 best-selling products are the Air Jordan 10 Retro and the Nike Zoom Fly, both from Nike; they were last visited on the website in 2019 and 2018 respectively.
-
A product's description length has little effect on its rating.
-
Adidas pieces are much pricier than Nike's.
-
Because of the first point (the store carries far more Adidas products), Adidas has a much greater number of reviews than Nike across the months of the year.