
This GitHub repository contains code that analyzes a Walmart stock dataset with Spark, a fast, distributed data processing engine. The code uses Spark functions to explore and manipulate the dataset, and it computes statistics that give insight into the stock's performance.


mohankrishna02/Walmart-Stock-Analysis-Spark


Walmart Analysis using Spark

The code analyzes a Walmart stock dataset with Spark, using Spark functions to explore and manipulate the data. It performs the following tasks:

Reading the dataset:

  • The code reads a CSV file containing Walmart stock data into a DataFrame.

Data exploration:

  • Printing the number of rows in the DataFrame.
  • Printing the column names.
  • Printing the schema of the DataFrame.
  • Printing the first 5 rows of the DataFrame.
  • Using the describe() function to generate summary statistics of the DataFrame.
  • Rounding the summary statistics to two decimal places.

Data manipulation:

  • Converting certain columns to decimal type.
  • Creating a new column called "HV Ratio": the ratio of the "High" price to the "Volume" of stock traded that day.

Calculating statistics:

  • Computing the mean of the "Close" column.
  • Finding the maximum and minimum values of the "Volume" column.
  • Counting the number of days where the "Close" price was lower than $60.
  • Calculating the percentage of days on which the "High" price was greater than $80.
  • Finding the maximum "High" price for each year.
  • Calculating the average "Close" price for each calendar month.

This code can serve as a starting point for analyzing and gaining insights from the Walmart stock dataset using Spark.

Dataset:

Solution:

Documentation:
