This is a repo with links to everything you'd ever want to learn about data engineering

Jupyter Notebook 27,182 5,555 Updated Feb 22, 2025

PySpark operations to analyze the customer reviews of Amazon Twitter. Analyze the most repeated words on the busiest day of the year.

Jupyter Notebook 1 Updated Feb 4, 2021

A research study aiming to discover distinguishable patterns between the pre- and post-diagnosis behavior of self-diagnosed individuals with depression on Twitter

Jupyter Notebook 1 1 Updated Feb 6, 2021

Developed a REST API for real-time recommendations based on an unsupervised technique, KMeans, which clusters the TF-IDF scores of the reviews for analysis.

Jupyter Notebook 4 Updated Jan 22, 2021

Understand how MapReduce works for parsing text data, with parallel processing of subtasks using multithreading

Python 1 Updated Jan 15, 2021

Analyse the tweets and perform sentiment analysis using logistic regression

Jupyter Notebook 1 Updated Jan 15, 2021

Prediction of Earnings Manipulation for Corporate Firms

R 1 Updated Apr 21, 2020

My attempt at analyzing the tweets in order to understand the social media behavior of individuals suffering from PTSD

Jupyter Notebook 1 Updated May 8, 2020

Implementing a Logistic Regression classifier and identifying the key metrics from the patient data to design a screening test for the early detection of Chronic Kidney Disease

R 1 Updated Feb 12, 2021

Performing a wide array of statistical analyses, including univariate and multivariate analysis and Principal Component Analysis, for the identification of key features. Followed by Logistic Regression &…

R 1 Updated Jan 20, 2021

Implementing the "Hello World" of MapReduce in Python

Jupyter Notebook 1 1 Updated Jan 19, 2021

Live streaming of tweets using Spark context and analyzing the sentiment of the tweets

Jupyter Notebook 2 1 Updated Jan 20, 2021

A Naive Bayes model deployed on a TabPy server, making predictions on the test instances retrieved from Tableau

1 Updated Nov 18, 2020

Identifying racial bias in arrests made under the New York Police Department's stop-and-frisk program

Jupyter Notebook 2 1 Updated Jan 20, 2021

A movie recommendation system built with the ALS (alternating least squares) matrix factorization algorithm on user ratings data from the MovieLens dataset

Jupyter Notebook 5 2 Updated Jan 15, 2021

Cleansing of data for text mining, finding similarities between documents using Jaccard and cosine similarities, and computing TF-IDF coefficients.

Python 1 1 Updated Feb 4, 2021

Built a supervised multi-class predictive model to bucket customers based on the events and actions recorded during their interactions with VMware's customer engagement portals

R 2 Updated Jan 15, 2021

Built an encoder-decoder model for captioning an image with a visual attention mechanism. The image is encoded with a CNN and decoded with RNN-based networks (GRU and LSTM).

Jupyter Notebook 3 Updated Feb 5, 2021

Developed a REST API to perform machine translation using a Seq2Seq model. The model was deployed on Google Cloud Platform.

Jupyter Notebook 4 2 Updated Jan 13, 2021

My humble, voluntary attempt at analyzing the patterns in the tweets of one of my cousins and creating insightful visualizations.

Jupyter Notebook 1 Updated Jan 19, 2021

A research study aiming to discover distinguishable patterns between the pre- and post-diagnosis behavior of self-diagnosed individuals with depression on Twitter

Jupyter Notebook 3 Updated Jan 26, 2021