- Seattle, WA
- https://suhaskotaki.netlify.app/
Stars
This is a repo with links to everything you'd ever want to learn about data engineering
PySpark operations to analyze the customer reviews of Amazon Twitter. Analyze the most repeated words on the busiest day of the year.
A research study aiming to discover distinguishable patterns between the pre- and post-diagnosis behavior of self-diagnosed individuals with depression on Twitter
Developed a REST API for real-time recommendations based on an unsupervised technique, KMeans, which clusters the TF-IDF scores of the reviews for analysis.
Understanding how MapReduce works for parsing text data, with parallel processing of subtasks using multithreading
Analyse the tweets and perform sentiment analysis using logistic regression
Prediction of Earnings Manipulation for Corporate Firms
My attempt at analyzing the tweets in order to understand the social media behavior of individuals suffering from PTSD
Implementing a Logistic Regression classifier and identifying the key metrics from the patient data to design a screening test for the early detection of Chronic Kidney Disease
Performing a wide array of statistical analyses, including univariate and multivariate analysis and Principal Component Analysis for the identification of key features. Followed by Logistic Regression &…
Implementing the "Hello World" of MapReduce in Python
Live streaming of tweets using Spark context and analyzing the sentiment of the tweets
A Naive Bayes model deployed on a TabPy server, making predictions on the test instances retrieved from Tableau
Identifying racial bias in arrests made under the New York Police Department's stop-and-frisk program
A movie recommendation system built with the ALS algorithm for matrix factorization on user ratings data from the MovieLens dataset
Cleansing of data for text mining, finding similarities between documents using Jaccard and cosine similarities, and computing TF-IDF coefficients.
Built a supervised multi-class predictive model to bucket customers based on the events and actions recorded during their interactions with VMware's customer engagement portals
Built an encoder-decoder model for captioning images with a visual attention mechanism. The image is encoded with a CNN and decoded with RNN-based networks (GRU and LSTM).
Developed a REST API to perform machine translation using a Seq2Seq model. The model was deployed on Google Cloud Platform.
My humble, voluntary attempt at analyzing the patterns in the tweets of one of my cousins and creating insightful visualizations.