pari1jay/README.md

Hi, I'm Pari! 👋


Data Analyst

🔗 Portfolio | 🔗 LinkedIn


About Me

I'm a Data Analyst with experience in data engineering, system integration, and cloud-based solutions. I have a Master of Science degree in Applied Data Science from Indiana University, and I am passionate about data analytics, AI and machine learning. I'm actively seeking opportunities to work on impactful projects as a Data Analyst or Data Engineer.


Technical Skills 💻

Data Analysis & Engineering

  • Languages: Python, R, SQL, Java, C
  • Databases: PostgreSQL, MySQL, MongoDB, Snowflake, MS Access
  • Data Visualization: Tableau, Power BI, Plotly, Excel

Project Management & Collaboration

  • Tools: Jira, Confluence, Lucidchart, MS Project, HP ALM
  • Methodologies: Agile, Scrum, Waterfall

Certifications

  • Career Essentials in Data Analysis by Microsoft
  • Microsoft Azure Data Fundamentals
  • Data Analytics with Microsoft Fabric
  • HackerRank SQL, R (Intermediate)
  • Atlassian Agile Project Management Professional

Projects 🚀

1. Efficacy Prediction Model

  • Goal: predict compound efficacy using a pre-processed dataset (CA, CM, CI classes) from Moleculenet.ai.

  • Merge Data: Link NSC across files to combine screening results, EC50/IC50, and structures.

  • Filter Compounds: Focus on CA/CM for active candidates.

  • Calculate Selectivity Index (SI): SI = IC50/EC50 to identify compounds with high efficacy and low toxicity.

  • Data preprocessing:

    • Manage duplicate entries.
    • Resolve mismatched screening conclusions.
    • Apply the flag interpretation signs to values.
    • Handle missing data.
  • ML model: random split of the data (80% train, 20% test).

  • Extracted molecular descriptors (e.g., logP, Morgan fingerprints, MoRSE) from the data.

  • Trained base models and checked them against the test data.

  • Evaluated models using accuracy, F1-score, and Cohen’s kappa, aligning predictive insights with clinical research objectives.
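The merge, filter, and SI steps above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the NSC ids, the numeric values, and the names `screening`, `potency`, `selectivity_index`, and `rank_active_compounds` are all hypothetical, and the real project may well use pandas instead.

```python
# Toy records only: NSC ids, values, and variable names are hypothetical.
screening = {  # NSC -> screening conclusion (CA, CM, or CI)
    1001: "CA",
    1002: "CM",
    1003: "CI",
}
potency = {  # NSC -> (EC50, IC50), both in the same units
    1001: (0.5, 50.0),
    1002: (2.0, 20.0),
    1003: (10.0, 10.0),
}


def selectivity_index(ec50, ic50):
    """Return SI = IC50 / EC50, or None when a value is missing or EC50 <= 0."""
    if ec50 is None or ic50 is None or ec50 <= 0:
        return None
    return ic50 / ec50


def rank_active_compounds(screening, potency):
    """Join the two tables on NSC, keep CA/CM compounds, sort by SI descending."""
    rows = []
    for nsc, conclusion in screening.items():
        if conclusion not in ("CA", "CM"):
            continue  # drop confirmed-inactive (CI) compounds
        if nsc not in potency:
            continue  # no EC50/IC50 record to merge with
        ec50, ic50 = potency[nsc]
        si = selectivity_index(ec50, ic50)
        if si is not None:
            rows.append((nsc, conclusion, si))
    return sorted(rows, key=lambda row: row[2], reverse=True)


ranked = rank_active_compounds(screening, potency)
```

A higher SI means the toxic concentration sits further above the effective one, which is why the sketch sorts in descending order.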

2. Multi-Class Genre Classification using R Link

  • Automatic genre classification is a long-standing research topic in Music Information Retrieval (MIR), which seeks techniques to make sense of musical diversity.
  • Performed audio feature extraction and music genre classification using Spotify's rich array of audio features and a diverse dataset.
  • A few other projects exploring concepts in R: Link

3. Consumer Complaints Prediction

  • Tools: Python, NLP, Data Visualization
  • Description: Applied NLP techniques to analyze customer feedback and classify sentiment as positive, negative, or neutral. Achieved 79% accuracy using machine learning models (Naive Bayes, Decision Tree, KNN).
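As a rough illustration of the Naive Bayes approach mentioned above, here is a small multinomial Naive Bayes classifier with add-one smoothing, written with the standard library only. The training sentences and the function names `train_naive_bayes` and `predict_sentiment` are made up for this sketch; the project itself presumably relies on an ML library and a real dataset.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples):
    """Count documents per label and words per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict_sentiment(model, text):
    """Return the label with the highest log-posterior for the given text."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            count = word_counts[label][word]
            # add-one (Laplace) smoothing avoids log(0) for unseen pairs
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set, for illustration only.
training = [
    ("great service and friendly staff", "positive"),
    ("very happy with the product", "positive"),
    ("terrible delay and rude support", "negative"),
    ("unhappy with the broken product", "negative"),
    ("the order arrived on monday", "neutral"),
    ("package was delivered to the office", "neutral"),
]
model = train_naive_bayes(training)
```

Calling `predict_sentiment(model, "friendly staff")` classifies a new piece of feedback against the three sentiment classes.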

4. Real Estate Sales Prediction Web Application

  • Tools: Python, Machine Learning, Streamlit
  • Description: Developed a web app to predict real estate sales using Linear Regression, Random Forest, and Gradient Boosting. Enabled city-specific and overall sales predictions with user input.
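One way to picture the city-specific and overall predictions is to fit one model per city plus one on all records. The sketch below does this with a one-variable least-squares line in plain Python. It is an assumption-laden simplification: the city names, figures, and the helpers `fit_line`, `fit_models`, and `predict_sales` are hypothetical, and it stands in for only the Linear Regression part of the app.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_models(records):
    """Fit one line per city and one 'overall' line on all records."""
    by_city = defaultdict(lambda: ([], []))
    all_x, all_y = [], []
    for city, month, sales in records:
        by_city[city][0].append(month)
        by_city[city][1].append(sales)
        all_x.append(month)
        all_y.append(sales)
    models = {city: fit_line(xs, ys) for city, (xs, ys) in by_city.items()}
    models["overall"] = fit_line(all_x, all_y)
    return models


def predict_sales(models, month, city="overall"):
    """Predict sales for a month index using the chosen city's line."""
    slope, intercept = models[city]
    return slope * month + intercept


# Hypothetical records: (city, month index, sales), for illustration only.
records = [
    ("City A", 1, 10.0), ("City A", 2, 12.0), ("City A", 3, 14.0),
    ("City B", 1, 20.0), ("City B", 2, 25.0), ("City B", 3, 30.0),
]
models = fit_models(records)
```

In a Streamlit app, the user's selected city would simply pick which entry of `models` is used for the prediction.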

5. ETL and Data Pipelines with Shell, Airflow and Kafka

  • Tools: Shell, Airflow and Kafka
  • Description: Designed and implemented ETL pipelines to integrate data from multiple sources into a centralized data warehouse, improving data quality by 25%.
  • Coursera: Link
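The extract, transform, and load stages of such a pipeline can be sketched with the Python standard library. This is a stand-in for illustration, not the Shell/Airflow/Kafka implementation: the CSV content, the `sales` table, and the function names are hypothetical, and an in-memory SQLite database plays the role of the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw input with typical quality problems:
# stray whitespace, inconsistent case, a missing amount, a duplicate row.
RAW_CSV = """id,city,amount
1, Austin ,100.5
2,dallas,
3,Houston,42
3,Houston,42
"""


def extract(raw_text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Drop rows with a missing amount or repeated id; normalize fields."""
    seen, clean = set(), []
    for row in rows:
        amount = row["amount"].strip()
        if not amount or row["id"] in seen:
            continue
        seen.add(row["id"])
        clean.append((int(row["id"]), row["city"].strip().title(), float(amount)))
    return clean


def load(rows, conn):
    """Write cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, city TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT id, city, amount FROM sales ORDER BY id").fetchall()
```

In an Airflow setting, each of the three functions would typically become its own task so that failures can be retried per stage.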

Experience 💼

Data Engineer/Data Analyst | Netcube Technologies | Bangalore, India | Jan 2019 – Feb 2022

  • Tools: SQL, NoSQL, GCP, Apache Airflow, GitHub, RESTful APIs, Flask, ETL/ELT, Data warehouses

Associate Software Engineer | Tech Mahindra | Bangalore, India | Aug 2016 – Oct 2018

  • Tools: Oracle DB, HP ALM, Python, Automation testing scripts, Data warehouses

Education 🎓

  • Master of Science in Applied Data Science | Indiana University Indianapolis | Jan 2023 – May 2024

    • Coursework: Data Analytics using Python and R, Data Visualization, Deep Learning, Cloud Computing, DBMS, Statistics
    • Dean’s Scholarship Recipient
  • Bachelor of Engineering | Mangalore Institute of Technology and Engineering, VTU, India


Let's Connect! 🌐

I'm open to collaborating on interesting projects or discussing new opportunities. Feel free to reach out!


Pinned

  1. Sales-Prediction-using-ML Public

    This project develops a sales prediction web app using the Texas housing dataset ('txhousing'). The goal here is to provide insights into real estate sales trends using this dataset. I have used …

    Jupyter Notebook 1 1

  2. Crop-row-detection Public

    Developed a deep learning model in Python to detect crop rows from input images, utilizing U-Net architecture with TensorFlow for image segmentation. Evaluated model performance using the Intersect…

    Jupyter Notebook 1

  3. Customer-sentiment-Analysis Public

    This project focuses on analyzing customer sentiment based on textual data, such as product reviews, feedback, or social media posts. The goal is to classify customer feedback into different sentim…

    Jupyter Notebook 1

  4. Spotify-classification-R Public

    Exploring Audio Features and Genre Classification for Spotify data

    1

  5. Drug-Efficacy-Prediction-Model- Public

    Jupyter Notebook

  6. Midwest-dataset-project-using-R Public

    In this project, I aim to conduct a comprehensive analysis of demographic and socioeconomic data for counties in the Midwest region of the United States. The dataset provides information on variou…

    R

90 contributions in the last year

Contribution activity

April 2025

Created 3 repositories