MindMetrics: Mental Health Analytics of Tweets

This work is part of an IEEE research paper submission and is intended solely for academic research purposes. Use or redistribution without proper permission is prohibited.

This repository introduces a cutting-edge, modular framework for classifying tweets into mental health categories using advanced Natural Language Processing (NLP) and Machine Learning (ML) techniques. It serves as both an educational resource and a robust baseline for computational mental health research.

Key Highlights

Diverse Model Architectures:
Leverage a range of modern techniques including transformer models (BERT, DistilBERT, XLNet, GPT-2), zero-shot classification, topic modeling (LDA, NMF), and a rule-based classifier. This variety supports in-depth comparative analysis between contemporary and conventional methods.
Comprehensive End-to-End Pipeline:
The framework spans from raw data ingestion and rigorous preprocessing to model evaluation and result visualization, offering a complete walkthrough of an ML workflow.
Reproducible & Extensible Research:
With clearly defined modules and scripts, researchers can easily replicate results, benchmark performance, and extend the pipeline for custom experiments.
Advanced Technical Capabilities:
Incorporates techniques such as GPU acceleration, mixed precision training, and extensive metric evaluations (ROC, Precision-Recall curves, classification reports) to offer deeper insights into model behavior.
Real-World Application:
Provides practical strategies for handling noisy social media data, including sophisticated preprocessing, feature engineering, and robust evaluation methodologies.

Repository Structure

Data Preparation

Raw Data:
The primary dataset (mental.csv) consists of unprocessed tweets.
Preprocessing Scripts:
Clean and normalize text data by removing noise (e.g., mentions, URLs, digits, non-alphabetic characters), followed by tokenization, lemmatization, and stopword filtering.

Modeling and Classification

Transformer-Based Models:
Fine-tune and deploy BERT, DistilBERT, XLNet, and GPT-2 for high-performance text classification.
Zero-Shot Classification:
Utilize transfer learning to predict labels without requiring explicit training on every category.
Topic Modeling:
Implement Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF) to reveal latent structures and cluster similar content.
Rule-Based Approach:
Employ a simple yet effective classifier based on token occurrence for initial categorization.

Evaluation and Analysis

Performance Metrics:
Measure model effectiveness using accuracy, precision, recall, and F1 scores.
Visualization Tools:
Generate ROC and Precision-Recall curves, histograms, word clouds, and n-gram analyses to visually interpret data insights.
Comparative Analysis:
Conduct detailed, side-by-side comparisons of model outputs with comprehensive classification reports.

Technical Requirements

Programming Language:
Python (3.7+)
Development Environment:
Google Colab or any Jupyter Notebook setup with GPU support.
Core Libraries:
- Data Handling: pandas, numpy
- Visualization: matplotlib, seaborn, WordCloud
- NLP & ML: nltk, scikit-learn, transformers, tensorflow, torch
API Integrations:
Support for API-driven classification via OpenAI and Google Generative AI.

Conclusion

This repository is a valuable asset for researchers, data scientists, and practitioners eager to explore and apply advanced NLP techniques to mental health analytics. By engaging with the codebase, you will enhance your understanding of state-of-the-art classification methods, model benchmarking, and the complexities of processing real-world textual data.

Contributing

Contributions to this repo are welcome. Please open an issue or submit a pull request if you have suggestions for:

New feature extraction and data preprocessing techniques.
Additional machine learning or deep learning models.
Enhanced visualization and interpretability tools.

Before using or modifying the project, please note that it is part of an IEEE research paper submission and should not be used without explicit permission.

License

This project is licensed under the MIT License.

Contact

For questions, collaborations, or further details, please reach out:

Name: Shib Kumar
Email: [email protected]
GitHub: @shib1111111

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
classified_tweet_Bert.csv		classified_tweet_Bert.csv
classified_tweet_DistilBERT.csv		classified_tweet_DistilBERT.csv
classified_tweet_GPT2.csv		classified_tweet_GPT2.csv
classified_tweet_XLNet.csv		classified_tweet_XLNet.csv
classified_tweet_basic.csv		classified_tweet_basic.csv
classified_tweet_gemini_10.csv		classified_tweet_gemini_10.csv
classified_tweet_gemini_15.csv		classified_tweet_gemini_15.csv
classified_tweet_lda.csv		classified_tweet_lda.csv
classified_tweet_nmf.csv		classified_tweet_nmf.csv
classified_tweet_zeroshot.csv		classified_tweet_zeroshot.csv
cleaned_mental.csv		cleaned_mental.csv
mental.csv		mental.csv
mental_illness_classification_all_Model.ipynb		mental_illness_classification_all_Model.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MindMetrics: Mental Health Analytics of Tweets

Key Highlights

Repository Structure

Data Preparation

Modeling and Classification

Evaluation and Analysis

Technical Requirements

Conclusion

Contributing

License

Contact

About

Releases

Packages

Languages

License

shib1111111/MindMetrics

Folders and files

Latest commit

History

Repository files navigation

MindMetrics: Mental Health Analytics of Tweets

Key Highlights

Repository Structure

Data Preparation

Modeling and Classification

Evaluation and Analysis

Technical Requirements

Conclusion

Contributing

License

Contact

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages