Welcome to the Data Science Piscine! This repository includes a series of training modules designed to enhance your skills in data science through hands-on exercises. Each module covers a unique aspect of the field, from foundational concepts to advanced predictive modeling techniques.
- datascience-0: Learn to create a PostgreSQL database through practical exercises that show why data cleaning and preparation matter for analysis.
- datascience-1: Build a data warehouse using the ETL (Extract, Transform, Load) process, and practice effective data integration, organization, and management.
- datascience-2: Explore data visualization techniques and the data analyst's role in interpreting graphical representations of data and making informed decisions based on them.
- datascience-3: Understand current data through exercises involving visualizations, correlation analysis, standardization, normalization, and dataset splitting, all aimed at preparing for predictive modeling.
- datascience-4: Delve into predictive modeling with techniques such as confusion matrices, heatmaps, variance calculations, feature selection, decision trees, KNN, and voting classifiers, while following the module's guidelines for software setup and collaborative submission.
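
To give a rough feel for a few of the techniques named in datascience-3 and datascience-4, here is a minimal sketch of standardization, min-max normalization, dataset splitting, and a binary confusion matrix. It is not the modules' reference solution: the function names (`standardize`, `normalize`, `split_dataset`, `confusion_matrix`) and the sample values are hypothetical, and it uses only the Python standard library as an assumption for illustration. The actual exercises may require other tools.

```python
import random
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical; there is no spread to scale by.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def normalize(values):
    """Min-max rescale values into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def split_dataset(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of rows and return (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def confusion_matrix(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for binary labels."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    heights = [150.0, 160.0, 170.0, 180.0, 190.0]
    print(standardize(heights))
    print(normalize(heights))
    train, test = split_dataset(range(10))
    print(len(train), len(test))
    print(confusion_matrix([1, 0, 1, 1], [1, 1, 0, 1]))
```

Standardization suits features that will feed distance-based models such as KNN, while min-max normalization keeps values in a fixed range; both statistics should be computed on the training split only and then applied to the test split.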
- Clone the Repository:
git clone https://github.com/mbrettsc/Data-Science
- Navigate to the Module Directory:
cd <module-directory>
- Follow the Instructions: Each module contains its own instructions and exercises. Work through them to complete the tasks.
Contributions to improve or extend the modules are welcome. Please fork the repository, make your changes, and submit a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.
For any questions or issues, please open an issue on the repository or contact the maintainers directly.
Happy coding!