MichiganDataScienceTeam/MDST-Onboarding

MDST Tutorials - FA24

Check out our onboarding website with centralized resources here!

See our Fall 24 project list here!

If you find any issues or have suggestions for improvement, please create a new entry under "Issues".

Setup

If you haven't already, fill out this form and join our mailing list. This will keep you up-to-date on the club.

  1. Download the files in this repo by clicking Code (the green button near the top) -> Download ZIP and unzip the files into a folder. You can of course also fork the repo if you have experience with Git.

  2. Follow the general setup guide.

  3. Follow the Git setup guide.

For most people, (3) is the hardest part of the tutorial! If you feel frustrated, know it is normal. Come see us at tutorials or office hours and we will help you out.

What do I do if I cannot get the setup working in time?

If you have trouble with the General Setup, you can follow the Google Colab setup guide and use Colab to complete the tutorials.

If you have trouble with the Git Setup, you can upload your files to GitHub directly: go to your GitHub repository and choose Add file -> Upload files.

Tutorials & Checkpoints

Get started with tutorial0 and checkpoint0 in the tutorial0 folder, then move on to tutorial1 and checkpoint1 in the tutorial1 folder. We recommend working through each tutorial before attempting the corresponding checkpoint. However, if you have prior experience, feel free to skip part or all of a tutorial.

The Data-Visualization folder contains materials for those who want to get a head start. pandas.ipynb is a very brief introduction to the plotting tools built into Pandas. The AnatomyofMatplotlib folder contains a comprehensive tutorial for the Matplotlib library, which most beginner projects use and which is foundational to other data visualization packages such as seaborn.
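As a small taste of how Pandas plotting builds on Matplotlib, here is a minimal sketch. It assumes pandas and matplotlib are already installed; the data and column names are made up for illustration and are not taken from the notebooks.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display

import pandas as pd

# Made-up data, purely for illustration
df = pd.DataFrame({"week": [1, 2, 3, 4], "attendance": [30, 42, 38, 51]})

# DataFrame.plot delegates to Matplotlib and returns a Matplotlib Axes object
ax = df.plot(x="week", y="attendance", kind="line", title="Attendance by week")

# Because ax is a regular Matplotlib Axes, Matplotlib methods work on it directly
ax.set_ylabel("attendance")
```

In a Jupyter notebook you can drop the `matplotlib.use("Agg")` line and the plot will be shown inline.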

We also highly recommend looking into Python virtual environments. You can do this at the beginning or after you complete the checkpoints. Our members have made resources explaining them here.
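To give a rough idea of what a virtual environment is, the sketch below creates one with Python's standard-library venv module. The directory name mdst-env and the temporary location are assumptions chosen for this example. Most people create environments from the command line with `python -m venv <folder>` instead, which does the same thing.

```python
import tempfile
import venv
from pathlib import Path

# Hypothetical location: a throwaway temp directory so this sketch leaves
# nothing behind in your project. In practice you would usually pick a
# folder such as ".venv" inside your project.
env_dir = Path(tempfile.mkdtemp()) / "mdst-env"

# with_pip=False keeps the example fast; use with_pip=True if you want
# pip available inside the new environment.
venv.create(env_dir, with_pip=False)

# Every environment contains a pyvenv.cfg file describing the Python
# interpreter it was created from.
config_file = env_dir / "pyvenv.cfg"
print(config_file.read_text())
```

Packages installed into an activated environment stay inside its folder, so different projects can use different package versions without conflicts.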

Challenges (Optional)

There are three optional challenges available to you: Machine Learning, Deep Learning, and RvF. They are located in three separate folders under Optional-Challenges, and you will write your code in the notebooks ending in .ipynb.

You can complete any one or several of them. We usually place new members on beginner or intermediate projects in their first semester, but if you are experienced with data science you may want to work on advanced projects right away. In that case, completing at least one challenge is required.

Themes:

Machine Learning - Loan Approval Prediction

Deep Learning - Titanic

RvF - Computer Vision: Fake Face Detection

How we are supporting you

These checkpoints are not meant to be selective. Their sole purpose is to give you sufficient foundational knowledge about Python and some important packages so you can start contributing to a project.

The definition of success for us is to have everyone who begins the tutorials finish them. Thus, we will offer support in two ways:

  • Tutorials: We will host one live tutorial introducing the Command Line, Python, and its related packages. The session will be a combination of short presentations and Q&A.

  • Office Hours: We will offer two in-person office hours where you can ask questions and receive feedback.

The exact dates, times, and locations of the tutorials and office hours can be found on the onboarding resource website here.

Neither tutorials nor office hours are mandatory.

We have also created a forum where you can ask questions.

Submission

Due: 9/16/2024 11:59pm EST

Submit a link to your GitHub repo that contains all your completed work when you sign up for projects: project signup form

We are looking for:

  • [REQUIRED] Checkpoint 0 and Checkpoint 1. These are assessed by completion and effort, not accuracy.
  • [OPTIONAL] Any additional challenges you completed. These are assessed by merit.

Please make sure your link works. Due to the high application volume, we will not send emails about invalid links. Submissions with invalid links will be automatically filtered out.

Contact

All technical or logistical questions MUST be posted on the ED forum. We will not answer those questions over email.

If you have a personal question, email us at [email protected].

Official Documentation

A list of relevant Python libraries that are used extensively throughout the checkpoints, challenges, MDST projects, and beyond.

NumPy: https://numpy.org/doc/stable/

Pandas: https://pandas.pydata.org/docs/

Matplotlib: https://matplotlib.org/stable/gallery/index

Scikit-Learn: https://scikit-learn.org/stable/user_guide.html
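If you are unsure whether these libraries are available in your environment, a quick check like the sketch below can help. It uses only the standard library. The helper name `installed` is made up for this example, and the import names are the usual ones for each package (note that scikit-learn is imported as sklearn).

```python
from importlib.util import find_spec

# Usual import names for the libraries listed above
libraries = {
    "NumPy": "numpy",
    "Pandas": "pandas",
    "Matplotlib": "matplotlib",
    "Scikit-Learn": "sklearn",
}


def installed(import_name: str) -> bool:
    """Return True if Python can locate a top-level package with this name."""
    return find_spec(import_name) is not None


# Record the result for each library, then report it
status = {label: installed(name) for label, name in libraries.items()}
for label, ok in status.items():
    print(f"{label}: {'installed' if ok else 'missing'}")
```

Anything reported as missing can usually be installed by following the general setup guide.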