01-datascience-methodology.md

Notes:

Data Science Methodology

Data science methodology guides data scientists in solving complex problems with data. It covers forms of data collection, strategies for measurement, and methods for comparison.

Case Study

Ten Stages in Data Science

  • Business understanding
  • Analytic approach
  • Data requirements
  • Data collection
  • Data understanding
  • Data preparation
  • Modeling
  • Evaluation
  • Deployment
  • Feedback

Notes

  • Get stakeholder buy-in and support

Stage 1: Business Understanding

  • Define: Prepare a clearly defined set of questions to help identify the right analytic approach
  • Understand: Understand the goal of the sponsor
  • Objectives: Organize a set of clear objectives
  • Engagement: Stakeholder engagement is important for capturing the requirements and clarifying the questions
  • What problem are we trying to solve?
  • Define goals and objectives
  • Kickoff the project with
  • What is expected of the sponsor
    • Set the project direction
    • Remain engaged and provide guidance
    • Ensure the needed support is available should the need arise

Stage 2: Analytic Approach

  • Steps to apply an analytic approach
    • Identify the patterns
    • Choose an analytical approach
    • Apply machine learning
  • Available patterns to address the questions
    • Descriptive (Current status)
      • What is the current situation?
    • Diagnostic (Statistical Analysis)
      • What happened?
      • Why is this happening?
    • Predictive (Forecasting)
      • What if the trends continue?
      • What will happen next?
    • Prescriptive (Recommendations)
      • How do we solve it?
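The four patterns above can be sketched on one small data set. This is a minimal illustration, not a prescribed method: the readmission counts, the function names, and the threshold rule are all assumptions made up for the example, and the trend is a plain least-squares line.

```python
from statistics import mean

# Hypothetical monthly readmission counts (illustrative data, not from the notes).
readmissions = [30, 32, 34, 36, 38, 40]
months = list(range(1, len(readmissions) + 1))

def describe(values):
    """Descriptive: what is the current situation?"""
    return {"latest": values[-1], "average": mean(values)}

def diagnose(values):
    """Diagnostic: what happened? Here, simply the change between periods."""
    return [later - earlier for earlier, later in zip(values, values[1:])]

def fit_trend(xs, ys):
    """Least-squares line y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    slope = numerator / denominator
    return slope, y_bar - slope * x_bar

def predict_next(xs, ys):
    """Predictive: what will happen next if the trend continues?"""
    slope, intercept = fit_trend(xs, ys)
    return slope * (xs[-1] + 1) + intercept

def prescribe(forecast, threshold):
    """Prescriptive: how do we solve it? A hypothetical rule-based recommendation."""
    return "intervene" if forecast > threshold else "monitor"

forecast = predict_next(months, readmissions)
print(describe(readmissions))   # {'latest': 40, 'average': 35}
print(diagnose(readmissions))   # [2, 2, 2, 2, 2]
print(forecast)                 # 42.0
print(prescribe(forecast, 41))  # intervene
```

Each function answers one of the questions in the list, so the choice of pattern follows directly from the question being asked.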
Model types
  • Use a descriptive model to show relationships
  • Use a predictive model to show the probability of an action
  • Use a classification model to capture yes/no answers
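The difference between a probability and a yes/no answer can be shown with a rough sketch. The records, the segment names, and the 0.5 threshold below are hypothetical; the "predictive" step is only an observed frequency per group, which stands in for a real model.

```python
from collections import defaultdict

# Hypothetical records: (customer segment, whether the customer renewed).
records = [
    ("new", True), ("new", False), ("new", False), ("new", False),
    ("loyal", True), ("loyal", True), ("loyal", True), ("loyal", False),
]

def renewal_probabilities(rows):
    """Predictive view: estimated probability of the action per segment."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [renewals, total]
    for segment, renewed in rows:
        counts[segment][0] += int(renewed)
        counts[segment][1] += 1
    return {segment: yes / total for segment, (yes, total) in counts.items()}

def classify(probability, threshold=0.5):
    """Classification view: collapse the probability into a yes/no answer."""
    return "yes" if probability >= threshold else "no"

probabilities = renewal_probabilities(records)
print(probabilities)                     # {'new': 0.25, 'loyal': 0.75}
print(classify(probabilities["new"]))    # no
print(classify(probabilities["loyal"]))  # yes
```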
Machine Learning
  • Learns without being explicitly programmed
  • Can identify relationships and trends in data
  • Uses clustering and association approaches
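As one example of finding structure without hand-written rules, here is a minimal k-means sketch on one-dimensional data. The spending values and the starting centroids are assumptions chosen for illustration; a real project would more likely use a library implementation.

```python
from statistics import mean

def kmeans_1d(values, centroids, iterations=10):
    """Plain k-means on one-dimensional data, starting from the given centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for value in values:
            # Assign each value to the index of its nearest centroid.
            nearest = min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [mean(c) if c else old for c, old in zip(clusters, centroids)]
    return centroids, clusters

# Hypothetical monthly spend per customer.
spend = [10, 12, 11, 50, 52, 51]
centroids, clusters = kmeans_1d(spend, centroids=[min(spend), max(spend)])
print(centroids)  # [11, 51]
print(clusters)   # [[10, 12, 11], [50, 52, 51]]
```

The two groups are never labeled in the input; the grouping comes from the data alone.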
Decision Tree Classification
  • Produces a definite classification outcome for each case
  • Each decision path is easy to read: it describes the conditions that lead to a high-risk outcome
  • Simple to understand and implement
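A small sketch of why a decision path is easy to explain: the tree below is hand-written rather than learned, and its feature names (`prior_admissions`, `age`) and thresholds are hypothetical. Walking the tree yields both the class and the list of conditions that led to it.

```python
# Hypothetical, hand-written tree: feature names and thresholds are illustrative only.
TREE = {
    "feature": "prior_admissions", "threshold": 2,
    "low": {"label": "low risk"},
    "high": {
        "feature": "age", "threshold": 65,
        "low": {"label": "low risk"},
        "high": {"label": "high risk"},
    },
}

def classify_risk(record, node=TREE):
    """Walk the tree; return the label and the conditions met along the path."""
    path = []
    while "label" not in node:
        feature, threshold = node["feature"], node["threshold"]
        if record[feature] >= threshold:
            path.append(f"{feature} >= {threshold}")
            node = node["high"]
        else:
            path.append(f"{feature} < {threshold}")
            node = node["low"]
    return node["label"], path

label, path = classify_risk({"prior_admissions": 3, "age": 70})
print(label)  # high risk
print(path)   # ['prior_admissions >= 2', 'age >= 65']
```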