Modeling

How can we visualize our data to help us find the answer to the question we were asked?
We build data models, using training sets and testing sets, to find answers to the business question.
The success of modeling largely depends on understanding the problem or business question, and on the analytical approach and methods used.
To build the data model, we compile, prepare, and model the data.
Data scientists may train different algorithms on the training set and fine-tune the variables that best support the answer to the question.
A model can use statistical processes or machine learning.
The result of modeling can be Prescriptive (Do something) or Predictive (This is likely to happen).
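The train/test workflow described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the method used in the course: `train_test_split`, `fit_threshold_rule`, and `accuracy` are hypothetical helpers written here (the first only borrows its name from scikit-learn), and the "algorithm" is a toy one-feature cut-off so that different candidate models can be trained on the training set and compared on the held-out testing set.

```python
import random
from collections.abc import Callable, Sequence

# Each example is (features, label); label is "Y" or "N".
Example = tuple[tuple[float, ...], str]
Model = Callable[[tuple[float, ...]], str]


def train_test_split(data: Sequence[Example], test_fraction: float = 0.3,
                     seed: int = 42) -> tuple[list[Example], list[Example]]:
    """Hypothetical helper: shuffle a copy of the data, then split it."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]  # (training set, testing set)


def accuracy(model: Model, rows: Sequence[Example]) -> float:
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)


def fit_threshold_rule(train: Sequence[Example], feature: int) -> Model:
    """Hypothetical toy algorithm: choose the cut-off on one feature that
    maximizes training accuracy, predicting "Y" at or above it."""
    candidates = sorted({x[feature] for x, _ in train})
    best = max(candidates,
               key=lambda t: sum((x[feature] >= t) == (y == "Y") for x, y in train))
    return lambda x: "Y" if x[feature] >= best else "N"


# Usage with made-up data: train one candidate model per feature,
# then compare them on the testing set the models never saw.
data = [((float(i), float(i % 3)), "Y" if i >= 12 else "N") for i in range(20)]
train, test = train_test_split(data)
for j in (0, 1):
    model = fit_threshold_rule(train, j)
    print(f"feature {j}: train={accuracy(model, train):.2f} test={accuracy(model, test):.2f}")
```

Keeping the testing set out of training is what lets its accuracy estimate how the model will behave on new cases.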

Using a decision tree to classify patient readmission
In the example, the model with the highest overall accuracy correctly classifies only 45% of the Yes answers (patients who were readmitted).
To tune the model, we adjust the Relative Cost Y:N, the ratio of the cost of misclassifying a Yes as No to the cost of misclassifying a No as Yes.
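One common way to see what the relative cost does is to apply it to a predicted probability of Yes. This is a sketch under an assumption: it uses the standard expected-cost decision rule, whereas the decision-tree tool in the example may apply misclassification costs differently inside the tree. `predict_with_relative_cost` and `implied_threshold` are hypothetical names. Raising the cost of missing a Yes lowers the probability needed to predict Yes, which trades some specificity for higher sensitivity.

```python
def predict_with_relative_cost(p_yes: float, cost_yes_to_no: float = 1.0) -> str:
    """Return "Y" or "N" given the model's estimated probability of Yes.

    cost_yes_to_no is the relative cost Y:N: how much worse it is to
    misclassify an actual Yes as No than an actual No as Yes (cost 1).
    """
    if not 0.0 <= p_yes <= 1.0 or cost_yes_to_no <= 0:
        raise ValueError("p_yes must be in [0, 1] and the cost must be positive")
    expected_cost_if_predict_yes = (1.0 - p_yes) * 1.0       # wrong when actual is No
    expected_cost_if_predict_no = p_yes * cost_yes_to_no     # wrong when actual is Yes
    return "Y" if expected_cost_if_predict_no > expected_cost_if_predict_yes else "N"


def implied_threshold(cost_yes_to_no: float) -> float:
    """Probability above which the rule predicts Yes: 1 / (1 + cost)."""
    return 1.0 / (1.0 + cost_yes_to_no)


scores = [0.1, 0.3, 0.45, 0.6, 0.8]  # made-up predicted probabilities of Yes
for cost in (1.0, 4.0):
    print(cost, implied_threshold(cost),
          [predict_with_relative_cost(p, cost) for p in scores])
# 1.0 0.5 ['N', 'N', 'N', 'Y', 'Y']
# 4.0 0.2 ['N', 'Y', 'Y', 'Y', 'Y']
```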
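Type I error, aka false positive: predicting Yes when the actual answer is No
Type II error, aka false negative: predicting No when the actual answer is Yes
Sensitivity is the accuracy with respect to actual Yes answers: true positives / (true positives + false negatives)
Specificity is the accuracy with respect to actual No answers: true negatives / (true negatives + false positives)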
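These definitions can be checked on a small confusion matrix. The sketch below uses a hypothetical `evaluate` helper and ten made-up patients; it shows how overall accuracy can look acceptable while sensitivity (accuracy on the Yes answers) stays low, which is the situation the readmission example describes.

```python
def evaluate(actual: list[str], predicted: list[str], positive: str = "Y") -> dict[str, float]:
    """Confusion-matrix counts plus accuracy, sensitivity and specificity."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)  # Type II errors
    fp = sum(a != positive and p == positive for a, p in pairs)  # Type I errors
    tn = sum(a != positive and p != positive for a, p in pairs)
    nan = float("nan")
    return {
        "TP": tp, "FN": fn, "FP": fp, "TN": tn,
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else nan,  # accuracy on actual Yes
        "specificity": tn / (tn + fp) if tn + fp else nan,  # accuracy on actual No
    }


actual    = ["Y", "Y", "Y", "Y", "N", "N", "N", "N", "N", "N"]
predicted = ["Y", "Y", "N", "N", "N", "N", "N", "N", "N", "Y"]
metrics = evaluate(actual, predicted)
print(metrics["accuracy"], metrics["sensitivity"])  # 0.7 0.5
```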

Evaluation

Have we indeed answered the question with the designed model, or do we need to calibrate the model further?
Evaluation involves a feedback loop to determine the relevance and fitness of the model.
Evaluation runs during development, before the model is deployed.
Phases of model evaluation:
- Diagnostic measures: test and verify that the model works as intended.
  - We can use a decision tree to find areas for improvement.
- Statistical significance
  - Test and verify that the data is properly processed and interpreted in the model.

Which model is best as we tune the Relative Cost parameter?
We can use the Receiver Operating Characteristic (ROC) curve to evaluate binary classification models.
The model whose curve lies farthest above the central diagonal line (the random-guess baseline) is best; this is known as maximum separation.
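An ROC curve plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) as the decision threshold sweeps across the model's scores. The sketch below computes the curve, the area under it (trapezoid rule), and one way to measure separation from the diagonal: the largest value of TPR - FPR (Youden's J). Perpendicular distance to the diagonal is that value divided by the square root of 2, so both pick the same point. The helper names and the scores for models A and B are hypothetical.

```python
def roc_points(labels: list[int], scores: list[float]) -> list[tuple[float, float]]:
    """(FPR, TPR) pairs from sweeping the threshold high to low; label 1 = Yes."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one Yes and one No example")
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted Yes
    for t in sorted(set(scores), reverse=True):
        predicted_yes = [y for y, s in zip(labels, scores) if s >= t]
        tp = sum(predicted_yes)
        fp = len(predicted_yes) - tp
        points.append((fp / neg, tp / pos))
    return points


def auc(points: list[tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def max_separation(points: list[tuple[float, float]]) -> float:
    """Largest vertical distance of the curve above the diagonal (TPR - FPR)."""
    return max(tpr - fpr for fpr, tpr in points)


labels = [1, 1, 1, 0, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
model_b = [0.9, 0.3, 0.7, 0.6, 0.2, 0.8, 0.4, 0.5]
for name, scores in (("A", model_a), ("B", model_b)):
    pts = roc_points(labels, scores)
    print(name, auc(pts), max_separation(pts))
# A 0.9375 0.75
# B 0.4375 0.25
```

Here model A separates farther from the diagonal, so it would be preferred.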
