Will eventually add a Data sets section to the intro chapter so we can avoid introducing data sets throughout the book (e.g., those used in examples). We don't need to do this for new data sets used in case studies, exercises, etc.
Great candidate for visualization chapter. Can aggregate outcome to estimate/plot proportions within unique values of other features (e.g., lead_time and deposit_type).
The 2021 GSS data are based on a probability sample and could be used for analyzing contingency tables.
Employee attrition data seem hard to come by, so we could treat the IBM HR attrition data as a sample (or take a sample thereof) for analyses (e.g., ordinal association).
Missing values:
The 2021 GSS data contain many missing values.
Binary outcomes:
pay_0 as single feature (shows good probability spread from 0-1 across predictor space).
Can aggregate outcome to estimate/plot proportions within unique values of other features (e.g., lead_time and deposit_type).
deposit_type (try it w/ simple LR model), which could make for good discussion about the role of SME in identifying potential issues with the ETL process. Some discussion on Kaggle at https://www.kaggle.com/code/marcuswingen/eda-of-bookings-and-ml-to-predict-cancelations.
resources/articles/yeh-2009-uciblood.pdf
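A minimal sketch of the aggregation idea noted for lead_time and deposit_type: group a binary outcome by the unique values of one feature and report the proportion of events in each group. The rows below are made-up toy data, and the outcome column name is_canceled and the helper outcome_proportions are assumptions for illustration only, not part of any existing code.

```python
from collections import defaultdict

# Hypothetical toy rows; the column names and values are illustrative
# assumptions, not taken from the real data.
rows = [
    {"deposit_type": "No Deposit", "is_canceled": 0},
    {"deposit_type": "No Deposit", "is_canceled": 1},
    {"deposit_type": "No Deposit", "is_canceled": 0},
    {"deposit_type": "No Deposit", "is_canceled": 0},
    {"deposit_type": "Non Refund", "is_canceled": 1},
    {"deposit_type": "Non Refund", "is_canceled": 1},
]


def outcome_proportions(rows, feature, outcome):
    """Return {feature value: (proportion with outcome == 1, group size)}."""
    counts = defaultdict(lambda: [0, 0])  # value -> [events, total]
    for row in rows:
        counts[row[feature]][0] += row[outcome]
        counts[row[feature]][1] += 1
    return {
        value: (events / total, total)
        for value, (events, total) in counts.items()
    }


props = outcome_proportions(rows, "deposit_type", "is_canceled")
for value, (p, n) in sorted(props.items()):
    print(f"{value}: {p:.2f} (n = {n})")
```

Returning the group size alongside each proportion helps flag estimates based on only a few rows, which is likely for a numeric feature with many unique values such as lead_time; binning such a feature first may be preferable.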
Multinomial (i.e., polytomous) outcomes:
Ordinal outcomes:
Counts:
Inference:
Missing values: