The stage of the data science methodology that covers identifying, sourcing, understanding, and preparing the data required for further analysis.
- Data that may skew the results is excluded from the sample dataset
- Identify the correct and required data content, formats, and sources to support the selected analytical approach
- Perform the initial data collection steps, revising the requirements depending on the availability of data
- Gather available data related to the case under study
- Data that is not available at the moment can be deferred and incorporated later
- Systematic and meticulous preparation of data to ensure the right quantity and quality
- Use statistics and visualization tools to assess whether the data is fit for the analysis
- Assess data quality issues such as missing data and other anomalies
- Eliminate redundant data and prepare for the next stage of analysis
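
The quality checks listed above (counting missing values, removing redundant records, and summarizing a field with basic statistics) can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the `records` sample, its field names, and the helper functions `missing_counts`, `drop_duplicates`, and `summarize` are all hypothetical and not part of the notes.

```python
from statistics import mean, median

# Hypothetical sample records; None marks a missing value.
records = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": None, "income": 61000},
    {"id": 3, "age": 45, "income": None},
    {"id": 1, "age": 34, "income": 52000},  # exact duplicate of the first record
    {"id": 4, "age": 29, "income": 48000},
]


def missing_counts(rows):
    """Count missing (None) values per field."""
    counts = {}
    for row in rows:
        for field, value in row.items():
            counts[field] = counts.get(field, 0) + (value is None)
    return counts


def drop_duplicates(rows):
    """Remove exact duplicate records, keeping the first occurrence."""
    seen = set()
    unique = []
    for row in rows:
        # Field names are unique within a record, so sorting orders by name only.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def summarize(rows, field):
    """Basic descriptive statistics over the non-missing values of a field."""
    values = [row[field] for row in rows if row[field] is not None]
    return {"count": len(values), "mean": mean(values), "median": median(values)}


unique_records = drop_duplicates(records)
print(missing_counts(records))
print(len(records) - len(unique_records), "duplicate record(s) removed")
print(summarize(unique_records, "age"))
```

In practice a library such as pandas would usually handle these steps, along with plots for the visual checks; the plain-Python version is used here only to keep the logic of each check visible.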