Although data science, like statistics, is a way of exploring sets of
information, it distinguishes itself from statistics by focusing more
on providing a wide variety of mechanisms for gaining potential insights
from existing data sets. In addition, to a computer scientist, data
science is an opportunity to think about a progression of ways to
work with data: starting with a data set that is typically in a
somewhat unstructured, less-than-usable form, one \textit{wrangles}
the data into a usable form, \textit{cleans} the data to remove
less applicable data points, \textit{transforms} the data to other
forms, \textit{merges} the data with other data sets, and finally
\textit{visualizes} or \textit{summarizes} the data. Importantly,
one does all these steps programmatically, so that it is possible
to replicate the steps on a new or modified data set.
That is, we taught students that data science is a field in which
you gain insight from existing data and that data science requires
a series of replicable processes carried out by algorithms that data
scientists design and implement.
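
The progression described above can be sketched as a small program. The following is a minimal illustration using only the Python standard library, not a definitive implementation: the sample data, the column names, and every function name are hypothetical and were invented for this example. Each step is a separate function, so the whole sequence can be rerun unchanged on a new or modified data set.

```python
import csv
import io
from statistics import mean

# Hypothetical raw inputs, invented purely for illustration.
RAW_SCORES = """name, score
 Ada , 91
Grace,  85
Alan, n/a
"""

RAW_MAJORS = """name,major
Ada,Mathematics
Grace,Computer Science
Alan,Computer Science
"""


def wrangle(text):
    """Parse messy CSV text into a list of dicts with whitespace stripped."""
    reader = csv.reader(io.StringIO(text))
    header = [h.strip() for h in next(reader)]
    return [dict(zip(header, (c.strip() for c in row))) for row in reader if row]


def clean(rows):
    """Drop rows whose score field is not a non-negative integer."""
    return [r for r in rows if r["score"].isdigit()]


def transform(rows):
    """Convert the score field from a string to an integer."""
    return [{**r, "score": int(r["score"])} for r in rows]


def merge(rows, others, key):
    """Inner-join two lists of rows on a shared key."""
    lookup = {o[key]: o for o in others}
    return [{**r, **lookup[r[key]]} for r in rows if r[key] in lookup]


def summarize(rows):
    """Compute the mean score for each major."""
    groups = {}
    for r in rows:
        groups.setdefault(r["major"], []).append(r["score"])
    return {major: mean(scores) for major, scores in groups.items()}


def pipeline(scores_text, majors_text):
    """Run every step in order; the same call works on any new input."""
    scores = transform(clean(wrangle(scores_text)))
    majors = wrangle(majors_text)
    return summarize(merge(scores, majors, "name"))


result = pipeline(RAW_SCORES, RAW_MAJORS)
print(result)
```

Because no step is performed by hand, replicating the analysis on a modified data set amounts to calling \texttt{pipeline} again with different input text.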