Welcome to My GitHub: DATA SCIENCE

Homepage: https://datute.net/

Data science

"Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from data in various forms, both structured and unstructured."

"Data science is a 'concept to unify statistics, data analysis, machine learning and their related methods' in order to 'understand and analyze actual phenomena' with data. It employs techniques and theories drawn from many fields within the context of mathematics, statistics, information science, and computer science." -- Wikipedia (November 13, 2018)

Big data


The term “big data” was first coined in 1997 by the NASA researchers Michael Cox and David Ellsworth to describe the large quantity of information generated by supercomputers. Their paper, “Application-controlled demand paging for out-of-core visualization”, was published in the Proceedings of the 8th IEEE Conference on Visualization and is available from the ACM Digital Library.

"Big data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation." -- Gartner (2012)

Accordingly, big data has three defining properties, or dimensions: (1) volume (quantity), (2) variety (structured, semi-structured, and unstructured types), and (3) velocity (the speed at which streaming data arrives). The variety of big data covers the following types:

  • Structured data: data held in a relational database (RDBMS), easily retrieved through SQL.
  • Semi-structured data: data in files or document stores (XML, JSON documents, NoSQL databases).
  • Unstructured data: images, videos, text files, etc.
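
As a rough illustration of the three types above, the sketch below touches each one using only the Python standard library. The table name `readings`, the JSON fields, and the sample note are all made up for the example; they are not taken from any real dataset.

```python
import json
import sqlite3

# Structured: rows in a relational table, retrieved with SQL.
# The "readings" table and its values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 1.5), ("b", 2.5), ("a", 3.0)],
)
structured_total = conn.execute(
    "SELECT SUM(value) FROM readings WHERE sensor = ?", ("a",)
).fetchone()[0]
conn.close()

# Semi-structured: a JSON document with nested fields that may or may not exist.
doc = json.loads('{"sensor": "a", "tags": ["lab", "test"], "meta": {"unit": "C"}}')
semi_structured_unit = doc.get("meta", {}).get("unit")

# Unstructured: free text with no schema; here we only count its words.
note = "Sensor a drifted slightly during the afternoon run."
unstructured_word_count = len(note.split())
```

The point of the contrast is how much the data itself tells you: the SQL query relies on a fixed schema, the JSON lookup has to tolerate missing keys, and the free text offers no structure beyond what the code imposes.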

Data analytics

Data processing and analytics include building and training machine learning models, manipulating data programmatically, extracting information from data, and building data tools, applications, and services. The work may consist of the following major steps (Big Data Science & Analytics, 2016):

  • Framing the problem
  • Data acquisition for the problem
  • Data wrangling
  • Machine learning
  • Developing a statistical/mathematical model
  • Data visualization
  • Communicating the output of the analysis as (1) a data report and (2) data products.
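
The steps above can be sketched end to end on a toy problem. This is only a minimal illustration under stated assumptions: the question, the records, and the helper names `fit_line` and `text_chart` are hypothetical, and a hand-written least-squares line fit stands in for both the machine learning and the statistical modelling steps.

```python
from statistics import mean

# 1. Frame the problem (hypothetical): how does study time relate to exam score?

# 2. Acquire data: made-up (hours, score) records; None marks a missing score.
raw = [(1, 52.0), (2, 55.0), (3, None), (4, 61.0), (5, 64.0)]

# 3. Wrangle: drop incomplete records.
clean = [(h, s) for h, s in raw if s is not None]


# 4-5. Model: fit score = intercept + slope * hours by ordinary least squares.
def fit_line(points):
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


slope, intercept = fit_line(clean)


# 6. Visualize: a crude text bar chart, one line per record.
def text_chart(points):
    return [f"{h}h | {'#' * round(s / 4)} {s:.0f}" for h, s in points]


# 7. Communicate: a one-line report built from the fitted model.
report = f"Each extra hour is associated with about {slope:.1f} more points."

if __name__ == "__main__":
    print("\n".join(text_chart(clean)))
    print(report)
```

In a real project each step would usually be its own module or notebook section, and the model would be validated on held-out data before anything is reported.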