“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” (Tom Mitchell, *Machine Learning*, 1997)
For many complex problems, such as self-driving cars, maintaining a long list of hand-written rules quickly becomes impractical; it is often easier to let a model learn the rules from data.
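To make the definition concrete, here is a minimal scikit-learn sketch (the iris dataset, logistic-regression model, and 70/30 split are illustrative assumptions, not requirements): the task T is classifying flowers, the experience E is a set of labeled examples, and the performance measure P is accuracy on held-out data.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # experience E: labeled examples
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000)  # no hand-written rules
model.fit(X_train, y_train)                # "learn from experience E"

# performance measure P, evaluated on the task T (held-out classification)
print(accuracy_score(y_test, model.predict(X_test)))
```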
## Brief History - Why now?
| Year | Milestone |
| --- | --- |
| 1805 | French mathematician Adrien-Marie Legendre publishes the least-squares method for regression, which he used to determine, from astronomical observations, the orbits of bodies around the sun. Although developed as a statistical framework, it would provide the basis for many of today’s machine-learning models (a minimal code sketch follows this timeline). |
| 1958 | American psychologist and computer scientist Frank Rosenblatt creates the perceptron algorithm, an early type of artificial neural network (ANN), which stands as the first algorithmic model that could learn on its own. American computer scientist Arthur Samuel would coin the term “machine learning” the following year for these types of self-learning models (as well as develop a groundbreaking checkers program seen as an early success in AI). |
| 1965 | Ukrainian mathematician Alexey Grigorevich Ivakhnenko develops the first general working learning algorithms for supervised multilayer artificial neural networks (ANNs), in which several ANNs are stacked on top of one another and the output of one ANN layer feeds into the next. The architecture is very similar to today’s deep-learning architectures. |
| 1986 | American psychologist David Rumelhart, British cognitive psychologist and computer scientist Geoffrey Hinton, and American computer scientist Ronald Williams publish on backpropagation, popularizing this key technique for training artificial neural networks (ANNs), originally proposed by American scientist Paul Werbos in 1982. Backpropagation allows the ANN to optimize its own weights without human intervention (in this case, it found features in family-tree data that weren’t obvious or provided to the algorithm in advance). Still, a lack of computational power and of the massive amounts of data needed to train these multilayered networks prevents ANNs that leverage backpropagation from being widely used. |
| 1991 | The European Organization for Nuclear Research (CERN) begins opening up the World Wide Web to the public. |
| 1992 | Computer engineers Bernhard E. Boser (Swiss) and Isabelle M. Guyon (French) and Russian mathematician Vladimir N. Vapnik discover that algorithmic models called support vector machines (SVMs) can be easily extended to nonlinear problems using a technique called the kernel trick, leading to widespread use of SVMs in many natural-language-processing problems, such as classifying sentiment and understanding human speech. |
| 2004 | With the World Wide Web taking off, Google seeks out novel ideas to deal with the resulting proliferation of data. Computer scientist Jeff Dean (current head of Google Brain) and Google software engineer Sanjay Ghemawat develop MapReduce to deal with immense amounts of data by parallelizing the processing of large data sets across a substantial number of computers. |
| 2004 | Web 2.0 refers to the shift of the Internet paradigm from passive content viewing to interactive and collaborative content creation, social media, blogs, video, and other channels. Publishers Tim O'Reilly and Dale Dougherty popularize the term, though it was coined by designer Darcy DiNucci in 1999. |
| 2004 | Harvard student Mark Zuckerberg and team launch “Thefacebook,” as it was originally dubbed. By the end of 2005, the number of data-generating Facebook users approaches six million. |
| 2005 | YouTube debuts. Within about 18 months, the site would serve up almost 100 million views per day. |
| 2006 | Inspired by Google’s MapReduce, computer scientists Doug Cutting and Mike Cafarella develop the Hadoop software to store and process enormous data sets. Yahoo uses it first to deal with the explosion of data coming from indexing web pages and online sources. |
| 2007 | Apple cofounder and CEO Steve Jobs introduces the iPhone in January 2007. The total number of smartphones sold in 2007 reaches about 122 million. The era of around-the-clock consumption and creation of data and content by smartphone users begins. |
| 2007 | Nvidia releases the Compute Unified Device Architecture (CUDA), a parallel-computing platform that enables developers to use GPUs for general-purpose processing tasks beyond computer graphics, for which GPUs were originally designed. This allows researchers and developers to leverage GPUs for machine-learning tasks. |
| 2009 | Developed by Romanian-Canadian computer scientist Matei Zaharia at UC Berkeley’s AMPLab, Spark processes huge amounts of data in RAM, making it much faster than software that must read and write to hard drives. It revolutionizes the ability to process big data and perform analytics in real time. |
| 2010 | Annual global Internet protocol (IP) traffic reaches 242 exabytes, aided by the growing adoption of broadband, particularly in the United States, where adoption reaches 65 percent, according to Cisco, which reports these figures. |
| 2011 | IBM’s question answering system, Watson, defeats the two greatest Jeopardy! champions, Brad Rutter and Ken Jennings, by a significant margin. IBM Watson uses ten racks of IBM Power 750 servers capable of 80 teraFLOPS (that’s 80 trillion FLOPS—the state of the art in the mid-1960s was around three million FLOPS). |
| 2012 | The amount of data processed daily by Facebook’s systems soars past 500 terabytes. |
| 2014 | As of October 2014, GSMA reports the number of mobile devices at around 7.22 billion, while the US Census Bureau reports the number of people globally at around 7.20 billion. |
| 2017 | By one estimate, about 90 percent of the world’s data were produced in the past two years. Every minute, YouTube users watch more than four million videos and mobile users send more than 15 million texts. |
| 2017 | Google first introduced its tensor processing unit (TPU) in 2016, which it used to run its own machine-learning models at a reported 15 to 30 times the speed of GPUs and CPUs. In 2017, Google announced an upgraded version of the TPU that was faster (180 teraFLOPS; more when multiple TPUs are combined), could be used to train models in addition to running them, and would be offered to the paying public via the cloud. TPU availability could spawn even more (and more powerful and efficient) machine-learning-based business applications. |
| 2020 | In May 2020, OpenAI researchers describe the creation of the third generation of the GPT (generative pre-trained transformer) language model, which increases its capacity for natural-language generation by more than two orders of magnitude over its predecessor, GPT-2 (released in 2019). In June 2020, OpenAI gives limited access to select users to test the limits of the new model. According to these users, GPT-3 is capable of creating near-human written content, such as news articles and code for a website, with a minimal number of user prompts. |
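As a concrete footnote to the 1805 entry above, here is a minimal NumPy sketch of the least-squares method (the toy data, true slope of 2, and intercept of 1 are made up for illustration):

```python
import numpy as np

# Toy observations: y is roughly 2x + 1 plus noise (assumed values).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.shape)

# Design matrix with a column of ones for the intercept term.
A = np.column_stack([x, np.ones_like(x)])

# Solve min ||A @ coeffs - y||^2 -- the criterion Legendre published.
coeffs, residuals, rank, _ = np.linalg.lstsq(A, y, rcond=None)
slope, intercept = coeffs
print(f"slope ~ {slope:.2f}, intercept ~ {intercept:.2f}")
```

The same normal-equations idea, generalized to many features and loss functions, sits underneath modern linear models and, by extension, much of machine learning.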
1. Hastie, Tibshirani, and Friedman, *The Elements of Statistical Learning*. Chapters 1–4 and 7.
2. Géron, *Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems* (O'Reilly).
3. Raschka and Mirjalili, *Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow 2*, 3rd Edition.