A comparison of two image classification methods in Python:
- k-nearest neighbors (KNN)
- artificial neural networks (NN)
The data used to test these methods comes from the CIFAR-10 dataset: https://www.cs.toronto.edu/~kriz/cifar.html
This repository follows the instructions of this tutorial: https://gitlab.ec-lyon.fr/qgalloue/image_classification_instructions
This project requires Python 3 and the following common libraries:
- Matplotlib for creating visualizations
- NumPy for computing with arrays
- SciPy for fundamental algorithms
- Pickle for Python object serialization
- Pytest for unit tests
This project works with CIFAR-10 data (see the section below to understand the data). If CIFAR-10 data is absent from your local repository, some unit tests will fail.
- Download the data from https://www.cs.toronto.edu/~kriz/cifar.html into your project's folder;
- Unzip the data to a 'data' folder.
You are now good to go.
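If you prefer to script this step, here is a minimal Python sketch. It assumes the official Python-version archive (cifar-10-python.tar.gz) linked from the CIFAR page above, and extracts it into the 'data' folder described in the steps above:

```python
import tarfile
import urllib.request

# Official CIFAR-10 Python-version archive (linked from the CIFAR page above).
URL = "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"

# Download the archive, then extract it into a local 'data' folder.
# The archive itself contains a 'cifar-10-batches-py/' directory.
urllib.request.urlretrieve(URL, "cifar-10-python.tar.gz")
with tarfile.open("cifar-10-python.tar.gz") as archive:
    archive.extractall("data")
```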
This project follows the PEP 8 and PEP 257 recommendations.
You can use Python utilities/libraries such as:
- Black to format code
- isort to sort imports
- Pydocstyle to help you document your code properly
Or you can use VSCode extensions such as:
- Prettier to format code
- autoDocString to help you document your code properly
- Makefile Tools to support Makefiles in VSCode
To run the script contained in main.py, simply use:
make run
Then, you will be able to choose which method to use by entering 0, 1, or 2:
- 0 for the unoptimized KNN method (might take some time to compute)
- 1 for the optimized KNN method
- 2 for the neural network method
To run unit tests, you may use:
make unittest
To visualize test coverage, run:
make coverage
Unit tests and test coverage reports are run and created with Pytest.
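For illustration, a minimal self-contained pytest file might look like the sketch below; the file and function names here are hypothetical, not necessarily those used in this repository:

```python
# tests/test_example.py -- hypothetical, self-contained pytest sketch.
import numpy as np


def euclidean_distances(test, train):
    """Pairwise Euclidean distances between the rows of `test` and `train`."""
    return np.sqrt(((test[:, None, :] - train[None, :, :]) ** 2).sum(axis=2))


def test_distance_matrix_shape():
    # 2 query rows against 5 training rows -> a (2, 5) distance matrix.
    assert euclidean_distances(np.ones((2, 4)), np.zeros((5, 4))).shape == (2, 5)


def test_distance_to_self_is_zero():
    x = np.arange(12, dtype=float).reshape(3, 4)
    assert np.allclose(np.diag(euclidean_distances(x, x)), 0.0)
```

Running `make unittest` (or `pytest` directly) collects and executes every test_*.py file in the tests/ directory.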
The Makefile groups all useful commands for this project:
- Running the main.py script
- Executing unit tests
- Checking code coverage
The main script is the executable part of this project. It can easily be run using the command defined in the Makefile.
*Drawing of this project's architecture*
Functions used for each classification method are contained in files under the modules/ directory.
Some helper functions, for example for choosing a classification method, choosing the value of the split factor, or plotting results, are contained in the helper file.
Unit tests for each of these functions appear in the tests/ directory under the name test_[method_name].py.
The k-nearest neighbors algorithm, also known as KNN or k-NN, is a non-parametric, supervised learning classifier, which uses proximity to make classifications or predictions about the grouping of an individual data point. While it can be used for either regression or classification problems, it is typically used as a classification algorithm, working off the assumption that similar points can be found near one another.
KNN works by finding the distances between a query and all the examples in the training data, selecting the specified number of examples (K) closest to the query, then voting for the most frequent label (in the case of classification) or averaging the labels (in the case of regression).
*Principle of k-nearest neighbors (KNN)*
Source: https://www.ibm.com/topics/knn
For more references, see *K-Nearest Neighbors Algorithm* or *Machine Learning Basics with the K-Nearest Neighbors Algorithm*.
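To make the distance-and-vote procedure above concrete, here is a minimal NumPy sketch of a KNN classifier. It is an illustration under simplified assumptions (Euclidean distance, tiny 1-D data), not the implementation found in modules/:

```python
import numpy as np


def knn_predict(train_data, train_labels, test_data, k):
    """Predict a label for each row of test_data by majority vote
    among the k nearest training rows (Euclidean distance)."""
    # Pairwise distances between test and training rows: shape (n_test, n_train).
    dists = np.sqrt(((test_data[:, None, :] - train_data[None, :, :]) ** 2).sum(axis=2))
    # Indices of the k closest training examples for each test example.
    nearest = np.argsort(dists, axis=1)[:, :k]
    # Majority vote among the labels of the k neighbors.
    return np.array([np.bincount(train_labels[row]).argmax() for row in nearest])


# Tiny usage example: two 1-D clusters, around 0 (label 0) and around 10 (label 1).
train = np.array([[0.0], [1.0], [10.0], [11.0]])
labels = np.array([0, 0, 1, 1])
print(knn_predict(train, labels, np.array([[0.5], [10.5]]), k=3))  # -> [0 1]
```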
Neural networks, also known as artificial neural networks (ANNs) or simulated neural networks (SNNs), are a subset of machine learning and are at the heart of deep learning algorithms. Their name and structure are inspired by the human brain, mimicking the way that biological neurons signal to one another.
Artificial neural networks (ANNs) are composed of node layers, containing an input layer, one or more hidden layers, and an output layer. Each node, or artificial neuron, connects to another and has an associated weight and threshold. If the output of any individual node is above the specified threshold value, that node is activated, sending data to the next layer of the network. Otherwise, no data is passed along to the next layer of the network.
*Principle of an artificial neural network (NN)*
Neural networks rely on training data to learn and improve their accuracy over time. However, once these learning algorithms are fine-tuned for accuracy, they are powerful tools in computer science and artificial intelligence, allowing us to classify and cluster data at a high velocity. Tasks in speech recognition or image recognition can take minutes versus hours when compared to manual identification by human experts.
Source: https://www.ibm.com/cloud/learn/neural-networks
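As an illustration of the layer-by-layer flow described above, here is a minimal NumPy sketch of a forward pass through one hidden layer. The layer sizes, random weights, and sigmoid activation are assumptions for the example, not the exact network used in this project:

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(x):
    """Activation squashing each value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


# A 3072 -> 64 -> 10 network: a CIFAR-10 image flattened to 3072 values in,
# 10 class scores out. Weights are random here; training would adjust them.
w1, b1 = 0.01 * rng.normal(size=(3072, 64)), np.zeros(64)
w2, b2 = 0.01 * rng.normal(size=(64, 10)), np.zeros(10)

x = rng.random((1, 3072))           # one flattened 32x32x3 image
hidden = sigmoid(x @ w1 + b1)       # input layer -> hidden layer
scores = sigmoid(hidden @ w2 + b2)  # hidden layer -> output layer
print(int(scores.argmax()))         # index of the predicted class
```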
The CIFAR-10 and CIFAR-100 datasets are labeled subsets of the 80 million tiny images dataset. They were collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.
The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.
Here are the 10 classes in the dataset: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck.
The classes are completely mutually exclusive. There is no overlap between automobiles and trucks. "Automobile" includes sedans, SUVs, things of that sort. "Truck" includes only big trucks. Neither includes pickup trucks.
To learn more about the CIFAR datasets or to download the data, follow the source link below.
Source: https://www.cs.toronto.edu/~kriz/cifar.html
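The Python version of the dataset stores each batch as a pickled dictionary with keys b'data' (a 10000x3072 uint8 array, one flattened 32x32x3 image per row) and b'labels' (integers from 0 to 9), as documented at the source above. A minimal loader, assuming the 'data' folder layout described earlier, could look like this:

```python
import pickle

import numpy as np


def load_batch(path):
    """Load one CIFAR-10 batch file into a data array and a label array."""
    with open(path, "rb") as f:
        batch = pickle.load(f, encoding="bytes")
    data = np.asarray(batch[b"data"], dtype=np.uint8)  # (10000, 3072) pixels
    labels = np.asarray(batch[b"labels"])              # (10000,) ints in 0..9
    return data, labels


data, labels = load_batch("data/cifar-10-batches-py/data_batch_1")
print(data.shape, labels[:5])  # (10000, 3072) and the first five labels
```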
If you want to propose any improvements or need any help, feel free to contribute by opening an issue.
- Learning Multiple Layers of Features from Tiny Images by Alex Krizhevsky
- Download CIFAR datasets by Alex Krizhevsky
- Tutorial to Image Classification by Quentin Gallouédec
- What is the k-nearest neighbors method by IBM
- What are Neural Networks by IBM
This project is licensed under the MIT License.