A comparison of two image classification methods in Python:
- k-nearest neighbors (KNN)
- artificial neural networks (NN)
The data used to test these methods comes from the CIFAR-10 dataset: https://www.cs.toronto.edu/~kriz/cifar.html
This repository follows the instructions of this tutorial: https://gitlab.ec-lyon.fr/qgalloue/image_classification_instructions
This project requires Python 3 and the following common libraries:
- Matplotlib for creating visualizations
- NumPy for computing with arrays
- SciPy for fundamental algorithms
- Pickle for Python object serialization
- Pytest for unit tests
This project works with CIFAR-10 data (see the section below to understand the data). If CIFAR-10 data is absent from your local repository, some unit tests will fail.
- Download the data from https://www.cs.toronto.edu/~kriz/cifar.html into your project's folder;
- Unzip the data to a 'data' folder.
You are now good to go.
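If you prefer to script this step, here is a minimal Python sketch. It assumes the official Python-version archive (cifar-10-python.tar.gz) linked from the CIFAR page above, and extracts it into the 'data' folder described in the steps above:

```python
import tarfile
import urllib.request

# Official CIFAR-10 Python-version archive (linked from the CIFAR page above).
URL = "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"

# Download the archive, then extract it into a local 'data' folder.
# The archive itself contains a 'cifar-10-batches-py/' directory.
urllib.request.urlretrieve(URL, "cifar-10-python.tar.gz")
with tarfile.open("cifar-10-python.tar.gz") as archive:
    archive.extractall("data")
```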
This project follows the PEP 8 and PEP 257 recommendations.
You can use Python utilities/libraries such as:
- Black to format code
- isort to sort imports
- Pydocstyle to help you document your code properly
Or you can use VSCode extensions such as:
- Prettier to format code
- autoDocString to help you document your code properly
- Makefile Tools to support Makefiles in VSCode
To run the script contained in main.py, simply use:
make run
Then, you will be able to choose which method to use by entering 0, 1, or 2:
- 0 for the unoptimized KNN method (might take some time to compute)
- 1 for the optimized KNN method
- 2 for the neural network method
To run unit tests, you may use:
make unittest
To visualize test coverage, run:
make coverage
Unit tests and test coverage reports are run and created with Pytest.
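For illustration, a minimal self-contained pytest file might look like the sketch below; the file and function names here are hypothetical, not necessarily those used in this repository:

```python
# tests/test_example.py -- hypothetical, self-contained pytest sketch.
import numpy as np


def euclidean_distances(test, train):
    """Pairwise Euclidean distances between the rows of `test` and `train`."""
    return np.sqrt(((test[:, None, :] - train[None, :, :]) ** 2).sum(axis=2))


def test_distance_matrix_shape():
    # 2 query rows against 5 training rows -> a (2, 5) distance matrix.
    assert euclidean_distances(np.ones((2, 4)), np.zeros((5, 4))).shape == (2, 5)


def test_distance_to_self_is_zero():
    x = np.arange(12, dtype=float).reshape(3, 4)
    assert np.allclose(np.diag(euclidean_distances(x, x)), 0.0)
```

Running `make unittest` (or `pytest` directly) collects and executes every test_*.py file in the tests/ directory.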
The Makefile groups all useful commands for this project:
- Running the main.py script
- Executing unit tests
- Checking code coverage
The main script is the executable part of this project. It can easily be run using the command defined in the Makefile.
*Drawing of this project's architecture*
Functions used for each classification method are contained in files under the modules/ directory.
Some helper functions, for example for choosing a classification method, choosing the value of the split factor, or plotting results, are contained in the helper file.
Unit tests for each of these functions appear in the tests/ directory under the name test_[method_name].py.
The k-nearest neighbors algorithm, also known as KNN or k-NN, is a non-parametric, supervised learning classifier, which uses proximity to make classifications or predictions about the grouping of an individual data point. While it can be used for either regression or classification problems, it is typically used as a classification algorithm, working off the assumption that similar points can be found near one another.
KNN works by finding the distances between a query and all the examples in the training data, selecting the specified number of examples (K) closest to the query, then voting for the most frequent label (in the case of classification) or averaging the labels (in the case of regression).
*Principle of k-nearest neighbors (KNN)*
Source: https://www.ibm.com/topics/knn
For more references, see *K-Nearest Neighbors Algorithm* or *Machine Learning Basics with the K-Nearest Neighbors Algorithm*.
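To make the distance-and-vote procedure above concrete, here is a minimal NumPy sketch of a KNN classifier. It is an illustration under simplified assumptions (Euclidean distance, tiny 1-D data), not the implementation found in modules/:

```python
import numpy as np


def knn_predict(train_data, train_labels, test_data, k):
    """Predict a label for each row of test_data by majority vote
    among the k nearest training rows (Euclidean distance)."""
    # Pairwise distances between test and training rows: shape (n_test, n_train).
    dists = np.sqrt(((test_data[:, None, :] - train_data[None, :, :]) ** 2).sum(axis=2))
    # Indices of the k closest training examples for each test example.
    nearest = np.argsort(dists, axis=1)[:, :k]
    # Majority vote among the labels of the k neighbors.
    return np.array([np.bincount(train_labels[row]).argmax() for row in nearest])


# Tiny usage example: two 1-D clusters, around 0 (label 0) and around 10 (label 1).
train = np.array([[0.0], [1.0], [10.0], [11.0]])
labels = np.array([0, 0, 1, 1])
print(knn_predict(train, labels, np.array([[0.5], [10.5]]), k=3))  # -> [0 1]
```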
Neural networks, also known as artificial neural networks (ANNs) or simulated neural networks (SNNs), are a subset of machine learning and are at the heart of deep learning algorithms. Their name and structure are inspired by the human brain, mimicking the way that biological neurons signal to one another.
Artificial neural networks (ANNs) are composed of node layers, containing an input layer, one or more hidden layers, and an output layer. Each node, or artificial neuron, connects to another and has an associated weight and threshold. If the output of any individual node is above the specified threshold value, that node is activated, sending data to the next layer of the network. Otherwise, no data is passed along to the next layer of the network.
*Principle of an artificial neural network (NN)*
Neural networks rely on training data to learn and improve their accuracy over time. However, once these learning algorithms are fine-tuned for accuracy, they are powerful tools in computer science and artificial intelligence, allowing us to classify and cluster data at a high velocity. Tasks in speech recognition or image recognition can take minutes versus hours when compared to manual identification by human experts.
Source: https://www.ibm.com/cloud/learn/neural-networks
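As an illustration of the layer-by-layer flow described above, here is a minimal NumPy sketch of a forward pass through one hidden layer. The layer sizes, random weights, and sigmoid activation are assumptions for the example, not the exact network used in this project:

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(x):
    """Activation squashing each value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


# A 3072 -> 64 -> 10 network: a CIFAR-10 image flattened to 3072 values in,
# 10 class scores out. Weights are random here; training would adjust them.
w1, b1 = 0.01 * rng.normal(size=(3072, 64)), np.zeros(64)
w2, b2 = 0.01 * rng.normal(size=(64, 10)), np.zeros(10)

x = rng.random((1, 3072))           # one flattened 32x32x3 image
hidden = sigmoid(x @ w1 + b1)       # input layer -> hidden layer
scores = sigmoid(hidden @ w2 + b2)  # hidden layer -> output layer
print(int(scores.argmax()))         # index of the predicted class
```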
The CIFAR-10 and CIFAR-100 datasets are labeled subsets of the 80 million tiny images dataset. They were collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.
The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.
Here are the 10 classes in the dataset: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck.
The classes are completely mutually exclusive. There is no overlap between automobiles and trucks. "Automobile" includes sedans, SUVs, things of that sort. "Truck" includes only big trucks. Neither includes pickup trucks.
To learn more about the CIFAR datasets or to download the data, follow the source link below.
Source: https://www.cs.toronto.edu/~kriz/cifar.html
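The Python version of the dataset stores each batch as a pickled dictionary with keys b'data' (a 10000x3072 uint8 array, one flattened 32x32x3 image per row) and b'labels' (integers from 0 to 9), as documented at the source above. A minimal loader, assuming the 'data' folder layout described earlier, could look like this:

```python
import pickle

import numpy as np


def load_batch(path):
    """Load one CIFAR-10 batch file into a data array and a label array."""
    with open(path, "rb") as f:
        batch = pickle.load(f, encoding="bytes")
    data = np.asarray(batch[b"data"], dtype=np.uint8)  # (10000, 3072) pixels
    labels = np.asarray(batch[b"labels"])              # (10000,) ints in 0..9
    return data, labels


data, labels = load_batch("data/cifar-10-batches-py/data_batch_1")
print(data.shape, labels[:5])  # (10000, 3072) and the first five labels
```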
If you want to propose any improvements or need any help, feel free to contribute by opening an issue.
- Learning Multiple Layers of Features from Tiny Images by Alex Krizhevsky
- Download CIFAR datasets by Alex Krizhevsky
- Tutorial to Image Classification by Quentin Gallouédec
- What is the k-nearest neighbors method by IBM
- What are Neural Networks by IBM
This project is licensed under the MIT License.