In this project, I'm using TensorFlow/Keras to develop a deep learning CNN to classify images of damselflies and dragonflies, reconstructing images of dragonflies.

Gal-Gilor/The_Linnaeus_Bot

The_Linnaeus_Bot

Introduction

This project utilizes deep learning neural networks to classify images of damselflies and dragonflies and to generate images through image de-noising techniques (auto-encoders).

For those interested in a shorter recap: Presentation Slides

Table of Contents

Technical Description

Process

For this project, I used part of a Google competition dataset, the iNat Challenge 2019. The subset contained 8462 damselfly images (1.72 GB) and 9197 dragonfly images (1.76 GB). All the images had been resized to a maximum dimension of 800 pixels and saved as JPEG. I then processed the images and trained a convolutional neural network (CNN) to distinguish between a dragonfly and a damselfly. Additionally, I experimented with CNN-based image de-noising techniques to generate images of dragonflies for classification purposes.

Python libraries

Data and EDA

The original dataset contained 82 GB of images of various living organisms. Due to time constraints, I focused on the 8462 damselfly and 9197 dragonfly images.

Classes pie chart

As part of the image processing stage, I resized every image to 256 by 256 pixels, converted it to grayscale, converted it to a NumPy array, and normalized the pixel values by dividing every pixel by 255. Additionally, I augmented the data by adding the mirror image of every picture, which doubled the number of available images.
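The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes each image has already been loaded and resized to 256 by 256 as an RGB `uint8` array, and the function names (`to_grayscale`, `preprocess`, `add_mirror_images`) and the luma weights used for grayscale conversion are assumptions made for the example.

```python
import numpy as np


def to_grayscale(rgb):
    """Collapse an (H, W, 3) RGB array to (H, W) using ITU-R 601 luma weights.

    The weights are an assumption; an image library may use a slightly
    different conversion.
    """
    weights = np.array([0.299, 0.587, 0.114])
    return rgb.astype(np.float32) @ weights


def preprocess(rgb):
    """Grayscale an already-resized image and scale its pixels to [0, 1]."""
    gray = to_grayscale(rgb)
    return (gray / 255.0).astype(np.float32)


def add_mirror_images(images):
    """Append a horizontally flipped copy of every image in an (N, H, W) batch."""
    mirrored = images[:, :, ::-1]
    return np.concatenate([images, mirrored], axis=0)


# Usage with random stand-in data instead of real photographs.
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(3, 256, 256, 3), dtype=np.uint8)
batch = np.stack([preprocess(img) for img in raw])
augmented = add_mirror_images(batch)  # twice as many images as `batch`
```

Mirroring is a cheap augmentation here because a flipped dragonfly is still a plausible dragonfly, so the labels carry over unchanged.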

Final image count

Creating the Test Set

After preparing the images for analysis, I had 16924 damselfly images (215 MB) and 18394 dragonfly images (237 MB). The training set comprised 12694 damselfly and 13797 dragonfly images (26491 images; 75% of the data). The test set comprised 4230 damselfly and 4597 dragonfly images (8827 images; 25% of the data). Due to limited computational power, I saved the training and testing sets for damselflies and dragonflies separately in 4 different .npy files.
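A per-class split of this kind could look roughly like the sketch below. It is an assumption-laden illustration: the helper name `split_and_save`, the file naming scheme, the shuffling, and the rounding of the test fraction are all hypothetical, so it will not necessarily reproduce the exact counts reported above.

```python
import numpy as np


def split_and_save(images, prefix, test_fraction=0.25, seed=0):
    """Shuffle one class, split it, and save each part to its own .npy file.

    Returns the (train, test) image counts.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))
    n_test = int(len(images) * test_fraction)
    test = images[order[:n_test]]
    train = images[order[n_test:]]
    np.save(f"{prefix}_train.npy", train)
    np.save(f"{prefix}_test.npy", test)
    return train.shape[0], test.shape[0]
```

Calling it once per class (for example with the prefixes `damselfly` and `dragonfly`) produces four files, which can later be loaded one at a time to keep memory usage down.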

Supervised model

Model Architecture

CNN Architecture

Because the full training set did not fit in memory, I had to train the CNN in batches. Every batch consists of 4000 images, of which 5% are reserved for validation.

  • Train on 4000 Images
  • Save Weights
  • Clear Cache
  • Reload Weights
  • Retrain on 4000 New Images
  • Repeat 6 Times (24000 images total)
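The loop above can be sketched as follows. This is a hedged outline rather than the project's training script: `train_in_chunks`, `build_model`, `clear_fn`, and the weights file name are hypothetical, and the only model methods used are the standard Keras `fit`, `save_weights`, and `load_weights`. With TensorFlow, `clear_fn` would typically be `tf.keras.backend.clear_session`, after which the model has to be rebuilt before the weights are reloaded.

```python
def train_in_chunks(build_model, x, y, chunk_size=4000, n_chunks=6,
                    val_fraction=0.05, weights_path="cnn.weights.h5",
                    clear_fn=None, epochs=1):
    """Train on consecutive chunks, persisting weights between chunks."""
    for i in range(n_chunks):
        start, stop = i * chunk_size, (i + 1) * chunk_size

        # Rebuild a fresh (compiled) model and restore the previous round's weights.
        model = build_model()
        if i > 0:
            model.load_weights(weights_path)

        # Keras holds out the last `val_fraction` of the chunk for validation.
        model.fit(x[start:stop], y[start:stop],
                  validation_split=val_fraction, epochs=epochs, verbose=0)

        # Save progress, then release the model and any cached state.
        model.save_weights(weights_path)
        del model
        if clear_fn is not None:
            clear_fn()
    return weights_path
```

One thing to keep in mind with this pattern is that the optimizer state is not carried over by `save_weights` and `load_weights` alone, so each round starts with a fresh optimizer.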

Training accuracy history Training loss history

After training on 24000 images, the model achieves approximately 85% accuracy on the test set (8827 images).

Confusion matrix
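For readers who want to reproduce this kind of summary, a confusion matrix and the accuracy derived from it can be computed along these lines. The label encoding (0 for damselfly, 1 for dragonfly), the 0.5 decision threshold, and the function names are assumptions for illustration, and the numbers in the usage lines are toy values, not the project's results.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes=2):
    """Count predictions; rows are true classes, columns are predicted classes."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


def accuracy(cm):
    """Fraction of samples on the diagonal (correctly classified)."""
    return np.trace(cm) / cm.sum()


# Toy usage: threshold sigmoid outputs at 0.5 to get class predictions.
y_true = np.array([0, 0, 1, 1, 1])
probs = np.array([0.1, 0.7, 0.8, 0.4, 0.9])
y_pred = (probs >= 0.5).astype(int)
cm = confusion_matrix(y_true, y_pred)
```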

Due to the large volume of data, I was unable to check the quality of every image. Some of the images were blurry, some had dominant background noise, and some contained more than one animal, all of which made training and classification harder.

Classified Correctly

Original

Dragonfly

What the Model Sees - First CNN Layer

what the model sees at the first layer

What the Model Sees - Second CNN Layer

what the model sees at the second layer

What the Model Sees - Third CNN Layer

what the model sees at the fourth layer

What the Model Sees - Fourth CNN Layer

what the model sees at the fifth layer

Misclassified

Original

Dragonfly

What the Model Sees - First CNN Layer

what the model sees at the first layer

What the Model Sees - Second CNN Layer

what the model sees at the second layer

What the Model Sees - Third CNN Layer

what the model sees at the fourth layer

What the Model Sees - Fourth CNN Layer

what the model sees at the fifth layer

Unsupervised models

Model Architecture

Autoencoder Architecture

Convolutional Autoencoder - Noise Reduction Technique

I still have a lot to learn about unsupervised neural networks. In this experiment, I trained a convolutional autoencoder on 18394 dragonfly images, of which I reserved 2394 for validation. Because of the large volume of data, training the autoencoder on a regular local machine is slow, so this part is still a work in progress. However, using this de-noising technique, I managed to generate low-quality dragonfly images after 15 epochs. As part of the experiment, I also attempted to classify the generated images.
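A common way to set up a de-noising autoencoder is to corrupt the inputs with random noise and train the network to reconstruct the clean originals. The sketch below shows only that data preparation step. Whether the project corrupted its inputs this way is an assumption, as are the function name `add_gaussian_noise` and the `noise_factor` value.

```python
import numpy as np


def add_gaussian_noise(images, noise_factor=0.2, seed=0):
    """Return a noisy copy of images whose pixel values lie in [0, 1]."""
    rng = np.random.default_rng(seed)
    noisy = images + noise_factor * rng.standard_normal(images.shape)
    # Clip so the corrupted images stay in the valid pixel range.
    return np.clip(noisy, 0.0, 1.0).astype(np.float32)


# Stand-in data: 8 grayscale images of 256 by 256 pixels.
clean = np.random.default_rng(1).random((8, 256, 256)).astype(np.float32)
noisy = add_gaussian_noise(clean)
# The autoencoder would then be fit with `noisy` as input and `clean` as target.
```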

Original image

Generated image

I then attempted to classify the 2394 generated dragonfly images using the pre-trained classification model. Surprisingly, the model classified only 825 images correctly (34%), which is worse than random chance for a binary choice (50%). After carefully examining the code, I couldn't find errors, such as wrong labeling, that could explain the outcome. Further investigation is required.

Confusion matrix

Future Improvements

  1. Improve Image Preprocessing

    • Given that this was my first time working with image data, and despite the time constraints, I am happy with the classification results. However, I believe additional data preprocessing, such as removing background noise and focusing on the insect before resizing the image, would improve the classification results.
  2. Improve Unsupervised Model

    • The change in loss throughout training suggests that a deeper autoencoder and additional training would increase image quality. Also, because my local machine was unable to provide adequate resources for training the model, I am considering outsourcing the training process to an external service such as AWS SageMaker.
  3. Combine the two models and create a Generative Adversarial Network (GAN) model

    • I would love to see how well the unsupervised model can reconstruct the dragonfly images, and to test whether the classifier can distinguish between computer-generated reconstructions and real images.

Update

I decided to update this project to use more out-of-the-box tools for developing the image classification model and to train it on my GPU (GeForce GTX 1050).

Instead of using my own functions to augment and load the images, I integrated ImageDataGenerator and flow_from_directory into my data processing pipeline.
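A pipeline built on these two Keras utilities might look roughly like the sketch below. The settings are assumptions carried over from the earlier description (256 by 256 grayscale inputs, pixel values divided by 255, mirror-image augmentation, 5% validation), and the names `build_generators`, `AUGMENTATION`, and `FLOW`, as well as the batch size, are hypothetical. It also assumes a data directory with one subfolder per class.

```python
# Assumed settings; the updated project may use different values.
AUGMENTATION = dict(rescale=1.0 / 255, horizontal_flip=True, validation_split=0.05)
FLOW = dict(target_size=(256, 256), color_mode="grayscale",
            class_mode="binary", batch_size=32)


def build_generators(data_dir):
    """Create training and validation iterators from a class-per-folder directory."""
    # Imported lazily so the settings above can be inspected without TensorFlow.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(**AUGMENTATION)
    train = datagen.flow_from_directory(data_dir, subset="training", **FLOW)
    val = datagen.flow_from_directory(data_dir, subset="validation", **FLOW)
    return train, val
```

Note that sharing one generator between both subsets means the random horizontal flips are applied to the validation images as well. A second generator with only `rescale` and `validation_split` avoids that if it matters.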

Even though I reduced the model's complexity and trained for fewer epochs, the new image classifier achieves approximately 91% accuracy on the test set, an increase of 6 percentage points.

I don't know when I'll continue improving the autoencoder and connecting the two models into a GAN because, at the moment, I'm focusing on learning PyTorch.
