This is a collection of foundational projects for anyone diving into computer vision.
Explore some of computer vision's core concepts and hands-on projects through this fun challenge.
The project has 3 levels:
- Level 0 - Zero (beginner): Getting Started with Basics
- Level 1 - Apprentice (intermediate): Hands-on Computer Vision with Deep Learning
- Level 2 - Hero (advanced): Vision LLMs: Image Generation (GANs, VAEs...), Synthesis & Captioning
Important
In Level 1 and Level 2, we primarily use pre-trained models to keep the projects accessible to everyone. This also lets us explore a wider range of visual recognition tasks with different types of models while focusing on each model's performance and output.
graph LR
A[Image Acquisition] ==> B[Image Processing]
B ==> C[Feature Extraction]
C ==> D[Output, Interpretation & Analysis]
style A fill:#EEE,stroke:#333,stroke-width:4px
style B fill:#F88,stroke:#333,stroke-width:4px
style C fill:#4F4,stroke:#333,stroke-width:4px
style D fill:#33F,stroke:#333,stroke-width:4px
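The four stages in the diagram can be illustrated with a toy example. The sketch below is only an illustration, not code from the notebooks: it uses a synthetic image stored as nested lists, and the function names (`acquire_image`, `process_image`, `extract_features`, `interpret`) are hypothetical stand-ins for the stages.

```python
from typing import Dict, List

Image = List[List[int]]  # grayscale image: rows of 0-255 intensities


def acquire_image(width: int = 8, height: int = 8) -> Image:
    """Image acquisition: build a synthetic image, dark on the left, bright on the right."""
    return [[0 if x < width // 2 else 255 for x in range(width)] for _ in range(height)]


def process_image(image: Image) -> List[List[float]]:
    """Image processing: scale intensities to the range [0, 1]."""
    return [[pixel / 255.0 for pixel in row] for row in image]


def extract_features(image: List[List[float]]) -> List[List[float]]:
    """Feature extraction: absolute difference between horizontally adjacent pixels."""
    return [[abs(row[x + 1] - row[x]) for x in range(len(row) - 1)] for row in image]


def interpret(features: List[List[float]]) -> Dict[str, float]:
    """Output and interpretation: find the column with the strongest edge response."""
    column_strength = [sum(column) for column in zip(*features)]
    strongest = max(range(len(column_strength)), key=column_strength.__getitem__)
    return {"edge_column": strongest, "strength": column_strength[strongest]}


def run_pipeline() -> Dict[str, float]:
    """Chain the four stages in the order shown in the diagram."""
    return interpret(extract_features(process_image(acquire_image())))


if __name__ == "__main__":
    print(run_pipeline())
```

Real projects would swap each stage for something heavier (a camera or file reader, filtering, learned features, a classifier), but the data still flows through the same four steps.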
Install the dependency packages using either conda or pip:
Using conda:
- Create a new conda environment:
conda create --name cv-challenge
- Activate the newly created environment:
source activate cv-challenge # For bash/zsh
conda activate cv-challenge # For conda prompt/powershell
- Install dependencies from the requirements.txt file:
conda install --channel conda-forge --file requirements.txt
Using pip:
- Install dependencies from the requirements.txt file:
pip install -r requirements.txt
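After installing, it can be handy to confirm that the environment actually contains what requirements.txt asks for. The script below is a minimal sketch using only the standard library. It assumes a plain requirements.txt with one package per line, and the helper names `requirement_name` and `installed_versions` are made up for this example.

```python
import re
from importlib import metadata
from typing import Dict, Iterable, Optional


def requirement_name(line: str) -> Optional[str]:
    """Return the package name from one requirements line, or None for
    blank lines, comments, and option lines such as '-r other.txt'."""
    line = line.split("#", 1)[0].strip()
    if not line or line.startswith("-"):
        return None
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
    return match.group(0) if match else None


def installed_versions(lines: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each required package to its installed version, or None if missing."""
    report: Dict[str, Optional[str]] = {}
    for line in lines:
        name = requirement_name(line)
        if name is None:
            continue
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    with open("requirements.txt", encoding="utf-8") as handle:
        for name, version in installed_versions(handle).items():
            print(f"{name}: {version or 'NOT INSTALLED'}")
```

The check only reports whether a distribution is present; it does not compare version constraints. Packages installed through conda are sometimes registered under a different name than on PyPI, so a "NOT INSTALLED" line is a hint to look closer rather than proof of a problem.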
# | Project | Description | Notebooks |
---|---|---|---|
[1] | Getting Started with Images | Load an image, display it, and apply basic transformations. | |
[2] | Basic Image Manipulation | Modify pixels, resize, flip, crop, and annotate images. | |
[3] | Image Filtering & Restoration | Enhance or manipulate image features using filtering techniques. | |
[4] | Image Enhancement | Enhance images using arithmetic and bitwise operations. | |
[5] | Image Segmentation (Traditional) | Segment images into regions or pixels that belong to different classes or categories. | |
[6] | Feature Extraction & Alignment | Learn how to extract features from images using descriptors suited to the nature of the features. | |
[7] | Optical Character Recognition (OCR) | Learn how to recognize text in images or documents using libraries such as Tesseract, Pytesseract, or EasyOCR. | |
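Several of the operations listed above (flipping, cropping, resizing, and simple threshold-based segmentation) come down to index arithmetic on a pixel grid. The sketch below shows that arithmetic on a grayscale image stored as nested lists. It is a library-free illustration with hypothetical function names; the notebooks themselves may rely on an image library instead.

```python
from typing import List

Image = List[List[int]]  # grayscale image: rows of 0-255 intensities


def flip_horizontal(image: Image) -> Image:
    """Mirror the image left to right by reversing every row."""
    return [row[::-1] for row in image]


def crop(image: Image, top: int, left: int, height: int, width: int) -> Image:
    """Keep the height x width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def resize_nearest(image: Image, new_height: int, new_width: int) -> Image:
    """Resize with nearest-neighbour sampling: each output pixel copies
    the source pixel at the proportionally scaled position."""
    old_height, old_width = len(image), len(image[0])
    return [
        [image[y * old_height // new_height][x * old_width // new_width]
         for x in range(new_width)]
        for y in range(new_height)
    ]


def threshold(image: Image, cutoff: int = 128) -> Image:
    """Simplest traditional segmentation: pixels at or above the cutoff
    become 255 (foreground), all others become 0 (background)."""
    return [[255 if pixel >= cutoff else 0 for pixel in row] for row in image]


if __name__ == "__main__":
    sample = [
        [10, 50, 200],
        [30, 120, 250],
        [90, 160, 40],
    ]
    print(flip_horizontal(sample))
    print(crop(sample, top=0, left=1, height=2, width=2))
    print(resize_nearest(sample, 2, 2))
    print(threshold(sample))
```

Array libraries express the same ideas as slicing and boolean masks, which is faster and shorter, but the nested-list version makes it clear which pixel ends up where.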
# | Project | Description | Notebooks |
---|---|---|---|
[1] | MNIST Handwritten Digit Recognition | Train a simple neural network to classify handwritten digits from the MNIST dataset. | |
[2] | CIFAR-10 Image Classification | Utilize convolutional neural networks (CNNs) to classify images of different types of objects from the CIFAR-10 dataset. | |
[3] | Object Detection with YOLOv5 | Implement YOLOv5, a real-time object detection algorithm, to detect objects in images and videos. | |
[4] | Semantic Segmentation with DeepLabv3+ | Utilize DeepLabv3+, a semantic segmentation model, to segment images into different semantic categories. | |
[5] | Facial Recognition with OpenFace | Explore facial recognition using OpenFace, a facial recognition library, to identify individuals in images. | |
[6] | Object Tracking | Follow the movement of objects in a video sequence. | |
[7] | Human Pose Estimation | Estimate the pose of a person in an image or a video using OpenCV and a pre-trained model. | |
# | Project | Description | Notebooks |
---|---|---|---|
[1] | Creative Image Generation with GANs | Generate novel images of different styles using GANs. | |
[2] | Text-to-Image Synthesis with LLMs and Diffusion Models | Create realistic and creative images from text descriptions using LLMs and diffusion models. | |
[3] | AI-Powered Image Restoration and Enhancement | Restore and enhance images using AI methods. | |
[4] | Style Transfer with GANs and Image Processing | Transfer the artistic style of one image to another. | |
[5] | AI-Driven Image Captioning and Storytelling | Generate comprehensive and creative captions and stories from images using LLMs. | |
[6] | AI-Assisted Image Editing and Manipulation | Automate image editing and manipulation tasks using AI. | |
[7] | AI-Powered Image Analysis and Classification | Analyze and classify images using AI models. | |
Most projects are written as Jupyter notebooks; you can run them directly using Jupyter Notebook/Lab or Colab.
For projects with a main.py file, run the command below:
python main.py
Help this project grow! Add new projects, improve existing ones and fix issues.
Please follow these steps to contribute:
- Fork this repository and clone it to your local machine.
- Create a new branch with a descriptive name for your contribution.
- Add your code and files to the branch and commit your changes.
- Push your branch to your forked repository and create a pull request to the main repository.
- Wait for your pull request to be reviewed and merged.
This project is licensed under the MIT License.
Some of the projects in this repository are inspired by or based on the following sources:
- Computer Vision OpenCV Python Free Course Udemy
- Computer Vision Free Course - Kaggle
- Visual Perception for Self-Driving Cars - University of Toronto
- The Complete Self-Driving Car Course - Udemy
- Top Computer Vision Projects (2023) - GeeksforGeeks
- 15 Computer Visions Projects You Can Do Right Now - neptune.ai
- 30+ Unique Computer Vision Projects with Source Code - 2023
- 7+ Computer Vision Projects on GitHub with Source Code 2024 - Omdena
- 20+ Computer Vision Projects Ideas for Beginners in 2023
- A Dive into Vision-Language Models
- Advances in Visual Pretraining for LLMS | Neil Houlsby
- VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
- LLMs as Visual Explainers: Advancing Image Classification with Evolving Visual Descriptions
- Transforming Computer Vision with LLMs - Data Science Dojo