Computer Vision Challenge 🏆

Overview

This is a collection of foundational projects for anyone diving into computer vision.

Explore some of computer vision's core concepts through hands-on projects and challenges.

Challenges are organized into levels:

Level 0 - Zero/beginner: Getting Started with Basics
Level 1 - Apprentice/intermediate: Hands-on Computer Vision with Deep Learning
Level 2 - Hero: Large Vision Models (LVMs) from Image Generation, Inpainting, & More
Level 3 - Advanced: Video Models Benchmarking (ongoing)
Level 4 - Expert: Finetuning of VLMs (Vision Language Models) & LVMs (ongoing)
Level 5 - Master: Multimodality (ongoing)

Important

In L1 and L2, we primarily use pre-trained models so the projects stay accessible to everyone. This also lets us explore a wider range of visual recognition tasks with different types of models while focusing on each model's performance and output.

Basic Computer Vision Pipeline

graph LR
    A[Image Acquisition] ==> B[Image Processing]
    B ==> C[Feature Extraction]
    C ==> D[Output, Interpretation & Analysis]

    style A fill:#EEE,stroke:#333,stroke-width:4px
    style B fill:#F88,stroke:#333,stroke-width:4px
    style C fill:#4F4,stroke:#333,stroke-width:4px
    style D fill:#33F,stroke:#333,stroke-width:4px
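
The four stages above can be sketched in a few lines of Python. This is a minimal illustration, not code from the projects: the function names, the synthetic 6x6 image, the threshold of 128, and the `min_area` rule are all assumptions, and plain Python lists are used so that the sketch runs without any dependencies.

```python
# Dependency-free sketch of the four pipeline stages.
# All names and values are illustrative assumptions, not project code.

def acquire_image():
    """Image acquisition: build a synthetic 6x6 grayscale image (0-255)
    with a bright 2x2 square on a dark background."""
    image = [[10] * 6 for _ in range(6)]
    for row in (2, 3):
        for col in (3, 4):
            image[row][col] = 200
    return image

def process_image(image, threshold=128):
    """Image processing: binarize the image with a fixed threshold."""
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in image]

def extract_features(mask):
    """Feature extraction: area and centroid of the foreground pixels."""
    points = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    area = len(points)
    if area == 0:
        return {"area": 0, "centroid": None}
    centroid = (sum(r for r, _ in points) / area, sum(c for _, c in points) / area)
    return {"area": area, "centroid": centroid}

def interpret(features, min_area=3):
    """Output and interpretation: turn the features into a decision."""
    return "object detected" if features["area"] >= min_area else "no object"

image = acquire_image()
mask = process_image(image)
features = extract_features(mask)
result = interpret(features)
print(features, result)
```

In a real project each stage would typically be replaced by a library call (for example reading a file, filtering, and running a detector), but the data flow between stages stays the same.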

Requirements

Install the dependency packages using either conda or pip.

Using conda:

Create a new conda environment:

conda create --name cv-challenge

Activate the newly created environment:

conda activate cv-challenge  # conda 4.4 or later
source activate cv-challenge  # older conda releases (bash/zsh)

Install dependencies from the requirements.txt file:

conda install --channel conda-forge --file requirements.txt

Using pip:

Install dependencies from the requirements.txt file:

pip install -r requirements.txt

Hands-on Computer Vision Challenges!

Level 0 - Zero: Getting Started with Basics 💪

	Project	Description
[1]	Getting Started with Images	Load an image, display it, and apply basic transformations.
[2]	Basic Image Manipulation	Modify pixels, then resize, flip, crop, and annotate images.
[3]	Image Filtering & Restoration	Enhance or manipulate image features using filtering techniques.
[4]	Image Enhancement	Enhance using arithmetic & bitwise operations
[5]	Image Segmentation (Traditional)	Segment images into regions or pixels that belong to different classes or categories.
[6]	Feature Extraction & Alignment	Learn how to extract features from images using descriptors based on the nature of the features
[7]	Optical Character Recognition (OCR)	Learn how to recognize text in images or documents using libraries such as Tesseract, Pytesseract, or EasyOCR
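
As a rough illustration of the operations listed for project [2], the sketch below flips, crops, and resizes a tiny grayscale image. It is not taken from the project notebooks: the helper names are hypothetical, images are represented as nested Python lists to keep the example dependency-free, and the projects themselves may rely on an image library instead.

```python
# Illustrative helpers only: an image is a list of rows of grayscale values.

def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]

def flip_vertical(image):
    """Mirror the image top to bottom."""
    return [row[:] for row in image[::-1]]

def crop(image, top, left, height, width):
    """Return the height x width region whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]

def resize_nearest(image, new_height, new_width):
    """Resize with nearest-neighbor sampling."""
    old_height, old_width = len(image), len(image[0])
    return [
        [image[r * old_height // new_height][c * old_width // new_width]
         for c in range(new_width)]
        for r in range(new_height)
    ]

image = [
    [1, 2, 3],
    [4, 5, 6],
]

flipped = flip_horizontal(image)       # [[3, 2, 1], [6, 5, 4]]
cropped = crop(image, 0, 1, 2, 2)      # [[2, 3], [5, 6]]
resized = resize_nearest(image, 4, 6)  # each pixel becomes a 2x2 block
print(flipped, cropped, resized)
```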

Level 1 - Apprentice: Hands-on Computer Vision with Deep Learning 🔥

	Project	Description
[1]	MNIST Handwritten Digit Recognition	Train a simple neural network to classify handwritten digits from the MNIST dataset.
[2]	CIFAR-10 Image Classification	Utilize convolutional neural networks (CNNs) to classify images of different types of objects from the CIFAR-10 dataset.
[3]	Object Detection with YOLOv5	Implement YOLOv5, a real-time object detection algorithm, to detect objects in images and videos.
[4]	Semantic Segmentation with DeepLabv3+	Utilize DeepLabv3+, a semantic segmentation model, to segment images into different semantic categories.
[5]	Facial Recognition with OpenFace	Explore facial recognition using OpenFace, a facial recognition library, to identify individuals in images.
[6]	Object Tracking	Follow the movement of objects in a video sequence.
[7]	Human Pose Estimation	Estimate the pose of a person in an image or a video using OpenCV and a pre-trained model.
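
Detection and tracking projects such as [3] and [6] are commonly evaluated by how well a predicted box overlaps a reference box. The sketch below computes intersection over union (IoU) for two axis-aligned boxes. It is a generic illustration rather than code from the projects, and the `(x_min, y_min, x_max, y_max)` box format and the function name are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as
    (x_min, y_min, x_max, y_max). The box format is an assumption."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap, clamped to zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

predicted = (0, 0, 2, 2)
reference = (1, 1, 3, 3)
overlap = iou(predicted, reference)  # 1 / 7: overlap area 1, union area 7
print(round(overlap, 3))
```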

Level 2 - Hero: Large Vision Models (LVMs) from Image Generation, Inpainting, & More ⚡

	Project	Description
[1]	Creative Image Generation with GANs	Generate novel images of different styles using GANs.
[2]	Text-to-Image Synthesis with LLMs and Diffusion Models	Create realistic and creative images from text descriptions using LLMs and diffusion models.
[3]	AI-Powered Image Restoration and Enhancement	Restore and enhance images using AI methods.
[4]	Style Transfer with GANs and Image Processing	Transfer the artistic style of one image to another.
[5]	AI-Driven Image Captioning and Storytelling	Generate comprehensive and creative captions and stories from images using LLMs.
[6]	AI-Assisted Image Editing and Manipulation	Automate image editing and manipulation tasks using AI.
[7]	AI Image Recognition Benchmarks with SOTA Vision Models	Benchmark SOTA Vision Models on a variety of image recognition tasks, including image classification, object detection, ...
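
A benchmark like the one in project [7] usually reports at least an accuracy figure and a latency figure per model. The sketch below shows one minimal way to collect both. Everything in it is an assumption made for illustration: the `benchmark` helper, the toy `brightness_classifier` standing in for a real model, and the three hand-written samples.

```python
import time

def benchmark(model, samples, labels, repeats=3):
    """Return top-1 accuracy and mean latency per sample in seconds."""
    predictions = [model(sample) for sample in samples]
    correct = sum(pred == label for pred, label in zip(predictions, labels))
    accuracy = correct / len(labels)

    start = time.perf_counter()
    for _ in range(repeats):
        for sample in samples:
            model(sample)
    elapsed = time.perf_counter() - start
    return {"accuracy": accuracy, "latency_s": elapsed / (repeats * len(samples))}

def brightness_classifier(image):
    """Hypothetical stand-in model: labels an image 'bright' or 'dark'."""
    pixels = [p for row in image for p in row]
    return "bright" if sum(pixels) / len(pixels) >= 128 else "dark"

samples = [
    [[200, 220], [210, 230]],
    [[10, 20], [30, 40]],
    [[120, 130], [125, 135]],  # mean 127.5, so the toy model says "dark"
]
labels = ["bright", "dark", "bright"]

report = benchmark(brightness_classifier, samples, labels)
print(report)
```

Swapping `brightness_classifier` for a function that wraps a pre-trained model keeps the same reporting code for every model under comparison.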

Level 3 - Advanced: Video Models Benchmarking

	Project	Description
[1]	Video Generation & Captioning	Create realistic video content from text, generate descriptive text or subtitles for video content using AI models.
[2]	Facial Emotion Recognition	Detect faces in video content and classify their emotional expressions using AI.
[3]	Motion Analysis	Analyze the motion of objects in a video sequence using techniques such as tracking, optical flow, and video detection.
[4]	Video Segmentation	Divide video frames into meaningful segments or regions for analysis and processing.
[5]	Video Style Transfer	Apply artistic styles from one video or image to another video, transforming its visual appearance.
[6]	Video Restoration & Enhancement	Restore and enhance videos using AI methods.
[7]	Video Models Benchmarking	Benchmark SOTA Video Models on a variety of video recognition tasks, including video classification, object detection, etc.
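
One of the simplest forms of motion analysis mentioned in project [3] is frame differencing: compare two consecutive frames and mark the pixels that changed. The sketch below is a generic illustration under stated assumptions, not code from the project: frames are nested lists of grayscale values, and the function names and the threshold of 25 are hypothetical.

```python
def frame_difference(prev_frame, next_frame, threshold=25):
    """Mark pixels whose absolute intensity change exceeds the threshold."""
    return [
        [1 if abs(after - before) > threshold else 0
         for before, after in zip(prev_row, next_row)]
        for prev_row, next_row in zip(prev_frame, next_frame)
    ]

def motion_ratio(mask):
    """Fraction of pixels flagged as moving."""
    total = sum(len(row) for row in mask)
    moving = sum(sum(row) for row in mask)
    return moving / total if total else 0.0

# A bright pixel moves one column to the right between two 3x3 frames.
prev_frame = [[255, 0, 0], [0, 0, 0], [0, 0, 0]]
next_frame = [[0, 255, 0], [0, 0, 0], [0, 0, 0]]

motion_mask = frame_difference(prev_frame, next_frame)
ratio = motion_ratio(motion_mask)
print(motion_mask, ratio)
```

Both the old and the new position of the pixel are flagged, which is why frame differencing is usually followed by tracking or optical flow when the direction of motion matters.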

Usage

Most projects are written as Jupyter notebooks. You can run them directly using Jupyter Notebook, JupyterLab, or Colab.

For projects with a main.py file, run the command below:

python main.py

Roadmap & Upcoming Features

Roadmap:

    flowchart BT
        A(Level 0: Zero) --> B(Level 1: Intermediate)
        A --> C(Level 2: Hero)
        A --> D(Level 3: Advanced)
        A --> E(Level 4: Expert)
        A --> F(Level 5: Master)
        
        style A fill:#fff,stroke:#333,stroke-width:2px
        style B fill:#88f,stroke:#333,stroke-width:2px
        style C fill:#8f8,stroke:#333,stroke-width:2px
        style D fill:#bbb,stroke:#f66,stroke-width:2px,color:#fff,stroke-dasharray: 5 5
        style E fill:#bbb,stroke:#f66,stroke-width:2px,color:#fff,stroke-dasharray: 5 5
        style F fill:#bbb,stroke:#f66,stroke-width:2px,color:#fff,stroke-dasharray: 5 5

New levels:

L3 - Advanced: Video Models Benchmarking
L4 - Expert: Finetuning of VLMs (Vision Language Models) & LVMs
L5 - Master: Multimodality

Upcoming Features:

Feature	Description	Status
Code Refactoring	Enhance code readability by cleaning, documenting, and integrating Gradio demos.	To-Do
New Learning Levels	Introduce advanced levels: L3 - Video Models Benchmarking, L4 - Finetuning of VLMs (Vision Language Models) & LVMs, and L5 - Multimodality	To-Do
Wiki Update	Document the new learning levels in the project Wiki.	To-Do
Multilingual Support	Translate the README.md file into multiple languages (French, Spanish, etc.).	To-Do
Edge Device Deployment	Explore code translation for deployment on edge devices using C++ or Rust.	To-Do
Performance Enhancements	Investigate options to improve performance, including adding new datasets and supporting additional computer vision tasks.	To-Do
Machine Learning Framework Integration	Integrate the project with popular machine learning frameworks.	To-Do

Contributing

We warmly welcome your contributions! Whether you're a seasoned developer or just starting out in Computer Vision, you can help us improve the project and make it more valuable to everyone.

How to contribute:

Fork this repository and clone it to your local machine.
Create a new branch with a descriptive name for your contribution.
Add your code and files to the branch and commit your changes.
Push your branch to your forked repository and create a pull request to the main repository.
Wait for your pull request to be reviewed and merged.

Sponsor this Project

Another way to get involved is by sponsoring the project.

Your support will help:

Provide computational resources (this is a GPU-poor project!) to explore new frontiers in computer vision by training larger and more complex models
Keep the project up to date with the latest computer vision advancements
Create more detailed tutorials for users at all skill levels

LICENSE

This project is licensed under the MIT License.

Star History

"Vision is a picture of the future that produces passion." - Bill Hybels

afondiel/computer-vision-challenge
