OpenMMLab Detection Toolbox and Benchmark
pix2tex: Using a ViT to convert images of equations into LaTeX code.
This repository contains demos I made with the Transformers library by HuggingFace.
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks
SwinIR: Image Restoration Using Swin Transformer (official repository)
Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.
OpenMMLab Pre-training Toolbox and Benchmark
Efficient vision foundation models for high-resolution generation and perception.
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
EVA Series: Visual Representation Fantasies from BAAI
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications by Transformer based networks.
An all-in-one toolkit for computer vision
This is a collection of our NAS and Vision Transformer work.
[CVPR 2025] Official PyTorch Implementation of MambaVision: A Hybrid Mamba-Transformer Vision Backbone
MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.
Get clean data from tricky documents, powered by vision-language models ⚡
Awesome List of Attention Modules and Plug&Play Modules in Computer Vision
[ICCV 2021] Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet