vlm

Here are 154 public repositories matching this topic...

sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.

cuda inference pytorch transformer moe llama vlm llm llm-serving llava deepseek-llm deepseek llama3 llama3-1 deepseek-v3 deepseek-r1 deepseek-r1-zero

Updated Apr 5, 2025
Python

CVHub520 / X-AnyLabeling

Effortless data labeling with AI support from Segment Anything and other awesome models.

Updated Apr 5, 2025
Python

NexaAI / nexa-sdk

Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), audio language models, automatic speech recognition (ASR), and text-to-speech (TTS) capabilities.

audio sdk transformers tts language-model whisper asr vlm sdk-python edge-computing on-device-ml on-device-ai llm stable-diffusion

Updated Mar 6, 2025
Python

om-ai-lab / VLM-R1

Solve Visual Understanding with Reinforced VLMs

reinforcement-learning vlm multimodal llm qwen deepseek-r1 grpo r1-zero vlm-r1 multimodal-r1

Updated Apr 3, 2025
Python

joanrod / star-vector

StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language modeling architecture, StarVector processes both visual and textual inputs to produce high-quality SVG code with remarkable precision.

svg vlm llm multimodal-large-language-models

Updated Mar 26, 2025
Python

MiniMax-AI / MiniMax-01

The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model & vision-language-model based on Linear Attention

vlm large-language-models llm llms vision-language-model minimax-text-01 minimax-vl-01

Updated Apr 2, 2025
Python

om-ai-lab / OmAgent

Build multimodal language agents for fast prototype and production

Updated Mar 19, 2025
Python

QiuYannnn / Local-File-Organizer

An AI-powered file management tool that ensures privacy by organizing local texts and images. Using Llama3.2 3B and Llava v1.6 models with the Nexa SDK, it intuitively scans, restructures, and organizes files for quick, seamless access and easy retrieval.

vlm file-organizer on-device-ai llm llama3

Updated Oct 21, 2024
Python

BAAI-Agents / Cradle

The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.

ai gcc multimodality vlm cradle computer-control lmm grounding ai-agent large-language-models llm generative-ai vision-language-model ai-agents-framework general-computer-control personoid foundation-agent

Updated Nov 7, 2024
Python

xlang-ai / OSWorld

[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

agent cli benchmark natural-language-processing gui reinforcement-learning artificial-intelligence code-generation language-model vlm rpa multimodal llm large-action-model

Updated Apr 2, 2025
Python

heshengtao / comfyui_LLM_party

LLM Agent Framework in ComfyUI includes MCP server, Omost, GPT-sovits, ChatTTS, GOT-OCR2.0, and FLUX prompt nodes, access to Feishu and Discord, and adapts to all LLMs with similar openai / aisuite interfaces, such as o1, ollama, gemini, grok, qwen, GLM, deepseek, kimi, doubao. Adapted to local llms, vlm, gguf such as llama-3.3 Janus-Pro, Linkage graphRAG

Updated Mar 30, 2025
Python

BAAI-DCAI / Bunny

A family of lightweight multimodal models.

english chinese vlm gpt-4 chatgpt mllm multimodal-large-language-models

Updated Nov 18, 2024
Python

THUDM / CogAgent

An open-sourced end-to-end VLM-based GUI Agent

agent glm vlm computer-use gui-agent

Updated Apr 4, 2025
Python

modelscope / evalscope

A streamlined and customizable framework for efficient large model evaluation and performance benchmarking

performance evaluation vlm rag llm

Updated Apr 3, 2025
Python

mbzuai-oryx / GeoChat

[CVPR 2024 🔥] GeoChat, the first grounded Large Vision Language Model for Remote Sensing

remote-sensing vlm

Updated Nov 28, 2024
Python

vlm-run / vlmrun-hub

A hub for various industry-specific schemas to be used with VLMs.

json ai computer-vision etl vlm multimodal pydantic pydantic-models genai vlm-ocr

Updated Apr 3, 2025
Python

Flame-Code-VLM / Flame-Code-VLM

Flame is an open-source multimodal AI system designed to translate UI design mockups into high-quality React code. It leverages vision-language modeling, automated data synthesis, and structured training workflows to bridge the gap between design and front-end development.

react open-source front-end ai vue deep-learning frontend code-generation image-to-text vlm frontend-development multimodal data-synthesis design-to-code llm vision-language-model deepseek image-to-code screen-to-code

Updated Mar 26, 2025
Python

gokayfem / ComfyUI_VLM_nodes

Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation

image-captioning nodes vlm custom-nodes img2text llm mllm llava comfyui siglip phi15 joytag img2sfx

Updated Feb 13, 2025
Python

niuzaisheng / ScreenAgent

ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model (IJCAI-24)

agent ai vlm llm

Updated Nov 25, 2024
Python

fpgaminer / joycaption

JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models.

vlm captioning joycaption

Updated Nov 29, 2024
Python
