Awesome multi-modal large language model papers and projects, with collections of popular training strategies, e.g., PEFT and LoRA.
The easiest way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Multi-model Inference Graph/Pipelines, LLM/RAG apps, and more!
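This entry describes a model-serving framework; as a minimal illustration of the "build an inference service" pattern it advertises, the sketch below assumes BentoML's 1.2-style service decorators (an assumption, since the blurb does not name its API; any comparable framework follows the same load-once, serve-per-request shape).

```python
# Minimal sketch of a model inference service, assuming BentoML's
# 1.2-style @service/@api decorators (check your installed version's docs).
import bentoml
from transformers import pipeline

@bentoml.service
class Summarizer:
    def __init__(self) -> None:
        # Load the model once per worker at startup, not per request.
        self.model = pipeline("summarization",
                              model="sshleifer/distilbart-cnn-12-6")

    @bentoml.api
    def summarize(self, text: str) -> str:
        return self.model(text)[0]["summary_text"]
```

Run locally with `bentoml serve`, which exposes the method as an HTTP endpoint.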
A multimodal agent framework for solving complex tasks
The Enterprise-Grade, Production-Ready Multi-Agent Orchestration Framework. Join our community: https://discord.com/servers/agora-999382051935506503
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
[ECCV 2024] PointLLM: Empowering Large Language Models to Understand Point Clouds
Implementations of various ML tasks on the Kaggle platform, using GPUs.
Auto-updated paper list.
Seamlessly integrate state-of-the-art transformer models into robotics stacks
ms-swift: Use PEFT or full-parameter training to fine-tune 300+ LLMs or 40+ MLLMs (Qwen2, GLM4, Internlm2.5, Yi, Llama3, Llava, MiniCPM-V, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...).
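Since this entry (like several on the list) centers on PEFT/LoRA fine-tuning, here is a minimal sketch of the Hugging Face peft API that such tools build on; the model name and target modules are illustrative choices, not ms-swift defaults.

```python
# Illustrative LoRA setup via Hugging Face peft (not ms-swift's own CLI).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B")  # example model

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                # rank of the low-rank update matrices
    lora_alpha=16,      # scaling factor applied to the update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only adapter weights are trainable
```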
[ICLR 2024] Official repository for "Vision-by-Language for Training-Free Compositional Image Retrieval"
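Training-free compositional retrieval of this flavor typically scores candidates in a joint image-text embedding space; as a hedged illustration (not the paper's method), the sketch below ranks images against a text query with an off-the-shelf CLIP model.

```python
# Illustrative CLIP-based text-to-image retrieval (not the paper's method).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

images = [Image.open(p) for p in ("a.jpg", "b.jpg")]  # hypothetical files
inputs = processor(text=["a red dress"], images=images,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_text  # shape: (n_queries, n_images)
best_first = logits.argsort(dim=-1, descending=True)  # ranked image indices
```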
Generative AI suite powered by state-of-the-art models, providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, and much more. Deploy on-prem or in the cloud.
Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.
Turn your screen into actions (using LLMs). Inspired by adept.ai, rewind.ai, Apple Shortcut. Rust.
Official PyTorch implementation of the MICCAI 2024 paper (early accept, top 11%) Mammo-CLIP: A Vision Language Foundation Model to Enhance Data Efficiency and Robustness in Mammography
Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.
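Quantized local inference follows much the same pattern regardless of backend; as a sketch assuming the standard Hugging Face transformers + bitsandbytes route (not necessarily this project's own API), 4-bit loading looks like:

```python
# Illustrative 4-bit quantized loading (requires the bitsandbytes package).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # store weights 4-bit, compute fp16
)
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")  # example model
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-1.3b",
    quantization_config=quant_config,
    device_map="auto",  # spread layers across available devices
)
```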
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Build real-time multimodal AI applications 🤖🎙️📹
日本語LLMまとめ - Overview of Japanese LLMs