Awesome multi-modal large language model papers and projects, with collections of popular training strategies, e.g., PEFT and LoRA.
The easiest way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Multi-model Inference Graph/Pipelines, LLM/RAG apps, and more!
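This entry describes a model-serving framework; as a minimal illustration of the "build an inference service" pattern it advertises, the sketch below assumes BentoML's 1.2-style service decorators (an assumption, since the blurb does not name its API; any comparable framework follows the same load-once, serve-per-request shape).

```python
# Minimal sketch of a model inference service, assuming BentoML's
# 1.2-style @service/@api decorators (check your installed version's docs).
import bentoml
from transformers import pipeline

@bentoml.service
class Summarizer:
    def __init__(self) -> None:
        # Load the model once per worker at startup, not per request.
        self.model = pipeline("summarization",
                              model="sshleifer/distilbart-cnn-12-6")

    @bentoml.api
    def summarize(self, text: str) -> str:
        return self.model(text)[0]["summary_text"]
```

Run locally with `bentoml serve`, which exposes the method as an HTTP endpoint.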
A multimodal agent framework for solving complex tasks
The Enterprise-Grade, Production-Ready Multi-Agent Orchestration Framework. Join our community: https://discord.com/servers/agora-999382051935506503
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
[ECCV 2024] PointLLM: Empowering Large Language Models to Understand Point Clouds
Implementations of various ML tasks on the Kaggle platform, using GPUs.
Auto-updated paper list.
Seamlessly integrate state-of-the-art transformer models into robotics stacks
ms-swift: Use PEFT or full-parameter training to fine-tune 300+ LLMs or 40+ MLLMs (Qwen2, GLM4, Internlm2.5, Yi, Llama3, Llava, MiniCPM-V, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...).
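Since this entry (like several on the list) centers on PEFT/LoRA fine-tuning, here is a minimal sketch of the Hugging Face peft API that such tools build on; the model name and target modules are illustrative choices, not ms-swift defaults.

```python
# Illustrative LoRA setup via Hugging Face peft (not ms-swift's own CLI).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B")  # example model

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                # rank of the low-rank update matrices
    lora_alpha=16,      # scaling factor applied to the update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only adapter weights are trainable
```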
[ICLR 2024] Official repository for "Vision-by-Language for Training-Free Compositional Image Retrieval"
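Training-free compositional retrieval of this flavor typically scores candidates in a joint image-text embedding space; as a hedged illustration (not the paper's method), the sketch below ranks images against a text query with an off-the-shelf CLIP model.

```python
# Illustrative CLIP-based text-to-image retrieval (not the paper's method).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

images = [Image.open(p) for p in ("a.jpg", "b.jpg")]  # hypothetical files
inputs = processor(text=["a red dress"], images=images,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_text  # shape: (n_queries, n_images)
best_first = logits.argsort(dim=-1, descending=True)  # ranked image indices
```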
Generative AI suite powered by state-of-the-art models, providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, and much more. Deploy on-prem or in the cloud.
Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.
Turn your screen into actions (using LLMs). Inspired by adept.ai, rewind.ai, Apple Shortcut. Rust.
Official PyTorch implementation of the MICCAI 2024 paper (early accept, top 11%) Mammo-CLIP: A Vision Language Foundation Model to Enhance Data Efficiency and Robustness in Mammography
Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.
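Quantized local inference follows much the same pattern regardless of backend; as a sketch assuming the standard Hugging Face transformers + bitsandbytes route (not necessarily this project's own API), 4-bit loading looks like:

```python
# Illustrative 4-bit quantized loading (requires the bitsandbytes package).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # store weights 4-bit, compute fp16
)
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")  # example model
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-1.3b",
    quantization_config=quant_config,
    device_map="auto",  # spread layers across available devices
)
```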
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Build real-time multimodal AI applications 🤖🎙️📹
日本語LLMまとめ - Overview of Japanese LLMs