A multimodal expert-assistant GPT platform built on RAG and agents. It integrates tools for modalities such as text, images, and audio, and supports local deployment and private database construction.
Demo video: project_display.mp4
## 1. Basic Functions
- Single-turn and multi-turn chat
- Multimodal information display and interaction
- Agent
- Tools (see the wiring sketch after this list)
  - Web search
  - Image generation
  - Image captioning
  - Audio-to-text
  - Text-to-audio
  - Video captioning
- RAG
- Private database
- Offline deployment
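The agent orchestrates these tools. Below is a minimal sketch of how a tool can be registered with a LangChain agent, assuming the classic langchain 0.x agent API that this project's stack suggests; the `image_caption` function is a dummy stand-in for the real tool in `Tools/ImageCaption.py`, not the actual implementation:

```python
from langchain.agents import AgentType, Tool, initialize_agent
from langchain.chat_models import ChatOpenAI

# Dummy tool standing in for the project's real BLIP-based tool.
def image_caption(path: str) -> str:
    return "a caption produced by the BLIP tool"

tools = [
    Tool(
        name="ImageCaption",
        func=image_caption,
        description="Describes the content of an image given its file path.",
    )
]

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
agent.run("What is in Imgs/example.jpg?")
```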
## 2. Supported Information Modalities
- Text
- Image
- Audio
- Video
## 3. Model Interface APIs
- ChatGPT
- DALL·E
- Google Search
- BLIP
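For reference, image generation reduces to a single API call. Here is a minimal sketch using the openai 1.x client; the model and size values are assumptions, and `Tools/ImageGeneration.py` may use a different client version:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Generate one image from a text prompt (model/size values are assumptions).
result = client.images.generate(
    model="dall-e-2",
    prompt="a cat reading a book",
    n=1,
    size="512x512",
)
print(result.data[0].url)
```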
Project technology stack: Python + torch + langchain + gradio
- Create a virtual environment with Anaconda:

  ```bash
  conda create -n agent python=3.10
  ```
- Activate the environment and install the dependencies:

  ```bash
  conda activate agent
  pip install -r ./requirements.txt
  ```
- Install the BLIP model locally: open the BLIP website and download all of its files to `Models/BLIP` (a download sketch follows this list).
- Follow the prompts to configure the keys for the APIs you need in `.env` (an example follows this list).
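If you prefer to fetch the BLIP files from a script, here is a minimal sketch using `huggingface_hub`; the checkpoint name is an assumption, so substitute the one the project expects:

```python
from huggingface_hub import snapshot_download

# Download all files of a BLIP checkpoint into Models/BLIP.
# The repo_id below is an assumption; use the checkpoint the project expects.
snapshot_download(
    repo_id="Salesforce/blip-image-captioning-base",
    local_dir="Models/BLIP",
)
```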
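A minimal `.env` might look like the following; the variable names are illustrative assumptions, so match them to the names the project actually reads:

```
# Illustrative key names only; use the names the project actually reads.
OPENAI_API_KEY=sk-...
GOOGLE_API_KEY=...
GOOGLE_CSE_ID=...
```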
Multi Agent GPT provides a UI for interaction. Launch the agents and start an intelligent conversation by running `web.py`:

```bash
python ./web.py
```

The program serves a local URL (http://XXX); open it in a local browser to see the UI.
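Under the hood, `web.py` builds the UI with gradio. The following is a hypothetical sketch of that wiring, not the actual implementation:

```python
import gradio as gr

# Hypothetical sketch of the chat UI wiring in web.py (not the actual code).
def respond(message, history):
    # The real app routes the message through the agent and its tools.
    return "agent reply"

demo = gr.ChatInterface(fn=respond)
demo.launch()  # prints a local URL such as http://127.0.0.1:7860
```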
By integrating the BLIP model, the agents can understand image content and provide higher-quality dialogue.
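Here is a minimal captioning sketch with the `transformers` BLIP classes, assuming the model files were downloaded to `Models/BLIP`; the image path is a hypothetical example:

```python
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

# Load the locally downloaded BLIP files (assumes Models/BLIP holds a full checkpoint).
processor = BlipProcessor.from_pretrained("Models/BLIP")
model = BlipForConditionalGeneration.from_pretrained("Models/BLIP")

image = Image.open("Imgs/Show/example.jpg")  # hypothetical example image
inputs = processor(images=image, return_tensors="pt")
caption_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(caption_ids[0], skip_special_tokens=True))
```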
Project structure:

- .env
- Agents/
  - openai_agents.py  # defines the GPT-3.5-based agent
- Database/
- Docs/
- Imgs/
  - Show/  # stores some example images
- Models/
  - BLIP/  # large model for image understanding
- Tools/
  - ImageCaption.py  # BLIP-based image-understanding tool
  - ImageGeneration.py  # text-to-image tool based on OpenAI DALL·E
  - search.py  # web search tool based on Google Search
- Utils/
  - data_io.py
  - stdio.py  # intercepts the program's log output, mainly to capture the agent's verbose messages
  - utils_image.py  # utility functions for image processing
  - utils_json.py  # extracts useful fields from existing log messages (supports stdio.py)
- python_new_funciton.py  # test file used during development
- readme.md
- requirements.txt
- web.py  # main entry file
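For reference, here is a minimal sketch of the stdout-capture idea behind `Utils/stdio.py`; the real implementation may differ:

```python
import io
from contextlib import redirect_stdout

# Capture everything printed while the agent runs, so the verbose log
# can be parsed (e.g. by utils_json.py) and shown in the UI.
buffer = io.StringIO()
with redirect_stdout(buffer):
    print("agent verbose log line")  # stands in for agent.run(...)

captured = buffer.getvalue()
print("captured:", captured.strip())
```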