av-SALMONN: Speech-Enhanced Audio-Visual Large Language Models
Button Specifications:
- Clear All: clears the chat history as well as all modality inputs. Always click Clear All before uploading or updating any image, audio, or video.
- Clear History: only clears the chat history; the modality inputs remain unchanged unless you click Clear All.
- Submit: submits the text in the text box to get a response.
- Resubmit: clears the previous conversation turn and then submits the text in the text box.
- maximum length, top p, and temperature are standard decoding hyperparameters: maximum length caps the number of generated tokens, top p controls nucleus sampling (only the smallest set of tokens whose cumulative probability exceeds p is sampled from), and temperature scales the logits (lower values make the output more deterministic, higher values more diverse).
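For reference, a minimal sketch of how temperature and top-p interact during decoding, in plain Python (this illustrates the standard technique, not the demo's actual implementation):

```python
import math

def top_p_filter(logits, temperature=1.0, top_p=0.9):
    """Apply temperature scaling, then keep the smallest set of tokens
    whose cumulative probability reaches top_p (nucleus sampling).
    Returns the renormalized distribution over the kept tokens."""
    # Temperature < 1 sharpens the distribution; > 1 flattens it.
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort token indices by probability, descending, and accumulate
    # until the nucleus mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # Renormalize over the kept nucleus; decoding samples from this set.
    norm = sum(probs[i] for i in kept)
    return {i: probs[i] / norm for i in kept}
```

With a small top p only the single most likely token survives, which is why low top p (like low temperature) makes responses more deterministic.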
Examples mentioned in the paper are provided. Please feel free to start with those.
We provide the script for evaluating speech (LibriSpeech) and audio (AudioCaps) as single-modal tasks using Video-LLaMA. Please find the code in infer_batch.sh and video_llama/.
We provide the generated results for LibriSpeech (librispeech.json, and librispeech_finetuned.json after fine-tuning for 50k steps on LibriSpeech) and AudioCaps (audiocaps.json).
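The LibriSpeech result files above can be scored offline with word error rate (WER). A minimal sketch, assuming each JSON entry pairs a reference transcript with a model hypothesis (the field names `ref` and `hyp` here are illustrative, not the files' actual schema):

```python
import json

def wer(ref, hyp):
    """Word error rate: Levenshtein edit distance over word sequences,
    normalized by the reference length."""
    r, h = ref.split(), hyp.split()
    # dp[i][j] = edit distance between r[:i] and h[:j]
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i  # deleting all reference words
    for j in range(len(h) + 1):
        dp[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1  # substitution cost
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # match/substitution
    return dp[len(r)][len(h)] / max(len(r), 1)

# Hypothetical usage -- adapt the keys to the actual JSON layout:
# results = json.load(open("librispeech.json"))
# scores = [wer(entry["ref"], entry["hyp"]) for entry in results]
```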
We perform a case study for the following video:
Video file: 2stage_video.mp4
In the following three examples, the question requires the model to associate the speech with the correct speaker in order to answer. As a result, only av-SALMONN answers correctly, whereas the other two 2-stage systems (av-SALMONN without audio input but with the ASR transcription added, and Video-LLaMA with the ASR transcription) cannot.