IDA-VLM

This is the code base for IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model.

We propose visual instruction tuning with ID reference, which unleashes the potential of LVLMs for identity memory and recognition across diverse scenes, and develop an ID-aware LVLM, IDA-VLM. This paper paves the way for future artificial intelligence systems to handle multi-identity visual inputs, thereby facilitating the comprehension of complex visual narratives such as movies.

Samples:

Animation image URLs:

  • https://img1.doubanio.com/view/photo/l/public/p2625512480.webp
  • https://img1.doubanio.com/view/photo/m/public/p2901199610.webp
  • https://img2.doubanio.com/view/photo/m/public/p2896107391.webp
  • https://img2.doubanio.com/view/photo/l/public/p2895851711.webp
  • https://olimg.3dmgame.com/uploads/images/xiaz/2021/0924/1632447816995.jpg
  • https://i0.hdslb.com/bfs/archive/0384c2f5139013b1ceae84395bbd58fae25898ef.jpg
  • https://act-webstatic.mihoyo.com/event-static/2023/08/15/9797cacf6d60a54f91fb6f68546b43e1_6723404097102093983.jpg?x-oss-process=image/quality,Q_80/resize,m_lfit,s_700

Todo list:

  • Release code.
  • Release benchmark images and tuning data.
  • Release model weights and easy start.

We make three main contributions: MM-ID, tuning data construction, and model training.

In MM-ID, we introduce the task format and evaluation methods. ID_reference_data contains the processing code for producing the instruction tuning data. Model includes the training and inference code, which is based on Qwen-VL-Chat.

For a quick start, you need to download the MM-ID images (or prepare your own ID images and test images) and the model weights to complete the instruction task with ID reference, as detailed in Model.
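As a rough illustration of what an ID-reference query could look like, the sketch below assembles a prompt that pairs each ID image with a character name before asking a question about a test image. This is a hypothetical helper, not part of the released code: the file names, the `build_id_prompt` function, and the exact prompt layout are assumptions for illustration; only the `<img>...</img>` image-tag convention comes from Qwen-VL's chat format.

```python
# Hypothetical sketch of an ID-reference query for a Qwen-VL-Chat-style
# model: ID images are interleaved with their character names, followed by
# the test image and the question. Paths and layout are illustrative only.

def build_id_prompt(id_refs, test_image, question):
    """id_refs: list of (image_path, character_name) pairs."""
    parts = []
    for path, name in id_refs:
        # Qwen-VL marks images with <img>...</img> tags in its chat format.
        parts.append(f"<img>{path}</img>This is {name}.")
    parts.append(f"<img>{test_image}</img>{question}")
    return "\n".join(parts)

prompt = build_id_prompt(
    [("ids/alice.jpg", "Alice"), ("ids/bob.jpg", "Bob")],
    "scenes/frame_001.jpg",
    "What are Alice and Bob doing in this scene?",
)
print(prompt)
```

The resulting string would then be passed to the model's chat interface; see Model for the actual inference entry point.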

License

The majority of this project is licensed under the Qwen-VL License.

Acknowledgements

  • Qwen-VL: The codebase we build upon.
  • MovieNet: The main dataset we use for tuning data construction.
