[New blog post] Unified multimodal large model evaluation, accelerating multimodal intelligence emergence #1987

kcz358 · 2024-04-15T02:21:43Z

Hi @lewtun, this is our blog post for lmms-eval. Could you help us check the article and see whether anything should be added, for example on user experience or how to add a new model? You might also want to add your names to the author list.

Thank you!

This blog introduces a new evaluation pipeline for large vision-language models. Building upon lm-evaluation-harness, this framework has been improved and expanded to provide a unified interface for defining models, datasets, and evaluation metrics, offering a one-stop, efficient solution for evaluating large multimodal models (LMMs). We hope that through this framework, we can collectively drive the iteration cycle of multimodal models and promote their broader application in academia and industry.

lewtun

Thank you very much for this blog post! I left a few minor suggestions and a pointer to include the details in _blog.yml

lewtun · 2024-04-19T15:37:35Z

lmms_eval.md

@@ -0,0 +1,85 @@
+---
+title: "Unified multimodal large model evaluation, accelerating multimodal intelligence emergence"
+thumbnail: https://github.com/lmms-lab/lmms-eval-blog/blob/master/assets/img/lmms-eval-header.png


I believe this should live in the blog repo directly to render on hf.co/blog. See here for an example: https://github.com/huggingface/blog/pull/2021/files#diff-a332b83464cf2b650715bacb6e3f07b994af0790acc88a4ea353883ba2ae751eR3853

Note you also need to add the blog details to _blog.yml

Thank you! I also noticed that in _blog.yml we can only list one author. Is that right?

Yes, that's just for the thumbnail, but the blog post itself will show all authors:

lmms_eval.md

lewtun · 2024-04-19T15:43:03Z

lmms_eval.md

+**One-click evaluation**: lmms-eval allows users to easily evaluate their model performance on multiple datasets with a single command, without the need for manual dataset preparation. With just one line of code, users can obtain comprehensive evaluation results within minutes, including detailed logs and sample analysis covering model parameters, inputs and outputs, correct answers, etc. This is suitable for scenarios where advanced models like GPT4 are needed for scoring.
+
+```
+accelerate launch --num_processes=8 -m lmms_eval --model llava   --model_args pretrained="liuhaotian/llava-v1.5-7b"   --tasks mme,mmbench_en --batch_size 1 --log_samples --log_samples_suffix llava_v1.5_mme_mmbenchen --output_path ./logs


Suggested change

-accelerate launch --num_processes=8 -m lmms_eval --model llava --model_args pretrained="liuhaotian/llava-v1.5-7b" --tasks mme,mmbench_en --batch_size 1 --log_samples --log_samples_suffix llava_v1.5_mme_mmbenchen --output_path ./logs
+# pip install git+https://github.com/huggingface/lmms-eval.git
+accelerate launch --multi_gpu --num_processes=8 -m lmms_eval \
+    --model llava \
+    --model_args pretrained="liuhaotian/llava-v1.5-7b" \
+    --tasks mme,mmbench_en \
+    --batch_size 1 \
+    --log_samples \
+    --log_samples_suffix llava_v1.5_mme_mmbenchen \
+    --output_path ./logs

I think I will change the link to our current repo, since the HF fork is somewhat behind, and I will also add pip install git+https://github.com/haotian-liu/LLaVA.git
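For anyone who wants to script this run rather than paste the command, here is a minimal Python sketch that assembles the same invocation from its parts. The helper name `build_eval_command` and its parameters are hypothetical and not part of lmms-eval; it only uses the flags already quoted in the suggestion above, and the install step is left out because the repo link is still under discussion.

```python
import shlex


def build_eval_command(model, pretrained, tasks, output_path="./logs",
                       num_processes=8, batch_size=1, log_suffix=None):
    """Assemble the accelerate/lmms_eval invocation as an argument list.

    Hypothetical helper: it only mirrors the flags quoted in this thread.
    """
    cmd = [
        "accelerate", "launch", "--multi_gpu",
        f"--num_processes={num_processes}",
        "-m", "lmms_eval",
        "--model", model,
        "--model_args", f"pretrained={pretrained}",
        "--tasks", ",".join(tasks),  # tasks are passed as one comma-separated value
        "--batch_size", str(batch_size),
        "--output_path", output_path,
    ]
    # Per-sample logging is optional; add both flags only when a suffix is given.
    if log_suffix is not None:
        cmd += ["--log_samples", "--log_samples_suffix", log_suffix]
    return cmd


cmd = build_eval_command(
    model="llava",
    pretrained="liuhaotian/llava-v1.5-7b",
    tasks=["mme", "mmbench_en"],
    log_suffix="llava_v1.5_mme_mmbenchen",
)
# Print a shell-safe, copy-pasteable version of the command.
print(shlex.join(cmd))
```

The list form can be handed to `subprocess.run(cmd, check=True)` directly, which avoids shell quoting issues when model names or paths contain unusual characters.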

lmms_eval.md

lewtun · 2024-04-19T15:45:10Z

lmms_eval.md

+
+Another challenge lies in data acquisition and processing during the evaluation process, especially when dealing with old datasets that are not widely available. Researchers often need to invest a considerable amount of time and effort in manual searching, downloading, and processing.
+
+To address these issues, researchers from Nanyang Technological University, ByteDance, and other institutions have jointly open-sourced lmms-eval, which is an evaluation framework designed specifically for multimodal large models. Building upon lm-evaluation-harness, this framework has been improved and expanded to provide a unified interface for defining models, datasets, and evaluation metrics, offering a one-stop, efficient solution for evaluating multimodal models (LMMs). We hope that through this framework, we can collectively drive the iteration cycle of multimodal models and promote their broader application in academia and industry. We sincerely look forward to witnessing more breakthroughs and innovations in the field of multimodal AI, jointly advancing towards a more efficient and intelligent future development of artificial intelligence technology.


Suggested change

-To address these issues, researchers from Nanyang Technological University, ByteDance, and other institutions have jointly open-sourced lmms-eval, which is an evaluation framework designed specifically for multimodal large models. Building upon lm-evaluation-harness, this framework has been improved and expanded to provide a unified interface for defining models, datasets, and evaluation metrics, offering a one-stop, efficient solution for evaluating multimodal models (LMMs). We hope that through this framework, we can collectively drive the iteration cycle of multimodal models and promote their broader application in academia and industry. We sincerely look forward to witnessing more breakthroughs and innovations in the field of multimodal AI, jointly advancing towards a more efficient and intelligent future development of artificial intelligence technology.

+To address these issues, researchers from Nanyang Technological University, ByteDance, and other institutions have jointly open-sourced lmms-eval, which is an evaluation framework designed specifically for multimodal large models. Building upon lm-evaluation-harness, this framework has been improved and expanded to provide a unified interface for defining models, datasets, and evaluation metrics, offering a one-stop, efficient solution for evaluating large multimodal models (LMMs). We hope that through this framework, we can collectively drive the iteration cycle of multimodal models and promote their broader application in academia and industry. We sincerely look forward to witnessing more breakthroughs and innovations in the field of multimodal AI, jointly advancing towards a more efficient and intelligent future development of artificial intelligence technology.

lmms_eval.md

Co-authored-by: lewtun <[email protected]>

kcz358 · 2024-04-20T05:48:13Z

Hi @lewtun , thank you for your feedback.

I have uploaded the thumbnail picture and fixed several problems in the blog. Could you help us check if there are any more problems to fix in this article?

When we finalize the English version of the article, we will also help to translate everything into Chinese and put it into /blog/zh

Thank you!

lewtun

Thanks for iterating @kcz358 ! This all looks good to me and gently pinging @pcuenca for final approval

Context: this is a blog post about an open source lib for evaluating multimodal models that the TRL team contributed to, and it is what we recommend in the TRL examples.

pcuenca

Very interesting!

also cc @merveenoyan for info.

pcuenca · 2024-04-24T14:32:37Z

_blog.yml

+  title: "Unified multimodal large model evaluation, accelerating multimodal intelligence emergence"
+  author: kcz358
+  thumbnail: /blog/assets/lmms_eval/thumbnail.png
+  date: April 20, 2024


Reminder to update date before release :)

(Also I'd move the entry to the end of the file, just in case)

lmms_eval.md

pcuenca · 2024-04-24T14:57:34Z

lmms_eval.md

+
+**Synchronized Online Logging**: We provide detailed logging tools to help you understand the evaluation process and results. Logs include model parameters, generation parameters, input questions, model responses, and ground truth answers. You can record every detail and visualize it in Weights & Biases runs. Users can access results in real-time from anywhere, making it convenient and efficient.
+
+<image src="https://github.com/lmms-lab/lmms-eval-blog/blob/master/assets/img/wandb_table.jpg" alt="wandb_table" />


I don't think these links will be embedded correctly as images (they are references to the github tree)

Hi, I tried changing the src to a link in a Hugging Face dataset repo, but I can't see the rendered image on GitHub. May I ask what the proper way is to link images in the blog?

I have uploaded all the images here but have been unable to find a way to get GitHub markdown to render them.
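One common way around the problem raised above: a `github.com/.../blob/...` URL points at an HTML page, while `raw.githubusercontent.com` serves the file bytes that an image tag needs. The sketch below converts one form to the other. The function name `github_blob_to_raw` is hypothetical, it assumes the branch name contains no slash, and the thread does not confirm which hosting the authors finally chose.

```python
from urllib.parse import urlsplit


def github_blob_to_raw(url):
    """Turn a github.com '/blob/' page URL into a raw.githubusercontent.com URL."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expected shape: <owner>/<repo>/blob/<branch>/<path...>
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url}")
    owner, repo, _, branch, *path = segments
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{'/'.join(path)}"


print(github_blob_to_raw(
    "https://github.com/lmms-lab/lmms-eval-blog/blob/master/assets/img/wandb_table.jpg"
))
# → https://raw.githubusercontent.com/lmms-lab/lmms-eval-blog/master/assets/img/wandb_table.jpg
```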

pcuenca · 2024-04-24T14:57:49Z

lmms_eval.md

+
+<image src="https://github.com/lmms-lab/lmms-eval-blog/blob/master/assets/img/org_dataset.png" alt="dataset on organization"/>
+
+<image src="https://github.com/lmms-lab/lmms-eval-blog/blob/master/assets/img/viewer.png"  alt="viewer" />


Same comment about the image link.

lmms_eval.md

merveenoyan · 2024-04-24T15:47:17Z

thanks a lot for the blog post! I'll give this a spin 😊

merveenoyan

mostly nits 😊

merveenoyan · 2024-04-24T15:48:03Z

lmms_eval.md

+- user: liuziwei7
+  guest: true
+---
+# Unified multimodal large model evaluation, accelerating multimodal intelligence emergence


we can make it uppercase for h1 IMO

merveenoyan · 2024-04-24T15:49:11Z

lmms_eval.md

+
+Another challenge lies in data acquisition and processing during the evaluation process, especially when dealing with old datasets that are not widely available. Researchers often need to invest a considerable amount of time and effort in manual searching, downloading, and processing.
+
+To address these issues, researchers from Nanyang Technological University, ByteDance, and other institutions have jointly open-sourced `lmms-eval`, which is an evaluation framework designed specifically for multimodal large models. Building upon EleutherAI's [`lm-evaluation-harness`](https://github.com/EleutherAI/lm-evaluation-harness) and [🤗 Accelerate](https://github.com/huggingface/accelerate), this framework has been improved and expanded to provide a unified interface for defining models, datasets, and evaluation metrics, offering a one-stop, efficient solution for evaluating large multimodal models (LMMs). We hope that through this framework, we can collectively drive the iteration cycle of multimodal models and promote their broader application in academia and industry. We sincerely look forward to witnessing more breakthroughs and innovations in the field of multimodal AI, jointly advancing towards a more efficient and intelligent future development of artificial intelligence technology.


would be nice to directly give a link to lmms-eval instead of putting it in code formatting

merveenoyan · 2024-04-24T15:49:35Z

lmms_eval.md

+
+<image src="https://github.com/lmms-lab/lmms-eval-blog/blob/master/assets/img/teaser.png" alt="Pipeline"/>
+
+## Overview of the main features


again maybe uppercase main and features

lmms_eval.md

Co-authored-by: Pedro Cuenca <[email protected]>

Co-authored-by: Merve Noyan <[email protected]>

Co-authored-by: Pedro Cuenca <[email protected]>

kcz358 · 2024-04-25T13:24:20Z

Hi @pcuenca @merveenoyan , thank you for your kind feedback.

I have tried to address most of the comments, including the image source issue. May I kindly ask for a review of this version? I will update the date in _blog.yml before release.

lewtun · 2024-05-01T14:24:21Z

Thanks for iterating @kcz358 ! Would you mind resolving the merge conflicts and then we should be pretty good to go!

kcz358 · 2024-05-02T05:00:53Z

Hi @lewtun , I have merged the main branch and added the Chinese version of the blog. I have also updated the date in _blog.yml

kcz358 · 2024-05-16T06:46:37Z

Hi @lewtun, sorry for pinging you again. Do you think we can merge the current version?

pcuenca · 2024-05-16T07:49:01Z

_blog.yml

+- local: sc2-instruct
+  title: "StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation"
+  thumbnail: /blog/assets/sc2-instruct/sc2-instruct-banner.png
+  author: yuxiang630
+  guest: true
+  date: Apr 29, 2024
+  tags:
+    - nlp
+    - community
+    - research
+    - LLM
+
+- local: evaluation-structured-outputs
+  title: "Improving Prompt Consistency with Structured Generations"
+  author: willkurt
+  guest: true
+  thumbnail: /blog/assets/evaluating-mmlu-leaderboard/thumbnail.png
+  date: Apr 30, 2024
+  tags:
+    - evaluation
+    - collaboration
+    - research
+    - leaderboard
+
+- local: asr-diarization
+  title: "Powerful ASR + diarization + speculative decoding with Hugging Face Inference Endpoints"
+  author: sergeipetrov
+  thumbnail: /blog/assets/asr-diarization/thumbnail.png
+  date: May 1, 2024
+  tags:
+    - audio
+    - asr
+    - inference
+


hmmm these entries shouldn't be here. Can you try to merge main again and ensure there are no duplicates?

Thank you for spotting the issue! I have merged main again and deleted the duplicates.
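A quick way to catch this kind of merge leftover before pushing is to scan `_blog.yml` for repeated `local:` slugs. The sketch below is a plain-text check using only the standard library; the function name `duplicate_locals` is hypothetical, it assumes each entry starts with `- local:` at the beginning of a line as in the excerpt above, and the sample titles are abbreviated for illustration.

```python
import re
from collections import Counter


def duplicate_locals(blog_yml_text):
    """Return the `local:` slugs that appear more than once, in first-seen order."""
    slugs = re.findall(r"^- local:\s*(\S+)", blog_yml_text, flags=re.MULTILINE)
    counts = Counter(slugs)
    # dict.fromkeys keeps one copy of each slug while preserving order.
    return [slug for slug in dict.fromkeys(slugs) if counts[slug] > 1]


# Abbreviated sample mimicking a bad merge that repeats one entry.
sample = """\
- local: sc2-instruct
  title: "StarCoder2-Instruct"
- local: asr-diarization
  title: "Powerful ASR"
- local: sc2-instruct
  title: "StarCoder2-Instruct"
"""
print(duplicate_locals(sample))  # → ['sc2-instruct']
```

To run it on the real file, read the text first, for example `duplicate_locals(open("_blog.yml", encoding="utf-8").read())`, and treat a non-empty result as a sign that the merge needs another look.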

Co-authored-by: Pedro Cuenca <[email protected]>

lewtun · 2024-05-30T14:21:24Z

@pcuenca I resolved the merge conflicts - ok if we merge this? (Feel free to do so if you agree)

kcz358 and others added 9 commits April 14, 2024 13:04

Initial commit

b2a5ff0

Try to add image

b5ed7fc

See whether it works using huggingface dataset

929c502

Nah

37b377f

Add english version

809268e

Update lmms_eval.md

28c44ae

Add author list

44cef2f

Merge branch 'main' of https://github.com/kcz358/blog

bfda318

Revise author list

bb4f141

lewtun reviewed Apr 19, 2024


kcz358 and others added 11 commits April 20, 2024 13:06

Update lmms_eval.md

6507e4f

Co-authored-by: lewtun <[email protected]>

Update lmms_eval.md

1aef8f8

Co-authored-by: lewtun <[email protected]>

Update lmms_eval.md

2fdda3f

Co-authored-by: lewtun <[email protected]>

Update lmms_eval.md

6aadd54

Co-authored-by: lewtun <[email protected]>

Update lmms_eval.md

2656012

Co-authored-by: lewtun <[email protected]>

Update lmms_eval.md

21fa476

Update lmms_eval in _blog.yml

fb5a9c8

Add thumbnail image to assets

f1f8604

Update lmms_eval.md

5a3f283

Update lmms_eval.md

f04f8ca

Merge branch 'main' into main

2974a3d

kcz358 requested a review from lewtun April 24, 2024 04:19

lewtun approved these changes Apr 24, 2024


pcuenca approved these changes Apr 24, 2024


merveenoyan approved these changes Apr 24, 2024


kcz358 and others added 3 commits April 25, 2024 10:23

Update lmms_eval.md

18e888f

Co-authored-by: Pedro Cuenca <[email protected]>

Update lmms_eval.md

549c968

Co-authored-by: Merve Noyan <[email protected]>

Update lmms_eval.md

f288278

Co-authored-by: Pedro Cuenca <[email protected]>

kcz358 and others added 9 commits April 25, 2024 11:13

Change image src

62db0ce

Switch back to github link for image

74fa630

Update image src

d334208

Add link to lmms-eval

9cfde7d

Fix title issue

b8b6aef

Fix upper title

90a66f2

Merge branch 'main' of https://github.com/huggingface/blog

3229476

Add images

d3486c4

Update lmms_eval.md

782e690

kcz358 requested review from pcuenca and merveenoyan April 29, 2024 04:32

kcz358 added 4 commits May 2, 2024 12:38

Merge remote-tracking branch 'upstream/main'

df74c0a

Merge branch 'main' of https://github.com/kcz358/blog

6dc20a5

Add chinese version

5454271

Update dates

051df69

kcz358 added 2 commits May 8, 2024 16:13

Merge remote-tracking branch 'upstream/main'

49ecbac

Merge remote-tracking branch 'upstream/main'

12099b7

pcuenca reviewed May 16, 2024


kcz358 and others added 7 commits May 16, 2024 16:02

Update lmms_eval.md

9636095

Co-authored-by: Pedro Cuenca <[email protected]>

Update lmms_eval.md

dda7b65

Co-authored-by: Pedro Cuenca <[email protected]>

Update lmms_eval.md

03cc232

Co-authored-by: Pedro Cuenca <[email protected]>

Merge remote-tracking branch 'upstream/main'

74220b5

Remove duplicate

cd70bc6

Add resources at the end of the blog

6e223b2

Merge branch 'main' into main

d020514
