126 commits
a6537b6
added llava-next to gitingore
manzar96 Dec 20, 2024
315249d
init script for eval run
manzar96 Dec 20, 2024
4a1fe31
init commit for adding molmo_hf
manzar96 Dec 20, 2024
03b46e7
requirements commit -- to be MODIFIED
manzar96 Dec 20, 2024
a4c8794
addition of pangea tasks+ pangea model+ script for aggregating results
manzar96 Dec 30, 2024
d944f97
small fix for nvlm_d class and config files of maxm task
Jan 7, 2025
0a4246c
pangea update+ updates on maxm task
manzar96 Jan 7, 2025
6e95970
Merge branch 'multiling_multimodal_tasks_add' of github.com:deep-spin…
manzar96 Jan 7, 2025
cd6686b
modified output parsing for marvl
manzar96 Jan 9, 2025
9b1cd7a
updated marvl task output parsing
manzar96 Jan 10, 2025
85f2c33
added the possibility of using tags during the inference
aminfarajian Feb 10, 2025
deaac05
Update .gitignore
aminfarajian Feb 10, 2025
3ceee86
Update .gitignore
aminfarajian Feb 10, 2025
5c95b8f
added the scripts folder
aminfarajian Feb 10, 2025
61349bb
added the script to parse the results
aminfarajian Feb 10, 2025
65ab80c
[WIP] add the Pixtral model
aminfarajian Feb 12, 2025
8d2f5a6
[WIP] add the Pixtral Model
aminfarajian Feb 18, 2025
958cbb0
update the pixtral model to work with vLLM
aminfarajian Feb 19, 2025
a231572
Updated Pixtral model
aminfarajian Feb 19, 2025
450e045
Updated Pixtral model
aminfarajian Feb 19, 2025
c8a6797
added multi-image support
aminfarajian Feb 19, 2025
32a4c9f
updated the parse results script
aminfarajian Feb 28, 2025
1a466f3
added cc-ocr
manzar96 Mar 3, 2025
c31b179
Merge branch 'multiling_multimodal_tasks_add' of github.com:deep-spin…
manzar96 Mar 4, 2025
b6a5f81
increased max new tokens for cc-ocr
manzar96 Mar 4, 2025
e9acfbb
fix pixstral run in signle gpu
manzar96 Mar 5, 2025
809c62c
added the subset of the multilingual tasks that are relevant to tower
aminfarajian Mar 6, 2025
a310b1d
Add ALM bench
sonalsannigrahi Mar 6, 2025
33b56f3
updated cc-ocr eval
manzar96 Mar 7, 2025
423ff2a
Merge branch 'multiling_multimodal_tasks_add' of github.com:deep-spin…
manzar96 Mar 7, 2025
29765d0
fixed the print step
aminfarajian Mar 7, 2025
6018cd2
updated cc ocr task
manzar96 Mar 7, 2025
9694104
minor changes to nvlm_d - warning when no image is included
manzar96 Mar 7, 2025
285a56c
Merge branch 'multiling_multimodal_tasks_add' of github.com:deep-spin…
manzar96 Mar 7, 2025
3aeb15d
updated cc-ocr eval
manzar96 Mar 7, 2025
e7ba1fa
init commit for commute--added
manzar96 Mar 7, 2025
c815890
updated parse results script for cc-ocr
manzar96 Mar 11, 2025
8842b0a
added the support for aya models
aminfarajian Mar 12, 2025
40f2c7e
changed the default conv_template to use qwen2 instead
aminfarajian Mar 14, 2025
441338e
add multi_30k
sonalsannigrahi Mar 19, 2025
51dd52d
add multi_30k
sonalsannigrahi Mar 19, 2025
ed6c786
fix source
sonalsannigrahi Mar 19, 2025
eb9a514
added system prompt to aya model
aminfarajian Mar 19, 2025
d06352d
update multi30_k
sonalsannigrahi Mar 20, 2025
a7cb1ad
fix merge conflicts
sonalsannigrahi Mar 20, 2025
3be37df
updated commute task
manzar96 Mar 20, 2025
ce17e9e
Merge branch 'multiling_multimodal_tasks_add' of github.com:deep-spin…
manzar96 Mar 20, 2025
d38f673
updated multi30k task with training prompt, support for multiple lang…
manzar96 Mar 20, 2025
cb97995
added custom system prompt to llava
aminfarajian Mar 20, 2025
846c542
added the parser of results for cc-ocr task
aminfarajian Mar 20, 2025
71384a2
fixed the issue in alm_bench_doc_to_text
aminfarajian Mar 20, 2025
6a50ed8
update alm bench
sonalsannigrahi Mar 21, 2025
425110d
update alm_bench prompt and parser
sonalsannigrahi Mar 21, 2025
b798548
updated cc-ocr
manzar96 Mar 21, 2025
5e18034
Merge branch 'multiling_multimodal_tasks_add' of github.com:deep-spin…
manzar96 Mar 21, 2025
96635c2
alm_bench per language
sonalsannigrahi Mar 21, 2025
7a9c2f9
parser for alm bench
sonalsannigrahi Mar 21, 2025
c2c0298
updated the settings of ai2d. we might want to roll them back
aminfarajian Mar 21, 2025
410f349
update for French [alm_bench]
sonalsannigrahi Mar 22, 2025
dbb1341
Update alm_bench names
sonalsannigrahi Mar 22, 2025
bbe5646
Update alm_bench names
sonalsannigrahi Mar 22, 2025
3e8adfe
Update alm_bench names
sonalsannigrahi Mar 22, 2025
f608c86
Update alm_bench names
sonalsannigrahi Mar 22, 2025
60180bb
Update alm_bench names
sonalsannigrahi Mar 22, 2025
415fbe7
Update alm_bench names
sonalsannigrahi Mar 22, 2025
12620aa
Update alm_bench names
sonalsannigrahi Mar 22, 2025
bd98094
Update alm_bench names
sonalsannigrahi Mar 22, 2025
534b45d
Rename alm_bench-pt.yaml to alm-bench-pt.yaml
sonalsannigrahi Mar 22, 2025
c001f3d
Update alm_bench names
sonalsannigrahi Mar 22, 2025
bc3032c
Update alm_bench names
sonalsannigrahi Mar 22, 2025
9231ac6
updated gemini_api
aminfarajian Mar 22, 2025
71afb48
update to handle empty images
manzar96 Mar 27, 2025
9fd60b7
added wmt24pp
manzar96 Mar 27, 2025
79c2215
small update on wmt24pp
manzar96 Mar 27, 2025
7ff1712
fixed yaml wmt24pp
manzar96 Mar 27, 2025
4028ab5
Merge branch 'main' into multiling_multimodal_tasks_add
aminfarajian Mar 28, 2025
1b6b667
updated wmt24pp group
aminfarajian Mar 28, 2025
07f628e
updated wmt24pp tasks
aminfarajian Mar 31, 2025
766f752
added the possibility of defining a custom system_prompt for Pangea
aminfarajian Mar 31, 2025
266e9c0
added arc text-only for multiple languages and aya vision bench
manzar96 Apr 7, 2025
4752fc2
Merge branch 'multiling_multimodal_tasks_add' of github.com:deep-spin…
manzar96 Apr 7, 2025
eaced6d
qwen2.5 loglikelihood added
manzar96 Apr 7, 2025
a37231f
updated llava_hf to work with text-only tasks and not throw errors if…
aminfarajian Apr 8, 2025
c475b5d
temporary changes to llava.py
aminfarajian Apr 8, 2025
e8e30f6
added loglikelihood to Aya model
aminfarajian Apr 8, 2025
b18d8ed
added a small comment to aya model
aminfarajian Apr 8, 2025
03cf759
changed slightly the loglikelihood function of llava to only consider…
aminfarajian Apr 8, 2025
fa296ea
added judges in aya vision
manzar96 Apr 10, 2025
2665a73
updated aya vision bench
manzar96 Apr 15, 2025
1d3bd71
aya vision bench updated + added comparative judge
manzar96 Apr 15, 2025
a37a973
updated yamls for aya-vision-bench
manzar96 Apr 16, 2025
d69bffa
Unify the argument name for the system prompt
aminfarajian Apr 17, 2025
2d45f67
changed to litellm usage for llms as judges for ayavisionbench
manzar96 Apr 17, 2025
108250f
Merge branch 'multiling_multimodal_tasks_add' of github.com:deep-spin…
manzar96 Apr 17, 2025
15377a9
Update llava.py
aminfarajian Apr 21, 2025
16a1908
small fixes in judge_utils.py
aminfarajian Apr 21, 2025
19da9a6
updated aya vision with independent judge script
manzar96 Apr 27, 2025
94afa2b
resolve conflict
manzar96 Apr 27, 2025
85065f2
script to run eval with llms as judge for aya
manzar96 Apr 27, 2025
af0ae6e
small update
manzar96 Apr 27, 2025
f6fd1f5
small updates aya
manzar96 Apr 27, 2025
38e9453
final_answer parser
sonalsannigrahi May 12, 2025
91439f2
final_answer parser
sonalsannigrahi May 12, 2025
a62ad3a
fixed the bugs of alm-bench
aminfarajian May 14, 2025
e03a1b8
add llava_v6 model
aminfarajian May 19, 2025
0e86154
updated alm-bench pre-prompts
aminfarajian May 19, 2025
8810720
minor fix in llava model
aminfarajian May 19, 2025
a0377ae
added the support of llava_v6 to the task
aminfarajian May 20, 2025
612213f
small fix in the prompt
aminfarajian May 20, 2025
d7394b0
updated llava_hf and llava_v6
aminfarajian May 21, 2025
279a8f5
Merge branch 'multiling_multimodal_tasks_add' of github.com:deep-spin…
manzar96 May 21, 2025
d13ec67
temporary addition of m-wild-vision bench to switch to alm-bench branch
manzar96 May 21, 2025
d293ce5
updated gitignore with lm-eval-harness
manzar96 May 21, 2025
0b2464c
removed breakpoint from llava_hf.py
manzar96 May 22, 2025
6659b9c
ayavision and m-wild bench
manzar96 May 22, 2025
918a58f
added support of all languages for ayavisionbench and m-wild-vision
manzar96 May 22, 2025
f912945
added default pre-prompts and post-prompts in configs for ayavision a…
manzar96 May 22, 2025
d2057f3
added the v6 prompts and models
aminfarajian Jun 4, 2025
990dded
Merge branch 'alm-bench' of https://github.com/deep-spin/lmms-eval in…
aminfarajian Jun 4, 2025
29d0f33
updates for the commute task
aminfarajian Jun 9, 2025
d04bc31
update for pixtral
aminfarajian Jun 9, 2025
3c287bb
llavq_hd loglikelihood implementations + changes on commute
manzar96 Jun 9, 2025
f8e35c3
mmmu_pro modifications
GuilhermeViveiros Jun 30, 2025
cb69c87
Merge branch 'alm-bench' of https://github.com/deep-spin/lmms-eval in…
GuilhermeViveiros Aug 12, 2025
eb86cf2
Merge branch 'multiling_multimodal_tasks_add' of https://github.com/d…
GuilhermeViveiros Aug 27, 2025
8857d4c
blink benchmark related stuff
GuilhermeViveiros Aug 27, 2025
The diff you're trying to view is too large. We only load the first 3000 changed files.
7 changes: 3 additions & 4 deletions .gitignore
@@ -19,7 +19,6 @@ temp
 profile_default/
 ipython_config.py
 logs/
-scripts/
 wandb/
 SimSun.ttf
 submissions/
@@ -30,6 +29,7 @@ cache_dir
 ckpt
 pretrained/
 LLaVA/
+LLaVA-NeXT/
 *logs
 *.isorted
 temp/
@@ -41,6 +41,5 @@ Video-MME/
 VATEX/
 lmms_eval/tasks/vatex/__pycache__/utils.cpython-310.pyc
 lmms_eval/tasks/mlvu/__pycache__/utils.cpython-310.pyc
-
-scripts/
-.env
+.env
+lm-evaluation-harness/
185 changes: 185 additions & 0 deletions lmms-eval.reqs
@@ -0,0 +1,185 @@
accelerate 1.2.0
aiofiles 23.2.1
aiohappyeyeballs 2.4.4
aiohttp 3.11.10
aiosignal 1.3.1
altair 5.5.0
annotated-types 0.7.0
anyio 4.7.0
async-timeout 5.0.1
attrs 24.2.0
av 14.0.1
bitsandbytes 0.45.0
black 24.1.0
certifi 2024.8.30
cfgv 3.4.0
chardet 5.2.0
charset-normalizer 3.4.0
click 8.1.7
colorama 0.4.6
contourpy 1.3.1
cycler 0.12.1
DataProperty 1.0.1
datasets 2.20.0
decord 0.6.0
dill 0.3.7
distlib 0.3.9
distro 1.9.0
docker-pycreds 0.4.0
einops 0.6.1
einops-exts 0.0.4
et_xmlfile 2.0.0
evaluate 0.4.3
exceptiongroup 1.2.2
fastapi 0.115.6
ffmpy 0.4.0
filelock 3.16.1
fonttools 4.55.2
frozenlist 1.5.0
fsspec 2023.10.0
ftfy 6.3.1
gitdb 4.0.11
GitPython 3.1.43
gradio 4.16.0
gradio_client 0.8.1
h11 0.14.0
hf_transfer 0.1.8
httpcore 0.16.3
httpx 0.24.0
huggingface-hub 0.26.5
identify 2.6.3
idna 3.10
importlib_resources 6.4.5
isort 5.13.2
Jinja2 3.1.4
jiter 0.8.2
joblib 1.4.2
jsonlines 4.0.0
jsonschema 4.23.0
jsonschema-specifications 2024.10.1
kiwisolver 1.4.7
latex2mathml 3.77.0
llava 1.2.2.post1 /lustre/fshomisc/home/rech/genrce01/ued79zb/repos/LLaVA
lmms_eval 0.3.0 /lustre/fshomisc/home/rech/genrce01/ued79zb/repos/lmms-eval
loguru 0.7.3
lxml 5.3.0
markdown-it-py 3.0.0
markdown2 2.5.1
MarkupSafe 2.1.5
matplotlib 3.9.3
mbstrdecoder 1.1.3
mdurl 0.1.2
mpmath 1.3.0
multidict 6.1.0
multiprocess 0.70.15
mypy-extensions 1.0.0
narwhals 1.17.0
networkx 3.4.2
nltk 3.9.1
nodeenv 1.9.1
numexpr 2.10.2
numpy 1.26.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.18.1
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu12 12.1.105
openai 1.57.2
opencv-python-headless 4.10.0.84
openpyxl 3.1.5
orjson 3.10.12
packaging 24.2
pandas 2.2.3
pathspec 0.12.1
pathvalidate 3.2.1
peft 0.14.0
pillow 10.4.0
pip 24.2
platformdirs 4.3.6
portalocker 3.0.0
pre_commit 4.0.1
propcache 0.2.1
protobuf 3.20.0
psutil 6.1.0
pyarrow 18.1.0
pyarrow-hotfix 0.6
pybind11 2.13.6
pycocoevalcap 1.2
pycocotools 2.0.8
pydantic 2.10.3
pydantic_core 2.27.1
pydub 0.25.1
Pygments 2.18.0
pyparsing 3.2.0
pytablewriter 1.2.0
python-dateutil 2.9.0.post0
python-multipart 0.0.19
pytz 2024.2
PyYAML 6.0.2
referencing 0.35.1
regex 2024.11.6
requests 2.32.3
rfc3986 1.5.0
rich 13.9.4
rpds-py 0.22.3
ruff 0.8.2
sacrebleu 2.4.3
safetensors 0.4.5
scikit-learn 1.2.2
scipy 1.14.1
semantic-version 2.10.0
sentence-transformers 3.3.1
sentencepiece 0.1.99
sentry-sdk 2.19.2
setproctitle 1.3.4
setuptools 75.1.0
shellingham 1.5.4
shortuuid 1.0.13
six 1.17.0
smmap 5.0.1
sniffio 1.3.1
sqlitedict 2.1.0
starlette 0.41.3
svgwrite 1.4.3
sympy 1.13.1
tabledata 1.3.3
tabulate 0.9.0
tcolorpy 0.1.6
tenacity 8.3.0
threadpoolctl 3.5.0
tiktoken 0.8.0
timm 0.6.13
tokenizers 0.21.0
tomli 2.2.1
tomlkit 0.12.0
torch 2.1.2
torchvision 0.16.2
tqdm 4.67.1
tqdm-multiprocess 0.0.11
transformers 4.47.0
transformers-stream-generator 0.0.5
triton 2.1.0
typepy 1.3.2
typer 0.15.1
typing_extensions 4.12.2
tzdata 2024.2
urllib3 2.2.3
uvicorn 0.32.1
virtualenv 20.28.0
wandb 0.19.0
wavedrom 2.0.3.post3
wcwidth 0.2.13
websockets 11.0.3
wheel 0.44.0
xxhash 3.5.0
yarl 1.18.3
yt-dlp 2024.12.6
zss 1.2.0
zstandard 0.23.0
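Review note: `lmms-eval.reqs` is in `pip list` column format (name, version, and a local path for editable installs), which `pip install -r` does not accept as-is. Below is a minimal sketch of turning such a listing into `name==version` pins. The helper name `to_requirement_pins` is hypothetical, and skipping any line with a third column assumes those lines are editable installs tied to one machine.

```python
from typing import List


def to_requirement_pins(pip_list_text: str) -> List[str]:
    """Convert 'name version' lines into 'name==version' pins.

    Lines with a third column (assumed to be a local editable-install path)
    are skipped, because such a path is only valid on the original machine.
    """
    pins = []
    for line in pip_list_text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            name, version = parts
            pins.append(f"{name}=={version}")
    return pins


sample = """accelerate 1.2.0
llava 1.2.2.post1 /path/to/LLaVA
torch 2.1.2"""
print("\n".join(to_requirement_pins(sample)))
```

The editable packages (`llava`, `lmms_eval`) would then need to be installed separately from their own checkouts.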
5 changes: 3 additions & 2 deletions lmms_eval/evaluator.py
@@ -168,8 +168,9 @@ def simple_evaluate(
     if task_manager is None:
         task_manager = TaskManager(verbosity, model_name=model)
 
+    # FOR DEBUG PURPOSES -> COMMENT THIS
     task_dict = get_task_dict(tasks, task_manager)
-
+
     if isinstance(model, str):
         if model_args is None:
             model_args = ""
@@ -590,7 +591,7 @@ def evaluate(
             num_fewshot,
             higher_is_better,
         ) = consolidate_results(eval_tasks)
-
+
         ### Calculate group metrics ###
         if bool(results):
             results, versions, show_group_table, *_ = consolidate_group_results(results, versions, task_dict)
10 changes: 10 additions & 0 deletions lmms_eval/models/__init__.py
@@ -28,6 +28,8 @@
     "llama_vid": "LLaMAVid",
     "llama_vision": "LlamaVision",
     "llava": "Llava",
+    "llava_next": "LlavaNext",
+    "llava_v6": "Llava_v6",
     "llava_hf": "LlavaHf",
     "llava_onevision": "Llava_OneVision",
     "llava_onevision_moviechat": "Llava_OneVision_MovieChat",
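Review note: each entry in this mapping appears to pair a module name under `lmms_eval/models/` with the class that module exports, so a model can be imported only when it is requested. Below is a minimal sketch of that lookup pattern. The names `load_model_class`, `registry`, and `package` are hypothetical and not taken from the repository; a standard-library module stands in for a model file so the example runs on its own.

```python
import importlib
from typing import Dict


def load_model_class(name: str, registry: Dict[str, str], package: str = ""):
    """Resolve a registry key to a class by importing the module named after the key."""
    if name not in registry:
        raise ValueError(f"Unknown model {name!r}; known: {sorted(registry)}")
    # The key doubles as the module name; the value is the class inside it.
    module_path = f"{package}.{name}" if package else name
    module = importlib.import_module(module_path)
    return getattr(module, registry[name])


# Stand-in registry: the stdlib module "fractions" exports the class "Fraction".
demo_registry = {"fractions": "Fraction"}
cls = load_model_class("fractions", demo_registry)
print(cls.__name__)
```

Under this pattern, a new key such as `"llava_v6"` only resolves if a module with that name exists and exports the named class.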
Expand Down Expand Up @@ -65,6 +67,14 @@
"vllm": "VLLM",
"xcomposer2_4KHD": "XComposer2_4KHD",
"xcomposer2d5": "XComposer2D5",
"oryx": "Oryx",
"llama_vision": "LlamaVision",
"aria": "Aria",
"pangea": "Pangea",
"pixtral": "Pixtral",
"pixtral_v6": "Pixtral_v6",
"aya": "Aya",
"aya_v6": "Aya_v6",
"egogpt": "EgoGPT",
"internvideo2_5": "InternVideo2_5",
"videochat_flash": "VideoChat_Flash",
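Review note: `"llama_vision": "LlamaVision"` is listed near line 28 of this file and again among the entries added here. If both hunks belong to the same dict literal, Python silently keeps the last value, so the duplicate is harmless while the values match but easy to miss. Below is a minimal sketch of a check using only the standard `ast` module. The helper name `duplicate_dict_keys` is hypothetical and not part of lmms-eval.

```python
import ast
from collections import Counter
from typing import List


def duplicate_dict_keys(source: str) -> List[str]:
    """Return constant keys that occur more than once within a single dict literal."""
    duplicates = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Dict):
            # Entries from ``**other`` unpacking have a key of None and are ignored.
            keys = [k.value for k in node.keys if isinstance(k, ast.Constant)]
            duplicates.extend(k for k, count in Counter(keys).items() if count > 1)
    return duplicates


snippet = '''
MODELS = {
    "llama_vision": "LlamaVision",
    "llava": "Llava",
    "llama_vision": "LlamaVision",
}
'''
print(duplicate_dict_keys(snippet))
```

A check like this could be run over `lmms_eval/models/__init__.py` in a test or pre-commit hook; whether the project wants that is a decision for the maintainers.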