Merged
113 commits
17be7e5
model: add image support for jina embeddings v4 (#2893)
makram93 Jul 11, 2025
9ecac21
model: add kalm_models (kalm-emb-v2) ModelMeta (new PR) (#2889)
ItsukiFujii Jul 15, 2025
4a47f90
Add Classification Evaluator unit test (#2838)
fzowl Jul 15, 2025
9864e2a
fix: update colpali engine models (#2905)
paultltc Jul 16, 2025
5a8ccec
1.38.35
invalid-email-address Jul 16, 2025
c7078af
Evaluator tests (#2910)
fzowl Jul 19, 2025
aef1e33
Classification dataset cleaning (#2900)
AlexeyVatolin Jul 19, 2025
56c98ed
Update tasks & benchmarks tables
github-actions[bot] Jul 19, 2025
57438c2
dataset: Add JapaneseSentimentClassification (#2913)
lsz05 Jul 19, 2025
372fc4c
Update tasks & benchmarks tables
github-actions[bot] Jul 19, 2025
a298fa9
fix: change `passage` prompt to `document` (#2912)
Samoed Jul 20, 2025
8eb4f6d
1.38.36
invalid-email-address Jul 20, 2025
5a868e3
model: Add OpenSearch inf-free sparse encoding models (#2903)
zhichao-aws Jul 20, 2025
1dcc6dc
dataset: add BarExamQA dataset (#2916)
abdurrahmanbutler Jul 21, 2025
c1922c8
Use `mteb.get_model` in adding_a_dataset.md (#2922)
Samoed Jul 21, 2025
0ac0231
fix: specify revision for opensearch (#2919)
Samoed Jul 21, 2025
b12b926
1.38.37
invalid-email-address Jul 21, 2025
533ce59
Update the link for gemini-embedding-001 (#2928)
Feiyang1 Jul 22, 2025
5ed6c90
fix: replace with passage (#2934)
makram93 Jul 22, 2025
79a43af
fix: Only import SparseEncoder once sentence-transformer version have…
KennethEnevoldsen Jul 22, 2025
8496ec2
fix: Prevent incorrectly passing "selector_state" to `get_benchmark` …
KennethEnevoldsen Jul 22, 2025
a78debf
docs: Update adding_a_dataset.md (#2947)
KennethEnevoldsen Jul 25, 2025
4ef8571
ci: bump semantic release
KennethEnevoldsen Jul 25, 2025
03a0582
1.38.38
Jul 25, 2025
8416541
dataset: Add BSARD v2, fixing the data loading issues of v1 (#2935)
nikolay-banar Jul 25, 2025
da46c8e
Update tasks & benchmarks tables
github-actions[bot] Jul 25, 2025
42dfe0d
dataset: add GovReport dataset (#2953)
abdurrahmanbutler Jul 29, 2025
007d19f
dataset: add BillSum datasets (#2943)
abdurrahmanbutler Jul 30, 2025
e4f30e9
Update tasks & benchmarks tables
github-actions[bot] Jul 30, 2025
36df9ca
fix: Add new benchmark beRuSciBench along with AbsTaskTextRegression …
AlexeyVatolin Aug 2, 2025
a86e2dd
Update tasks & benchmarks tables
github-actions[bot] Aug 2, 2025
4a567d2
1.38.39
Aug 3, 2025
6c1f1c6
qzhou-embedding model_meta & implementation (#2975)
PennyYu123 Aug 7, 2025
e5d386b
model: Add Voyage 3.5 model configuration (#3005)
fzowl Aug 9, 2025
042db73
model: BAAI/bge-m3-unsupervised Model (#3007)
fzoll Aug 9, 2025
01840ce
lint: Correcting lint errors (#3004)
fzowl Aug 9, 2025
741b022
dataset: Added 50 Vietnamese dataset from vn-mteb (#2964)
BaoLocPham Aug 9, 2025
4adf565
Update tasks & benchmarks tables
github-actions[bot] Aug 9, 2025
87eb27c
model: Add Cohere embed-v4.0 model support (#3006)
fzowl Aug 10, 2025
d8b2910
Add OpenAI models with 512 dimension (#3008)
fzoll Aug 11, 2025
ea41e7a
Standardise task names and fix citation formatting (#3026)
abdurrahmanbutler Aug 13, 2025
177997f
Update tasks & benchmarks tables
github-actions[bot] Aug 13, 2025
20bc80c
fix: Add missing training sets for qzhou (#3023)
PennyYu123 Aug 16, 2025
f3f11cc
1.38.40
Aug 16, 2025
96a7cc5
model: Add samilpwc_models meta (#3028)
ElPlaguister Aug 16, 2025
37d115a
model: Add granite-vision-embedding model (#3029)
roipony Aug 16, 2025
5c65913
fix: incorrect revision for SNLRetrieval (#3033)
KennethEnevoldsen Aug 17, 2025
d4e6223
dataset: Add HumanEvalRetrieval task (#3022)
fzoll Aug 17, 2025
a96f2e4
Update tasks & benchmarks tables
github-actions[bot] Aug 17, 2025
3398742
1.38.41
Aug 17, 2025
4aaf47e
ci: reduce parallel runs for when checking if a dataset exists (#3035)
KennethEnevoldsen Aug 17, 2025
e124b56
ci: Updating rerun delays to prevent false positives errors
KennethEnevoldsen Aug 17, 2025
d729d32
Merge branch 'main' of https://github.com/embeddings-benchmark/mteb
KennethEnevoldsen Aug 17, 2025
e476dc3
ci: Updating rerun delays to prevent false positives errors
KennethEnevoldsen Aug 17, 2025
72f7b05
model: Add GreenNode Vietnamese Embedding models (#2994)
BaoLocPham Aug 18, 2025
e08ec56
model: add granite-embedding-english R2 models (#3050)
aashka-trivedi Aug 18, 2025
c58b319
fix: Updated revision for jina-embeddings-v4 (#3046)
jupyterjazz Aug 18, 2025
46f4261
1.38.42
Aug 18, 2025
4e3fcd8
Fix 3 VN-MTEB Pair Classification tasks (#3053)
BaoLocPham Aug 19, 2025
ac69263
dataset: Add mbpp retrieval (#3037)
fzoll Aug 20, 2025
1fff5ce
Update tasks & benchmarks tables
github-actions[bot] Aug 20, 2025
7b289f5
dataset: Added wikisql retrieval (#3039)
fzoll Aug 20, 2025
7da3cf9
Update tasks & benchmarks tables
github-actions[bot] Aug 20, 2025
6fa6efa
ci: Temporarily limit pytrec version to "pytrec-eval-terrier>=0.5.6, …
Samoed Aug 20, 2025
ea801ec
fix MBPPRetrieval revision (#3055)
isaac-chung Aug 20, 2025
0a6e855
fix: Add VN-MTEB benchmark and Leaderboard (#2995)
BaoLocPham Aug 20, 2025
def1377
Update tasks & benchmarks tables
github-actions[bot] Aug 20, 2025
ef9771c
1.38.43
Aug 20, 2025
53d7d84
Add hc3finance retrieval (#3041)
fzoll Aug 20, 2025
7b57185
Add finqa retrieval (#3042)
fzoll Aug 20, 2025
fd8f89e
Update tasks & benchmarks tables
github-actions[bot] Aug 20, 2025
4da11c6
Add FinanceBenchRetrieval task (#3044)
fzoll Aug 20, 2025
fe57390
Update tasks & benchmarks tables
github-actions[bot] Aug 20, 2025
a291a05
Add FreshStackRetrieval task (#3043)
fzoll Aug 21, 2025
e1ede42
Update tasks & benchmarks tables
github-actions[bot] Aug 21, 2025
53f0986
dataset: Add ds1000 retrieval (#3038)
fzoll Aug 21, 2025
d2fcbac
Update tasks & benchmarks tables
github-actions[bot] Aug 21, 2025
e91cb8e
Add ChatDoctorRetrieval (#3045)
fzoll Aug 21, 2025
69099fe
Update tasks & benchmarks tables
github-actions[bot] Aug 21, 2025
8e1c354
Correcting the (new) DS1000 dataset's revision (#3063)
fzoll Aug 21, 2025
cf3e1bb
dataset: Add JinaVDR (#2942)
maximilianwerk Aug 22, 2025
26468f8
Update tasks & benchmarks tables
github-actions[bot] Aug 22, 2025
4994ea1
model: Add CoDi-Embedding-V1 (#3054)
ZBWpro Aug 22, 2025
9c27f71
fix: ensure that there are always relevant docs attached to query (#3…
KennethEnevoldsen Aug 22, 2025
616a517
1.38.44
Aug 22, 2025
70724e7
Correcting the JINA models with SentenceTransformerWrapper (#3071)
fzoll Aug 24, 2025
df719cc
ci: Add stale workflow (#3066)
isaac-chung Aug 25, 2025
1f9641a
fix: open_clip package validation (#3073)
FacerAin Aug 25, 2025
f210ac1
1.38.45
Aug 25, 2025
63a0c60
fix: Update revision for qzhou models (#3069)
PennyYu123 Aug 25, 2025
3153707
1.38.46
Aug 25, 2025
d2c3570
Fix the reference link for CoDi-Embedding-V1 (#3075)
ZBWpro Aug 25, 2025
1541318
fix: Add beta version of RTEB related benchmarks (#3048)
fzoll Aug 27, 2025
bce7471
1.38.47
Aug 27, 2025
b46b633
fix: run `ruff check` on all files during ci (#3086)
Samoed Aug 27, 2025
6db355e
1.38.48
Aug 27, 2025
cd14ef6
Move dev to dependency groups (#3088)
Samoed Aug 28, 2025
139fc73
fix: Improving validate_task_to_prompt_name logs and error messages (…
RyanMullins Aug 28, 2025
27be671
fix: duplicate mteb multilingual variables (#3080)
Samoed Aug 28, 2025
5bf303b
Update tasks & benchmarks tables
github-actions[bot] Aug 28, 2025
e4c2a95
model: mdbr-leaf models (#3081)
robin-vjc Aug 28, 2025
2b7089a
1.38.49
Aug 28, 2025
17fa697
CI: Set upper limit for xdist version (#3098)
Samoed Aug 29, 2025
9586697
Combine Plots and Tables into a Single (#3047)
q275343119 Aug 29, 2025
5851c7a
fix: Updating the default batch size calculation in the voyage models…
fzoll Sep 1, 2025
80966c2
1.38.50
Sep 1, 2025
4012517
fix: Add @classmethod for @field_validators in TaskMetadata (#3100)
Samoed Sep 1, 2025
7303c15
Align task prompt dict with `PromptType` (#3101)
Samoed Sep 1, 2025
b7b5d11
1.38.51
Sep 1, 2025
4774b74
model: Add ModelMeta for OrdalieTech/Solon-embeddings-mini-beta-1.1 (…
mathlesage Sep 1, 2025
5844cc7
fix: Allow closed datasets (#3059)
fzoll Sep 1, 2025
07bf861
1.38.52
Sep 1, 2025
5e0bf22
Merge branch 'maeb' into main-merge-for-maeb
isaac-chung Sep 1, 2025
2 changes: 1 addition & 1 deletion .github/workflows/leaderboard_build.yml
@@ -22,7 +22,7 @@ jobs:

 - name: Install dependencies (incl. leaderboard extra)
   run: |
-    pip install ".[dev,leaderboard]"
+    pip install ".[leaderboard]" --group dev

 - name: Run leaderboard build test
   run: |
8 changes: 4 additions & 4 deletions Makefile
@@ -1,12 +1,12 @@
 install:
 	@echo "--- 🚀 Installing project dependencies ---"
-	pip install -e ".[dev,image]"
+	pip install -e ".[image]" --group dev
 	pre-commit install

 install-for-tests:
 	@echo "--- 🚀 Installing project dependencies for test ---"
 	@echo "This ensures that the project is not installed in editable mode"
-	pip install ".[dev,image]"
+	pip install ".[image]" --group dev

 lint:
 	@echo "--- 🧹 Running linters ---"
@@ -17,7 +17,7 @@ lint-check:
 	@echo "--- 🧹 Check is project is linted ---"
 	# Required for CI to work, otherwise it will just pass
 	ruff format . --check # running ruff formatting
-	ruff check **/*.py # running ruff linting
+	ruff check . # running ruff linting

 test:
 	@echo "--- 🧪 Running tests ---"
@@ -43,7 +43,7 @@ build-docs:

 model-load-test:
 	@echo "--- 🚀 Running model load test ---"
-	pip install ".[dev, pylate,gritlm,xformers,model2vec]"
+	pip install ".[pylate,gritlm,xformers,model2vec]" --group dev
 	python scripts/extract_model_names.py $(BASE_BRANCH) --return_one_model_name_per_file
 	python tests/test_models/model_loading.py --model_name_file scripts/model_names.txt

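A note on the `lint-check` change from `ruff check **/*.py` to `ruff check .`: in a shell without globstar support (assumed here to be the shell that `make` invokes), `**` behaves like a single `*`, so the old pattern would only reach files exactly one directory deep. The sketch below reproduces that difference with Python's `glob` module, whose non-recursive mode treats `**` the same way. The file names are made up for illustration.

```python
import glob
import os
import tempfile


def matched(root, pattern, recursive):
    """Return matches relative to root, with forward slashes, sorted."""
    hits = glob.glob(os.path.join(root, pattern), recursive=recursive)
    return sorted(os.path.relpath(h, root).replace(os.sep, "/") for h in hits)


with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: one file per directory depth
    for rel in ("top.py", "pkg/mod.py", "pkg/sub/deep.py"):
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as handle:
            handle.write("")

    # Without recursive matching, "**" acts like "*": one directory level only
    one_level = matched(root, "**/*.py", recursive=False)
    # With recursive matching, "**" spans zero or more directories
    all_levels = matched(root, "**/*.py", recursive=True)

print(one_level)
print(all_levels)
```

Passing `.` avoids the question entirely, since the file discovery is then left to `ruff` rather than to the shell.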
1 change: 1 addition & 0 deletions docs/benchmarks.md
@@ -30,6 +30,7 @@ The following table gives you an overview of the benchmarks in MTEB.
| [MTEB(Indic, v1)](https://arxiv.org/abs/2502.13595) | Indic | 23 | BitextMining: 4, Clustering: 1, Classification: 13, PairClassification: 1, Retrieval: 2, Reranking: 1, STS: 1 | [Constructed, Encyclopaedic, Fiction, Government, Legal, News, Non-fiction, Religious, Reviews, Social, Spoken, Web, Written] | asm,awa,ben,bgc,bho,bod,boy,brx,doi,eng,gbm,gom,guj,hin,hne,kan,kas,mai,mal,mar,mni,mup,mwr,nep,npi,ory,pan,pus,raj,san,sat,snd,tam,tel,urd |
| MTEB(Law, v1) | Legal | 8 | Retrieval: 8 | [Legal, Written] | deu,eng,zho |
| MTEB(Medical, v1) | Medical | 12 | Retrieval: 9, Clustering: 2, Reranking: 1 | [Academic, Government, Medical, Non-fiction, Web, Written] | ara,cmn,eng,fra,kor,pol,rus,spa,vie,zho |
| [MTEB(Multilingual, v1)](https://arxiv.org/abs/2502.13595) | Multilingual | 132 | BitextMining: 13, Classification: 43, Clustering: 17, Retrieval: 18, InstructionRetrieval: 3, MultilabelClassification: 5, PairClassification: 11, Reranking: 6, STS: 16 | [Academic, Blog, Constructed, Encyclopaedic, Entertainment, Fiction, Financial, Government, Legal, Medical, News, Non-fiction, Programming, Religious, Reviews, Social, Spoken, Subtitles, Web, Written] | aai,aak,aau,aaz,abs,abt,abx,aby,ace,acf,acm,acq,acr,acu,adz,aeb,aer,aey,afr,agd,agg,agm,agn,agr,agt,agu,aia,aii,ajp,aka,ake,alp,alq,als,aly,ame,amf,amh,amk,amm,amn,amo,amp,amr,amu,amx,ang,anh,anv,aoi,aoj,aom,aon,apb,apc,ape,apn,apr,apu,apw,apz,ara,arb,are,arl,arn,arp,arq,ars,ary,arz,asm,aso,ast,ata,atb,atd,atg,att,auc,aui,auy,avt,awa,awb,awk,awx,ayr,azb,aze,azg,azj,azz,bak,bam,ban,bao,bba,bbb,bbc,bbr,bch,bco,bdd,bea,bef,bel,bem,ben,beo,ber,beu,bew,bgc,bgs,bgt,bhg,bhl,bho,bhp,big,bjk,bjn,bjp,bjr,bjv,bjz,bkd,bki,bkq,bkx,blw,blz,bmh,bmk,bmr,bmu,bnp,boa,bod,boj,bon,bos,box,boy,bpr,bps,bqc,bqp,bre,brx,bsj,bsn,bsp,bss,bug,buk,bul,bus,bvd,bvr,bxh,byr,byx,bzd,bzh,bzj,caa,cab,cac,caf,cak,cao,cap,car,cat,cav,cax,cbc,cbi,cbk,cbr,cbs,cbt,cbu,cbv,cco,ceb,cek,ces,cgc,cha,chd,chf,chk,chq,chv,chz,cjk,cjo,cjv,ckb,cle,clu,cme,cmn,cmo,cni,cnl,cnt,cof,con,cop,cor,cot,cpa,cpb,cpc,cpu,cpy,crh,crn,crx,csb,cso,csy,cta,cth,ctp,ctu,cub,cuc,cui,cuk,cut,cux,cwe,cya,cym,daa,dad,dah,dan,ded,deu,dgc,dgr,dgz,dhg,dif,dik,div,dji,djk,djr,dob,doi,dop,dov,dsb,dtp,dwr,dww,dwy,dyu,dzo,ebk,eko,ell,emi,emp,eng,enq,epo,eri,ese,esk,est,etr,eus,ewe,faa,fai,fao,far,fas,ffm,fij,fil,fin,fon,for,fra,fry,fuc,fue,fuf,fuh,fur,fuv,gah,gai,gam,gaw,gaz,gbm,gdn,gdr,geb,gfk,ghs,gla,gle,glg,glk,glv,gmv,gng,gnn,gnw,gof,gom,grc,grn,gsw,gub,guh,gui,guj,gul,gum,gun,guo,gup,gux,gvc,gvf,gvn,gvs,gwi,gym,gyr,hat,hau,haw,hbo,hch,heb,heg,hin,hix,hla,hlt,hmn,hmo,hne,hns,hop,hot,hrv,hsb,hto,hub,hui,hun,hus,huu,huv,hvn,hye,ian,ibo,ido,ign,ikk,ikw,ile,ilo,imo,ina,inb,ind,ino,iou,ipi,is
l,isn,ita,iws,ixl,jac,jae,jao,jav,jic,jid,jiv,jni,jpn,jvn,kab,kac,kam,kan,kaq,kas,kat,kaz,kbc,kbh,kbm,kbp,kbq,kdc,kde,kdl,kea,kek,ken,kew,kgf,kgk,kgp,khk,khm,khs,khz,kik,kin,kir,kiw,kiz,kje,kjs,kkc,kkl,klt,klv,kmb,kmg,kmh,kmk,kmo,kmr,kms,kmu,knc,kne,knf,knj,knv,kon,kor,kos,kpf,kpg,kpj,kpr,kpw,kpx,kqa,kqc,kqf,kql,kqw,krc,ksd,ksj,ksr,ktm,kto,kud,kue,kup,kur,kvg,kvn,kwd,kwf,kwi,kwj,kyc,kyf,kyg,kyq,kyz,kze,kzj,lac,lao,lat,lav,lbb,lbk,lcm,leu,lex,lfn,lgl,lid,lif,lij,lim,lin,lit,llg,lmo,ltg,ltz,lua,lug,luo,lus,lvs,lww,maa,mad,mag,mai,maj,mak,mal,mam,maq,mar,mau,mav,max,maz,mbb,mbc,mbh,mbj,mbl,mbs,mbt,mca,mcb,mcd,mcf,mco,mcp,mcq,mcr,mdy,med,mee,mek,meq,met,meu,mey,mgc,mgh,mgw,mhl,mhr,mib,mic,mie,mig,mih,mil,min,mio,mir,mit,miz,mjc,mkd,mkj,mkl,mkn,mks,mle,mlg,mlh,mlp,mlt,mmo,mmx,mna,mni,mon,mop,mos,mox,mph,mpj,mpm,mpp,mps,mpt,mpx,mqb,mqj,mri,msa,msb,msc,msk,msm,msy,mti,mto,mui,mup,mux,muy,mva,mvn,mwc,mwe,mwf,mwp,mwr,mxb,mxp,mxq,mxt,mya,myk,myu,myw,myy,mzz,nab,naf,nak,nas,nbq,nca,nch,ncj,ncl,ncu,nde,ndg,ndj,nds,nep,nfa,ngp,ngu,nhe,nhg,nhi,nho,nhr,nhu,nhw,nhy,nif,nii,nij,nin,nko,nld,nlg,nna,nno,nnq,noa,nob,nop,nor,not,nou,nov,npi,npl,nqo,nsn,nso,nss,ntj,ntp,ntu,nus,nuy,nvm,nwi,nya,nys,nyu,obo,oci,okv,omw,ong,ons,ood,opm,orm,orv,ory,ote,otm,otn,otq,ots,pab,pad,pag,pah,pam,pan,pao,pap,pbt,pcm,pes,pib,pio,pir,piu,pjt,pls,plt,plu,pma,pms,poe,poh,poi,pol,pon,por,poy,ppo,prf,pri,prs,ptp,ptu,pus,pwg,qub,quc,quf,quh,qul,qup,quy,qvc,qve,qvh,qvm,qvn,qvs,qvw,qvz,qwh,qxh,qxn,qxo,rai,raj,reg,rej,rgu,rkb,rmc,rmy,rom,ron,roo,rop,row,rro,ruf,rug,run,rus,rwo,sab,sag,sah,san,sat,sbe,sbk,sbs,scn,sco,seh,sey,sgb,sgz,shi,shj,shn,shp,sim,sin,sja,slk,sll,slv,smk,smo,sna,snc,snd,snn,snp,snx,sny,som,soq,sot,soy,spa,spl,spm,spp,sps,spy,sqi,srd,sri,srm,srn,srp,srq,ssd,ssg,ssw,ssx,stp,sua,sue,sun,sus,suz,svk,swa,swe,swg,swh,swp,sxb,szl,tac,tah,taj,tam,taq,tat,tav,taw,tbc,tbf,tbg,tbo,tbz,tca,tcs,tcz,tdt,tee,tel,ter,tet,tew,tfr,tgk,tgl,tgo,tgp,tha,tif,tim,tir,tiw,tiy,tke,tku,tlf,tmd,tna,tnc,tnk,tnn,tnp,to
c,tod,tof,toj,ton,too,top,tos,tpa,tpi,tpt,tpz,trc,tsn,tso,tsw,ttc,tte,tuc,tue,tuf,tuk,tum,tuo,tur,tvk,twi,txq,txu,tyv,tzj,tzl,tzm,tzo,ubr,ubu,udu,uig,ukr,uli,ulk,umb,upv,ura,urb,urd,uri,urt,urw,usa,usp,uvh,uvl,uzb,uzn,vec,ven,vid,vie,viv,vmy,waj,wal,wap,war,wat,wbi,wbp,wed,wer,wim,wiu,wiv,wln,wmt,wmw,wnc,wnu,wol,wos,wrk,wro,wrs,wsk,wuu,wuv,xav,xbi,xed,xho,xla,xnn,xon,xsi,xtd,xtm,yaa,yad,yal,yap,yaq,yby,ycn,ydd,yid,yka,yle,yml,yon,yor,yrb,yre,yss,yue,yuj,yut,yuw,yva,zaa,zab,zac,zad,zai,zaj,zam,zao,zap,zar,zas,zat,zav,zaw,zca,zga,zho,zia,ziw,zlm,zos,zpc,zpl,zpm,zpo,zpq,zpu,zpv,zpz,zsm,zsr,ztq,zty,zul,zyp |
| [MTEB(Multilingual, v2)](https://arxiv.org/abs/2502.13595) | Multilingual | 131 | BitextMining: 13, Classification: 43, Clustering: 16, Retrieval: 18, InstructionRetrieval: 3, MultilabelClassification: 5, PairClassification: 11, Reranking: 6, STS: 16 | [Academic, Blog, Constructed, Encyclopaedic, Entertainment, Fiction, Financial, Government, Legal, Medical, News, Non-fiction, Programming, Religious, Reviews, Social, Spoken, Subtitles, Web, Written] | aai,aak,aau,aaz,abs,abt,abx,aby,ace,acf,acm,acq,acr,acu,adz,aeb,aer,aey,afr,agd,agg,agm,agn,agr,agt,agu,aia,aii,ajp,aka,ake,alp,alq,als,aly,ame,amf,amh,amk,amm,amn,amo,amp,amr,amu,amx,ang,anh,anv,aoi,aoj,aom,aon,apb,apc,ape,apn,apr,apu,apw,apz,ara,arb,are,arl,arn,arp,arq,ars,ary,arz,asm,aso,ast,ata,atb,atd,atg,att,auc,aui,auy,avt,awa,awb,awk,awx,ayr,azb,aze,azg,azj,azz,bak,bam,ban,bao,bba,bbb,bbc,bbr,bch,bco,bdd,bea,bef,bel,bem,ben,beo,ber,beu,bew,bgc,bgs,bgt,bhg,bhl,bho,bhp,big,bjk,bjn,bjp,bjr,bjv,bjz,bkd,bki,bkq,bkx,blw,blz,bmh,bmk,bmr,bmu,bnp,boa,bod,boj,bon,bos,box,boy,bpr,bps,bqc,bqp,bre,brx,bsj,bsn,bsp,bss,bug,buk,bul,bus,bvd,bvr,bxh,byr,byx,bzd,bzh,bzj,caa,cab,cac,caf,cak,cao,cap,car,cat,cav,cax,cbc,cbi,cbk,cbr,cbs,cbt,cbu,cbv,cco,ceb,cek,ces,cgc,cha,chd,chf,chk,chq,chv,chz,cjk,cjo,cjv,ckb,cle,clu,cme,cmn,cmo,cni,cnl,cnt,cof,con,cop,cor,cot,cpa,cpb,cpc,cpu,cpy,crh,crn,crx,csb,cso,csy,cta,cth,ctp,ctu,cub,cuc,cui,cuk,cut,cux,cwe,cya,cym,daa,dad,dah,dan,ded,deu,dgc,dgr,dgz,dhg,dif,dik,div,dji,djk,djr,dob,doi,dop,dov,dsb,dtp,dwr,dww,dwy,dyu,dzo,ebk,eko,ell,emi,emp,eng,enq,epo,eri,ese,esk,est,etr,eus,ewe,faa,fai,fao,far,fas,ffm,fij,fil,fin,fon,for,fra,fry,fuc,fue,fuf,fuh,fur,fuv,gah,gai,gam,gaw,gaz,gbm,gdn,gdr,geb,gfk,ghs,gla,gle,glg,glk,glv,gmv,gng,gnn,gnw,gof,gom,grc,grn,gsw,gub,guh,gui,guj,gul,gum,gun,guo,gup,gux,gvc,gvf,gvn,gvs,gwi,gym,gyr,hat,hau,haw,hbo,hch,heb,heg,hin,hix,hla,hlt,hmn,hmo,hne,hns,hop,hot,hrv,hsb,hto,hub,hui,hun,hus,huu,huv,hvn,hye,ian,ibo,ido,ign,ikk,ikw,ile,ilo,imo,ina,inb,ind,ino,iou,ipi,is
l,isn,ita,iws,ixl,jac,jae,jao,jav,jic,jid,jiv,jni,jpn,jvn,kab,kac,kam,kan,kaq,kas,kat,kaz,kbc,kbh,kbm,kbp,kbq,kdc,kde,kdl,kea,kek,ken,kew,kgf,kgk,kgp,khk,khm,khs,khz,kik,kin,kir,kiw,kiz,kje,kjs,kkc,kkl,klt,klv,kmb,kmg,kmh,kmk,kmo,kmr,kms,kmu,knc,kne,knf,knj,knv,kon,kor,kos,kpf,kpg,kpj,kpr,kpw,kpx,kqa,kqc,kqf,kql,kqw,krc,ksd,ksj,ksr,ktm,kto,kud,kue,kup,kur,kvg,kvn,kwd,kwf,kwi,kwj,kyc,kyf,kyg,kyq,kyz,kze,kzj,lac,lao,lat,lav,lbb,lbk,lcm,leu,lex,lfn,lgl,lid,lif,lij,lim,lin,lit,llg,lmo,ltg,ltz,lua,lug,luo,lus,lvs,lww,maa,mad,mag,mai,maj,mak,mal,mam,maq,mar,mau,mav,max,maz,mbb,mbc,mbh,mbj,mbl,mbs,mbt,mca,mcb,mcd,mcf,mco,mcp,mcq,mcr,mdy,med,mee,mek,meq,met,meu,mey,mgc,mgh,mgw,mhl,mhr,mib,mic,mie,mig,mih,mil,min,mio,mir,mit,miz,mjc,mkd,mkj,mkl,mkn,mks,mle,mlg,mlh,mlp,mlt,mmo,mmx,mna,mni,mon,mop,mos,mox,mph,mpj,mpm,mpp,mps,mpt,mpx,mqb,mqj,mri,msa,msb,msc,msk,msm,msy,mti,mto,mui,mup,mux,muy,mva,mvn,mwc,mwe,mwf,mwp,mwr,mxb,mxp,mxq,mxt,mya,myk,myu,myw,myy,mzz,nab,naf,nak,nas,nbq,nca,nch,ncj,ncl,ncu,nde,ndg,ndj,nds,nep,nfa,ngp,ngu,nhe,nhg,nhi,nho,nhr,nhu,nhw,nhy,nif,nii,nij,nin,nko,nld,nlg,nna,nno,nnq,noa,nob,nop,nor,not,nou,nov,npi,npl,nqo,nsn,nso,nss,ntj,ntp,ntu,nus,nuy,nvm,nwi,nya,nys,nyu,obo,oci,okv,omw,ong,ons,ood,opm,orm,orv,ory,ote,otm,otn,otq,ots,pab,pad,pag,pah,pam,pan,pao,pap,pbt,pcm,pes,pib,pio,pir,piu,pjt,pls,plt,plu,pma,pms,poe,poh,poi,pol,pon,por,poy,ppo,prf,pri,prs,ptp,ptu,pus,pwg,qub,quc,quf,quh,qul,qup,quy,qvc,qve,qvh,qvm,qvn,qvs,qvw,qvz,qwh,qxh,qxn,qxo,rai,raj,reg,rej,rgu,rkb,rmc,rmy,rom,ron,roo,rop,row,rro,ruf,rug,run,rus,rwo,sab,sag,sah,san,sat,sbe,sbk,sbs,scn,sco,seh,sey,sgb,sgz,shi,shj,shn,shp,sim,sin,sja,slk,sll,slv,smk,smo,sna,snc,snd,snn,snp,snx,sny,som,soq,sot,soy,spa,spl,spm,spp,sps,spy,sqi,srd,sri,srm,srn,srp,srq,ssd,ssg,ssw,ssx,stp,sua,sue,sun,sus,suz,svk,swa,swe,swg,swh,swp,sxb,szl,tac,tah,taj,tam,taq,tat,tav,taw,tbc,tbf,tbg,tbo,tbz,tca,tcs,tcz,tdt,tee,tel,ter,tet,tew,tfr,tgk,tgl,tgo,tgp,tha,tif,tim,tir,tiw,tiy,tke,tku,tlf,tmd,tna,tnc,tnk,tnn,tnp,to
c,tod,tof,toj,ton,too,top,tos,tpa,tpi,tpt,tpz,trc,tsn,tso,tsw,ttc,tte,tuc,tue,tuf,tuk,tum,tuo,tur,tvk,twi,txq,txu,tyv,tzj,tzl,tzm,tzo,ubr,ubu,udu,uig,ukr,uli,ulk,umb,upv,ura,urb,urd,uri,urt,urw,usa,usp,uvh,uvl,uzb,uzn,vec,ven,vid,vie,viv,vmy,waj,wal,wap,war,wat,wbi,wbp,wed,wer,wim,wiu,wiv,wln,wmt,wmw,wnc,wnu,wol,wos,wrk,wro,wrs,wsk,wuu,wuv,xav,xbi,xed,xho,xla,xnn,xon,xsi,xtd,xtm,yaa,yad,yal,yap,yaq,yby,ycn,ydd,yid,yka,yle,yml,yon,yor,yrb,yre,yss,yue,yuj,yut,yuw,yva,zaa,zab,zac,zad,zai,zaj,zam,zao,zap,zar,zas,zat,zav,zaw,zca,zga,zho,zia,ziw,zlm,zos,zpc,zpl,zpm,zpo,zpq,zpu,zpv,zpz,zsm,zsr,ztq,zty,zul,zyp |
| [MTEB(Scandinavian, v1)](https://kennethenevoldsen.github.io/scandinavian-embedding-benchmark/) | Scandinavian | 28 | BitextMining: 2, Classification: 13, Retrieval: 7, Clustering: 6 | [Blog, Encyclopaedic, Fiction, Government, Legal, News, Non-fiction, Reviews, Social, Spoken, Web, Written] | dan,fao,isl,nno,nob,swe |
| [MTEB(cmn, v1)](https://github.com/FlagOpen/FlagEmbedding/tree/master/research/C_MTEB) | Chinese | 32 | Retrieval: 8, Reranking: 4, PairClassification: 2, Clustering: 4, STS: 7, Classification: 7 | [Academic, Entertainment, Financial, Government, Medical, Non-fiction, Written] | cmn |
29 changes: 15 additions & 14 deletions mteb/abstasks/TaskMetadata.py
@@ -235,16 +235,15 @@
 METRIC_VALUE = Union[int, float, dict[str, Any]]


-class PromptDict(TypedDict, total=False):
-    """A dictionary containing the prompt used for the task.
-
-    Args:
-        query: The prompt used for the queries in the task.
-        passage: The prompt used for the passages in the task.
-    """
+PromptDict = TypedDict(
+    "PromptDict", {prompt_type.value: str for prompt_type in PromptType}, total=False
+)
+"""A dictionary containing the prompt used for the task.

-    query: str
-    passage: str
+Args:
+    query: The prompt used for the queries in the task.
+    document: The prompt used for the passages in the task.
+"""


 class DescriptiveStatistics(TypedDict):
@@ -253,9 +252,6 @@ class DescriptiveStatistics(TypedDict):
     pass


-METRIC_VALUE = Union[int, float, dict[str, Any]]
-
-
 logger = logging.getLogger(__name__)
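The `PromptDict` change above swaps a class-based `TypedDict` for the functional form, with keys generated from the `PromptType` enum. A minimal sketch of how that behaves at runtime follows; the `PromptType` class here is a stand-in whose members (`query`, `document`) are assumed from the docstring in the diff, not copied from mteb.

```python
from enum import Enum
from typing import TypedDict, get_type_hints


class PromptType(str, Enum):
    # Stand-in for mteb's PromptType; members are assumed from the diff's docstring
    query = "query"
    document = "document"


# Functional TypedDict syntax: the keys are computed from the enum members,
# so adding or renaming a member updates the allowed keys automatically.
PromptDict = TypedDict(
    "PromptDict", {prompt_type.value: str for prompt_type in PromptType}, total=False
)

# total=False makes every key optional, so a query-only prompt is valid
prompt: PromptDict = {"query": "Represent the question for retrieval: "}

print(get_type_hints(PromptDict))
```

One trade-off worth keeping in mind: static type checkers generally expect a literal dictionary in the functional form, so keys built by a comprehension may only be enforced at runtime (for example by a validator), not by the checker.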


@@ -289,6 +285,7 @@ class TaskMetadata(BaseModel):
         prompt: The prompt used for the task. Can be a string or a dictionary containing the query and passage prompts.
         bibtex_citation: The BibTeX citation for the dataset. Should be an empty string if no citation is available.
         adapted_from: Datasets adapted (translated, sampled from, etc.) from other datasets.
+        is_public: Whether the dataset is publicly available. If False (closed/private), a HuggingFace token is required to run the datasets.
     """

     dataset: dict[str, Any]
@@ -316,35 +313,39 @@ class TaskMetadata(BaseModel):
     sample_creation: SAMPLE_CREATION_METHOD | None = None
     bibtex_citation: str | None = None
     adapted_from: list[str] | None = None
+    is_public: bool = True

     def validate_metadata(self) -> None:
         self.dataset_path_is_specified(self.dataset)
         self.dataset_revision_is_specified(self.dataset)
         self.eval_langs_are_valid(self.eval_langs)

     @field_validator("dataset")
+    @classmethod
     def _check_dataset_path_is_specified(
         cls, dataset: dict[str, Any]
     ) -> dict[str, Any]:
         cls.dataset_path_is_specified(dataset)
         return dataset

     @field_validator("dataset")
+    @classmethod
     def _check_dataset_revision_is_specified(
         cls, dataset: dict[str, Any]
     ) -> dict[str, Any]:
         cls.dataset_revision_is_specified(dataset)
         return dataset

     @field_validator("prompt")
+    @classmethod
     def _check_prompt_is_valid(
         cls, prompt: str | PromptDict | None
     ) -> str | PromptDict | None:
         if isinstance(prompt, dict):
             for key in prompt:
                 if key not in [e.value for e in PromptType]:
                     raise ValueError(
-                        "The prompt dictionary should only contain the keys 'query' and 'passage'."
+                        "The prompt dictionary should only contain the keys 'query' and 'document'."
                     )
         return prompt
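The body of `_check_prompt_is_valid` can be exercised outside of pydantic. The sketch below is a plain-function version of that check, not the project's validator itself: `PromptType` is again a stand-in with assumed members, and `check_prompt_is_valid` is a hypothetical name. It illustrates that, with the key set derived from the enum, a dictionary still using the old `passage` key would presumably be rejected.

```python
from enum import Enum
from typing import Dict, Optional, Union


class PromptType(str, Enum):
    # Stand-in for mteb's PromptType; members are assumed from the diff
    query = "query"
    document = "document"


def check_prompt_is_valid(
    prompt: Optional[Union[str, Dict[str, str]]],
) -> Optional[Union[str, Dict[str, str]]]:
    """Hypothetical plain-function version of the validator body in the diff."""
    if isinstance(prompt, dict):
        allowed = {member.value for member in PromptType}
        for key in prompt:
            if key not in allowed:
                raise ValueError(
                    "The prompt dictionary should only contain the keys 'query' and 'document'."
                )
    # Strings and None pass through unchanged
    return prompt


accepted = check_prompt_is_valid({"query": "q: ", "document": "d: "})

try:
    check_prompt_is_valid({"query": "q: ", "passage": "p: "})
    rejected_old_key = False
except ValueError:
    rejected_old_key = True
```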

@@ -419,7 +420,7 @@ def is_filled(self) -> bool:
         return all(
             getattr(self, field_name) is not None
             for field_name in self.model_fields
-            if field_name not in ["prompt", "adapted_from"]
+            if field_name not in ["prompt", "adapted_from", "is_public"]
         )

     @property
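The `is_filled` check treats a small set of fields as optional and requires every other field to be non-`None`. A rough stand-alone sketch of that pattern, using a dataclass instead of the pydantic model, is given below. `ToyMetadata` and its fields are hypothetical and far smaller than the real `TaskMetadata`.

```python
from dataclasses import dataclass, fields
from typing import List, Optional

# Fields that do not count towards "filled", mirroring the list in the diff
OPTIONAL_FIELDS = ("prompt", "adapted_from", "is_public")


@dataclass
class ToyMetadata:
    # Hypothetical, heavily reduced stand-in for TaskMetadata
    name: Optional[str] = None
    description: Optional[str] = None
    prompt: Optional[str] = None
    adapted_from: Optional[List[str]] = None
    is_public: bool = True

    def is_filled(self) -> bool:
        """True when every non-optional field has a value."""
        return all(
            getattr(self, f.name) is not None
            for f in fields(self)
            if f.name not in OPTIONAL_FIELDS
        )


complete = ToyMetadata(name="ExampleTask", description="An example task")
incomplete = ToyMetadata(name="ExampleTask")
```

Since `is_public` defaults to `True` it would pass the `is not None` test anyway; listing it explicitly presumably just records that it is not one of the required annotations.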
95 changes: 95 additions & 0 deletions mteb/benchmarks/benchmarks/__init__.py
@@ -0,0 +1,95 @@
+from __future__ import annotations
+
+from mteb.benchmarks.benchmark import Benchmark
+from mteb.benchmarks.benchmarks.benchmarks import (
+    BEIR,
+    BEIR_NL,
+    BRIGHT,
+    BRIGHT_LONG,
+    BUILT_MTEB,
+    C_MTEB,
+    CHEMTEB,
+    CODE_RAG,
+    ENCODECHKA,
+    FA_MTEB,
+    JINA_VDR,
+    LONG_EMBED,
+    MIEB_ENG,
+    MIEB_IMG,
+    MIEB_LITE,
+    MIEB_MULTILINGUAL,
+    MTEB_DEU,
+    MTEB_EN,
+    MTEB_ENG_CLASSIC,
+    MTEB_EU,
+    MTEB_FRA,
+    MTEB_INDIC,
+    MTEB_JPN,
+    MTEB_KOR,
+    MTEB_MAIN_RU,
+    MTEB_MINERS_BITEXT_MINING,
+    MTEB_POL,
+    MTEB_RETRIEVAL_LAW,
+    MTEB_RETRIEVAL_MEDICAL,
+    MTEB_RETRIEVAL_WITH_INSTRUCTIONS,
+    NANOBEIR,
+    R2MED,
+    RU_SCI_BENCH,
+    SEB,
+    VIDORE,
+    VIDORE_V2,
+    VISUAL_DOCUMENT_RETRIEVAL,
+    VN_MTEB,
+    CoIR,
+    MTEB_code,
+    MTEB_multilingual_v1,
+    MTEB_multilingual_v2,
+    RAR_b,
+)
+
+__all__ = [
+    "Benchmark",
+    "MTEB_EN",
+    "MTEB_ENG_CLASSIC",
+    "MTEB_MAIN_RU",
+    "RU_SCI_BENCH",
+    "MTEB_RETRIEVAL_WITH_INSTRUCTIONS",
+    "MTEB_RETRIEVAL_LAW",
+    "MTEB_RETRIEVAL_MEDICAL",
+    "MTEB_MINERS_BITEXT_MINING",
+    "SEB",
+    "CoIR",
+    "RAR_b",
+    "MTEB_FRA",
+    "MTEB_DEU",
+    "MTEB_KOR",
+    "MTEB_POL",
+    "MTEB_code",
+    "MTEB_multilingual_v1",
+    "MTEB_multilingual_v2",
+    "MTEB_JPN",
+    "MTEB_INDIC",
+    "MTEB_EU",
+    "LONG_EMBED",
+    "BRIGHT",
+    "BRIGHT_LONG",
+    "CODE_RAG",
+    "BEIR",
+    "NANOBEIR",
+    "C_MTEB",
+    "FA_MTEB",
+    "CHEMTEB",
+    "BEIR_NL",
+    "MIEB_ENG",
+    "MIEB_MULTILINGUAL",
+    "MIEB_LITE",
+    "MIEB_IMG",
+    "BUILT_MTEB",
+    "ENCODECHKA",
+    "VIDORE",
+    "VIDORE_V2",
+    "VISUAL_DOCUMENT_RETRIEVAL",
+    "R2MED",
+    "VN_MTEB",
+    "JINA_VDR",
+]
Original file line number Diff line number Diff line change
@@ -855,7 +855,7 @@
     ],
 )

-MTEB_multilingual = Benchmark(
+MTEB_multilingual_v1 = Benchmark(
     name="MTEB(Multilingual, v1)",
     display_name="Multilingual",
     icon="https://github.com/DennisSuitters/LibreICONS/raw/2d2172d15e3c6ca03c018629d60050e4b99e5c55/svg-color/libre-gui-globe.svg",
@@ -869,7 +869,7 @@
 )


-MTEB_multilingual = Benchmark(
+MTEB_multilingual_v2 = Benchmark(
     name="MTEB(Multilingual, v2)",
     display_name="Multilingual",
     icon="https://github.com/DennisSuitters/LibreICONS/raw/2d2172d15e3c6ca03c018629d60050e4b99e5c55/svg-color/libre-gui-globe.svg",
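The rename to `MTEB_multilingual_v1` and `MTEB_multilingual_v2` matters because two assignments to the same module-level name leave only the last object reachable by that name. The sketch below models a module namespace as a plain dictionary to show the effect. The `Benchmark` class here is a hypothetical one-field stand-in, not mteb's class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Benchmark:
    # Hypothetical minimal stand-in; the real mteb Benchmark has more fields
    name: str


# Module globals behave like a dict, so a plain dict models the namespace.
before = {}
before["MTEB_multilingual"] = Benchmark(name="MTEB(Multilingual, v1)")
before["MTEB_multilingual"] = Benchmark(name="MTEB(Multilingual, v2)")  # rebinds the name

# With distinct names, both benchmark objects stay reachable
after = {
    "MTEB_multilingual_v1": Benchmark(name="MTEB(Multilingual, v1)"),
    "MTEB_multilingual_v2": Benchmark(name="MTEB(Multilingual, v2)"),
}

names_before = sorted(b.name for b in before.values())
names_after = sorted(b.name for b in after.values())

print(names_before)
print(names_after)
```

Any code that discovers benchmarks by walking module attributes would, under this model, see only the v2 entry before the rename and both entries after it.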