init automate script #220

Samoed · 2025-06-11T20:07:04Z

Closes https://github.com/embeddings-benchmark/results/issues/196

I've created a CI action that automatically creates results tables for a PR. If there are no results for multilingual-e5 or gemini-embedding-001, only the new model's results are added.

With each new commit, CI will update the comment with the results.
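The thread does not say how the comment is updated in place. A minimal sketch of one common approach, assuming the bot tags its comment with a hidden HTML marker and looks for that marker on later runs. The marker string, the function name, and the comment dictionaries are all hypothetical and not taken from this PR:

```python
# Hypothetical marker; the real workflow may identify its comment differently.
MARKER = "<!-- model-results-comparison -->"


def plan_comment_update(existing_comments, new_body):
    """Decide whether to create a new PR comment or update the previous one.

    existing_comments: list of dicts with "id" and "body" keys (assumed shape).
    Returns (action, comment_id, body), where action is "create" or "update".
    """
    body = f"{MARKER}\n{new_body}"
    for comment in existing_comments:
        # A comment carrying the marker was posted by an earlier run.
        if MARKER in comment["body"]:
            return ("update", comment["id"], body)
    return ("create", None, body)
```

With this shape, the first run on a PR yields a "create" action and every later commit yields an "update" for the same comment id, so the PR keeps a single results comment.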

Checklist

My model has a model sheet, report or similar
My model has a reference implementation in mteb/models/; this can be an API. Instructions on how to add a model can be found here
- No, but there is an existing PR ___
The results submitted are obtained using the reference implementation
My model is available, either as a publicly accessible API or publicly on, e.g., Hugging Face
I solemnly swear that for all results submitted I have not trained on the evaluation dataset, including training splits. If I have, I have disclosed it clearly.

github-actions · 2025-06-11T20:18:26Z

Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: ai-forever/FRIDA
Tasks: MassiveIntentClassification

Results for `ai-forever/FRIDA`

| task_name | ai-forever/FRIDA | google/gemini-embedding-001 | intfloat/multilingual-e5-large | Max result |
| --- | --- | --- | --- | --- |
| MassiveIntentClassification | 0.79 | 0.82 | 0.6 | 0.85 |
| Average | 0.79 | 0.82 | 0.6 | 0.85 |
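A table like the one above could be assembled roughly as follows. This is a sketch only, assuming scores are already loaded into a `{model: {task: score}}` dictionary. The function names are hypothetical, the real script may use `tabulate` instead of hand-built rows, and the "Max result" column is left out here:

```python
from statistics import mean


def _fmt(score):
    # Missing scores are rendered as "n/a" (an assumption, not the script's behavior).
    return "n/a" if score is None else f"{score:.2f}"


def build_table(results, new_model, reference_models):
    """Render a markdown comparison table for one new model.

    Reference models with no score on any of the new model's tasks are
    dropped, so only the new model's results are shown in that case.
    """
    tasks = sorted(results.get(new_model, {}))
    columns = [new_model] + [
        m for m in reference_models
        if any(t in results.get(m, {}) for t in tasks)
    ]
    lines = [
        "| task_name | " + " | ".join(columns) + " |",
        "|" + " --- |" * (len(columns) + 1),
    ]
    for task in tasks:
        cells = [_fmt(results.get(m, {}).get(task)) for m in columns]
        lines.append(f"| {task} | " + " | ".join(cells) + " |")
    # Average over the tasks each model actually has scores for.
    averages = []
    for m in columns:
        scores = [results[m][t] for t in tasks if t in results.get(m, {})]
        averages.append(_fmt(mean(scores) if scores else None))
    lines.append("| Average | " + " | ".join(averages) + " |")
    return "\n".join(lines)
```

Calling `build_table` with a results dictionary that lacks one of the reference models simply produces a narrower table, which mirrors the fallback described in the PR description.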

KennethEnevoldsen

Looks good.

A few things we could add, though they are not required to merge this (probably out of scope):

- Would love to also include the "max" score for each task
- Would love to be able to trigger this command using something like @bot result --models intfloat/e5-large-v2

scripts/pr_results_comment.py

Samoed · 2025-06-12T21:29:44Z

> Would love to be able to trigger this command using something like @bot result --models intfloat/e5-large-v2

To use it like a bot, we need to host it somewhere. I can host it, but I'm not sure if that's the best option. We could make it trigger on each comment and check if there’s a command to run, but I’m not sure if that’s a good idea.

Another solution could be to parametrize the workflow dispatch (manual trigger) to add models here. I will try this approach.
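For the comment-triggered option discussed above, the "check if there's a command to run" step could look something like this. It is a sketch under assumptions: the `@bot result --models ...` syntax is taken from the suggestion in the review, while the function name and the comma-splitting behavior are hypothetical and not part of this PR:

```python
def parse_bot_command(comment_body, trigger="@bot"):
    """Return the requested model names, or None if no command is present.

    Looks for a line of the form: @bot result --models <name> [<name> ...]
    An empty list means the command was given without --models.
    """
    for line in comment_body.splitlines():
        tokens = line.split()
        if tokens[:2] != [trigger, "result"]:
            continue  # not a command line, keep scanning
        args = tokens[2:]
        models = []
        if "--models" in args:
            start = args.index("--models") + 1
            for token in args[start:]:
                if token.startswith("--"):
                    break  # next option begins, stop collecting models
                # Accept both space-separated and comma-separated names.
                models.extend(m for m in token.split(",") if m)
        return models
    return None
```

A workflow that runs on every comment would call this first and exit early on `None`, which keeps ordinary discussion comments from starting a results run.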

Samoed · 2025-06-13T11:52:40Z

I've added support for the max score and for comparison with different models through workflow_dispatch, but I can't test it yet because it hasn't been merged.
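The two additions mentioned here can be illustrated with a short sketch. It assumes the manual trigger passes models as one comma-separated string and that the max is taken over every model with a stored score for the task. Both points are assumptions, and the function names are hypothetical:

```python
def parse_models_input(raw):
    """Split a comma-separated workflow input into clean model names."""
    return [name.strip() for name in raw.split(",") if name.strip()]


def max_per_task(results, tasks):
    """Best score per task across all models in `results`.

    results: {model: {task: score}} (assumed shape).
    Tasks with no score from any model map to None.
    """
    best = {}
    for task in tasks:
        scores = [
            model_scores[task]
            for model_scores in results.values()
            if task in model_scores
        ]
        best[task] = max(scores) if scores else None
    return best
```

Taking the max over all stored models rather than only the displayed ones would explain a "Max result" value higher than every model column in the table, as in the bot comment earlier in this thread.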

Samoed · 2025-06-15T17:49:09Z

@KennethEnevoldsen Can you review this PR? The script's actual output is shown in this comment: #220 (comment)

KennethEnevoldsen

Looks good! Feel free to merge

> To use it like a bot, we need to host it somewhere. I can host it, but I'm not sure if that's the best option. We could make it trigger on each comment and check if there’s a command to run, but I’m not sure if that’s a good idea.
>
> Another solution could be to parametrize workflow dispatch (manually trigger) to add models here. I will try this approach

I think for now it is perfectly fine to just have it as a default comment, and then we can run any additional stuff ourselves.

.github/workflows/model-results-comparison.yaml

init automate script

7c75155

Samoed marked this pull request as draft June 11, 2025 20:07

Samoed added 2 commits June 11, 2025 23:08

bump versions

f7a94a9

fix script name

fd574a9

Samoed added 6 commits June 11, 2025 23:19

add tabulate

1065f7b

add resutls for test model

eec89fb

handle no result on task

42e02b1

fix function

9b4025d

fix function

e419cc8

remove testuser

6d41b5f

Samoed requested a review from KennethEnevoldsen June 11, 2025 21:01

fix script help

15d60e9

Samoed marked this pull request as ready for review June 11, 2025 21:11

Samoed added 2 commits June 12, 2025 00:12

try to run only one model

cb95fd2

install from sources

e74dde0

KennethEnevoldsen reviewed Jun 12, 2025

Samoed added 7 commits June 13, 2025 14:20

update script

d3d1b9b

format

a28be8f

fix typo

6767970

fetch main

c66edbd

fetch main in script

a7ed069

remove revision check

dc2d787

fix reference models arg

c67ecc5

Samoed requested a review from KennethEnevoldsen June 13, 2025 11:52

KennethEnevoldsen approved these changes Jun 15, 2025

.github/workflows/model-results-comparison.yaml (outdated, resolved)

bump python version

f40d643

Samoed enabled auto-merge (squash) June 15, 2025 18:55

Samoed merged commit 0af4146 into main Jun 15, 2025
3 checks passed

Samoed deleted the automate_script branch December 24, 2025 08:09
