Conversation

Member

@Samoed Samoed commented Jun 11, 2025

Closes https://github.com/embeddings-benchmark/results/issues/196

I've created a CI action that automatically generates comparison tables for a PR. If there are no results for multilingual-e5 or gemini-embedding-001, only the new model's results are added.
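
For illustration, the table generation could look roughly like the sketch below. This is not the script from this PR: the function name `build_comparison_table` and the `{model: {task: score}}` input shape are assumptions made for the example. It drops reference models that have no results and takes the "Max result" over every model with a score for the task, not only the displayed columns.

```python
from statistics import mean


def build_comparison_table(scores, new_model, reference_models):
    """Render a Markdown table of per-task scores (hypothetical helper).

    scores: {model_name: {task_name: score}} for every model with results.
    Reference models without any results are dropped, so only the new
    model's column remains when no reference results exist.
    """
    columns = [new_model] + [m for m in reference_models if scores.get(m)]
    tasks = sorted(scores[new_model])

    header = ["task_name", *columns, "Max result"]
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "|".join(" --- " for _ in header) + "|",
    ]

    def fmt(value):
        # Missing scores are rendered as empty cells.
        return "" if value is None else f"{value:.2f}"

    per_column = {c: [] for c in columns}
    maxima = []
    for task in tasks:
        row = [scores[c].get(task) for c in columns]
        # Max is taken over every model with a result for this task.
        best = max(s[task] for s in scores.values() if task in s)
        maxima.append(best)
        for column, value in zip(columns, row):
            if value is not None:
                per_column[column].append(value)
        lines.append("| " + " | ".join([task, *map(fmt, row), fmt(best)]) + " |")

    averages = [mean(per_column[c]) if per_column[c] else None for c in columns]
    average_max = mean(maxima) if maxima else None
    lines.append(
        "| " + " | ".join(["Average", *map(fmt, averages), fmt(average_max)]) + " |"
    )
    return "\n".join(lines)
```

The averages here are plain means over the tasks each column has a score for; the real script may aggregate differently.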

With each new commit, CI will update the comment with the results.
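
One common way to keep a single comment up to date is to tag it with a hidden marker and edit it on later runs. The sketch below only decides between creating and updating; the marker string and the function name are hypothetical, and the comment dicts are assumed to carry `id` and `body` fields, as the GitHub REST API returns for issue comments. How this PR's workflow actually does it may differ.

```python
MARKER = "<!-- results-comparison -->"  # hypothetical hidden marker


def plan_comment_update(existing_comments, table_markdown, marker=MARKER):
    """Decide whether to create a new comment or edit an earlier one.

    existing_comments: list of {"id": int, "body": str} dicts.
    Returns ("update", comment_id, body) or ("create", None, body).
    """
    body = f"{marker}\n{table_markdown}"
    for comment in existing_comments:
        # The first comment carrying the marker is the one posted by CI earlier.
        if marker in comment.get("body", ""):
            return ("update", comment["id"], body)
    return ("create", None, body)
```

The caller would then issue the matching API request, so the PR keeps one results comment instead of one per commit.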

Checklist

  • My model has a model sheet, report or similar
  • My model has a reference implementation in mteb/models/; this can be an API. Instructions on how to add a model can be found here
    • No, but there is an existing PR ___
  • The results submitted are obtained using the reference implementation
  • My model is available, either as a publicly accessible API or publicly on e.g., Huggingface
  • I solemnly swear that for all results submitted I have not trained on the evaluation dataset, including training splits. If I have, I have disclosed it clearly.

@Samoed Samoed marked this pull request as draft June 11, 2025 20:07

github-actions bot commented Jun 11, 2025

Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: ai-forever/FRIDA
Tasks: MassiveIntentClassification

Results for ai-forever/FRIDA

| task_name | ai-forever/FRIDA | google/gemini-embedding-001 | intfloat/multilingual-e5-large | Max result |
| --- | --- | --- | --- | --- |
| MassiveIntentClassification | 0.79 | 0.82 | 0.6 | 0.85 |
| Average | 0.79 | 0.82 | 0.6 | 0.85 |

@Samoed Samoed requested a review from KennethEnevoldsen June 11, 2025 21:01
@Samoed Samoed marked this pull request as ready for review June 11, 2025 21:11
Contributor

@KennethEnevoldsen KennethEnevoldsen left a comment

Looks good.

A few things we could add, though they are not required to merge this (probably out of scope):

  • Would love to also include the "max" score for each task
  • Would love to be able to trigger this command using something like @bot result --models intfloat/e5-large-v2

Member Author

Samoed commented Jun 12, 2025

> Would love to be able to trigger this command using something like @bot result --models intfloat/e5-large-v2

To use it like a bot, we need to host it somewhere. I can host it, but I'm not sure if that's the best option. We could make it trigger on each comment and check if there’s a command to run, but I’m not sure if that’s a good idea.

Another solution could be to parametrize workflow_dispatch (a manual trigger) so that models can be added there. I will try this approach.
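
For reference, the comment-triggered variant would mostly come down to recognizing a command line inside a comment body. A minimal sketch, assuming the `@bot result --models ...` syntax suggested above; the function name `parse_bot_command` and the exact grammar are hypothetical and not part of this PR:

```python
import shlex


def parse_bot_command(comment, trigger="@bot"):
    """Return the requested model names, or None if no command is present."""
    for line in comment.splitlines():
        try:
            tokens = shlex.split(line)
        except ValueError:
            # Unbalanced quotes (e.g. an apostrophe in prose): not a command.
            continue
        if tokens[:2] != [trigger, "result"]:
            continue
        rest = tokens[2:]
        models = []
        if rest and rest[0] == "--models":
            # Everything after --models up to the end of the line is a model name.
            models = [t for t in rest[1:] if not t.startswith("--")]
        return models
    return None
```

An empty list means the command was given without extra models, so the default reference models would apply. With workflow_dispatch, the same list could instead be passed as a workflow input, and no comment parsing would be needed.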

Member Author

Samoed commented Jun 13, 2025

I've added support for the max score and for comparison with different models through workflow_dispatch, but I can't test it yet because it hasn't been merged.

@Samoed Samoed requested a review from KennethEnevoldsen June 13, 2025 11:52
Member Author

Samoed commented Jun 15, 2025

@KennethEnevoldsen Can you review this PR? The script's actual output can be seen in this comment: #220 (comment)

Contributor

@KennethEnevoldsen KennethEnevoldsen left a comment

Looks good! Feel free to merge

> To use it like a bot, we need to host it somewhere. I can host it, but I'm not sure if that's the best option. We could make it trigger on each comment and check if there’s a command to run, but I’m not sure if that’s a good idea.
>
> Another solution could be to parametrize workflow_dispatch (a manual trigger) so that models can be added there. I will try this approach.

I think for now it is perfectly fine to just have it as a default comment, and then we can run any additional stuff ourselves.

@Samoed Samoed enabled auto-merge (squash) June 15, 2025 18:55
@Samoed Samoed merged commit 0af4146 into main Jun 15, 2025
3 checks passed
@Samoed Samoed deleted the automate_script branch December 24, 2025 08:09
Development

Successfully merging this pull request may close these issues.

Generate table with submited results

2 participants