
Add ChemTEB results #89

Merged
Samoed merged 3 commits into embeddings-benchmark:main from HSILA:main on Jan 12, 2025

Conversation

@HSILA (Contributor) commented Jan 8, 2025

Adding ChemTEB results, as requested in this PR.

Checklist

  • Run tests locally to make sure nothing is broken, using make test.
  • Run the results file checker with make pre-push.

Adding a model checklist

  • I have added the model implementation to the mteb/models/ directory. Instructions on how to add a model can be found here, in the following PR: ____

@Samoed (Member) commented Jan 11, 2025

@HSILA can you create a table that compares your results with the results of the benchmark?

@HSILA (Contributor, Author) commented Jan 11, 2025

> @HSILA can you create a table that compares your results with the results of the benchmark?

Hi, hope you're doing well.

As I mentioned earlier here, a direct comparison with the paper is not possible because per-task, per-model scores are not reported in the paper. The paper reports an average score per category, but the combination of tasks has changed in my PR, so we cannot directly compare per-category averages either.

However, one approach we can take is to compare the tasks that are present in both my PR and the original ChemTEB results. I have shared my local JSON results, which lets us compare the shared tasks and evaluate how performance has changed on average.

To verify that the JSON files I shared in chemteb-results correspond to the results used to produce Table 2, you can refer to table2.ipynb and reproduce it.

The mteb.ipynb notebook compares the main score for the shared tasks (averaging scores for tasks that were merged as subsets of a larger task in MTEB) and reports the difference.
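
For reference, here is a minimal sketch of the kind of comparison the notebook performs. It assumes each result is a JSON file shaped roughly like {"task_name": ..., "scores": {"test": [{"main_score": ...}]}}; the directory names, keys, and helper names below are illustrative, not the exact layout of the files in this PR.

```python
import json
from pathlib import Path


def load_main_scores(results_dir: Path) -> dict[str, float]:
    """Map task name -> main score for every JSON result file under results_dir."""
    scores: dict[str, float] = {}
    for path in results_dir.glob("*.json"):
        data = json.loads(path.read_text())
        task = data["task_name"]
        # Average over the test-split entries, so tasks that were merged as
        # subsets of a bigger MTEB task (one entry per subset) get one score.
        entries = data["scores"]["test"]
        scores[task] = sum(e["main_score"] for e in entries) / len(entries)
    return scores


def compare(paper_results: Path, pr_results: Path) -> float:
    """Print per-task absolute differences on shared tasks and return the mean."""
    paper = load_main_scores(paper_results)
    pr = load_main_scores(pr_results)
    shared = sorted(set(paper) & set(pr))
    diffs = [abs(paper[t] - pr[t]) for t in shared]
    for task, diff in zip(shared, diffs):
        print(f"{task:40s} {diff:.4f}")
    return sum(diffs) / len(diffs)


# Example usage (hypothetical paths):
# mean_diff = compare(Path("chemteb-results"), Path("results/new"))
# print(f"mean difference: {mean_diff:.4f}")
```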

The observed difference is 0.0045 overall (averaged across all models and tasks), and it ranges from roughly 0 to 0.02 for most tasks. Notably, the PubChemWikiParagraphsPC task shows a larger difference because it was updated later: the update masked exact chemical compound names in each text pair to make the problem more challenging.

Other changes, particularly in the Classification and Clustering tasks, can be attributed to updates that supplemented the label column (which was not sufficiently descriptive) with a label_text column. Additionally, these tasks may have been reordered, which could affect the train-test splits.

That said, we can always revert all revisions to match those used in the paper. However, the current revisions in the PR provide more detailed information, such as the label_text column and an updated README.

@Samoed merged commit 4f6a9fc into embeddings-benchmark:main on Jan 12, 2025
2 checks passed