Conversation
@HSILA can you create a table that compares your results with the results of the benchmark?
Hi, hope you are doing fine. As I mentioned earlier here, a direct comparison with the paper is not possible because the per-task, per-model scores are not reported in the paper. The paper only reports an average score per category, and the combination of tasks has changed in my PR, so we cannot directly compare per-category averages.

However, one approach we can take is to compare the tasks that are present in both my PR and the ChemTEB results. I have shared my local JSON results, which lets us compare the shared tasks and evaluate how the performance has changed on average. To verify that the JSON files I shared in chemteb-results correspond to the same results used to produce Table 2, you can refer to table2.ipynb and reproduce it. The mteb.ipynb notebook compares the main score for the shared tasks (using an average score for tasks that were merged as subsets of a larger task in MTEB) and reports the difference. The observed difference is 0.0045 overall (averaged across all models and tasks), and it ranges from approximately 0 to 0.02 for most tasks.

Notably, the … Other observed changes, particularly in Classification and Clustering tasks, can be attributed to updates that complemented the … That said, we can always revert all revisions to match those used in the paper. However, the current revisions in the PR provide more detailed information, such as the …
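For reference, here is a minimal sketch of the kind of comparison mteb.ipynb performs: load two directories of MTEB-style result JSONs, keep the tasks present in both, and report per-task and average absolute differences in the main score. The directory layout, the `scores` → split → `main_score` structure, and the example paths are assumptions on my part, not the exact notebook code; adjust keys and paths to match the actual chemteb-results files.

```python
import json
from pathlib import Path


def load_main_scores(results_dir: str, split: str = "test") -> dict[str, float]:
    """Map task_name -> main_score, averaged over the entries of one split."""
    scores: dict[str, float] = {}
    for path in Path(results_dir).glob("**/*.json"):
        data = json.loads(path.read_text())
        # Skip files that are not task results (e.g. model_meta.json).
        if "scores" not in data or split not in data["scores"]:
            continue
        entries = data["scores"][split]
        if not entries:
            continue
        scores[data["task_name"]] = sum(e["main_score"] for e in entries) / len(entries)
    return scores


def compare(dir_a: str, dir_b: str) -> None:
    """Print per-task and average absolute main-score differences for shared tasks."""
    a, b = load_main_scores(dir_a), load_main_scores(dir_b)
    shared = sorted(set(a) & set(b))
    diffs = [abs(a[t] - b[t]) for t in shared]
    for task, d in zip(shared, diffs):
        print(f"{task:60s} diff={d:.4f}")
    if diffs:
        print(f"Average absolute difference over {len(shared)} shared tasks: "
              f"{sum(diffs) / len(diffs):.4f}")


# compare("chemteb-results/model_x", "mteb-results/model_x")  # hypothetical paths
```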
Adding ChemTEB results, as requested in this PR.
Checklist
- [ ] `make test`
- [ ] `make pre-push`

Adding a model checklist

- [ ] Added the model under the `mteb/models/` directory. Instructions to add a model can be found here, in the following PR: ____