-
Notifications
You must be signed in to change notification settings - Fork 359
Multilingual extractiveness #956
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multilingual extractiveness #956
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good ! only few nits and ready to merge :)
self.nlp = spacy.load(spacy_model) | ||
except OSError: | ||
logger.info("Downloading the spacy en_core_web_sm model\n(don't worry, this will only happen once)") | ||
logger.info("Downloading the spacy %s model\n(don't worry, this will only happen once)", spacy_model) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better to use f strings here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ve updated it :-) Out of curiosity though: I used to rely on f-strings until a linter flagged it in another project because of this rule: https://docs.astral.sh/ruff/rules/logging-f-string/
Personally, I also prefer f-strings, but I was wondering if their use here is just a matter of preference in lighteval’s coding guidelines or if there’s another reason behind it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see, we will add this rule in a later PR, this is just to keep the formatting coherent throughout the repo !
the end to end test fails after a vllm release, i will patch asap. In the meantime can you run |
I ran it and fixed a missing newline after the imports in |
* Added German, French, and Italian language support to Extractiveness metric * Added minimum version for spacy dependency * Added changes from code review * Added missing newline --------- Co-authored-by: Nathan Habib <[email protected]>
As part of a community task I have been collaborating on, I am planning to add multiple PRs to add functionality to lighteval that was required for our evaluations. This PR introduces support for German, French, and Italian when using the Extractiveness metric, and it adds dedicated metrics in these languages that can be used in evaluations.
Related issue: #955
I also had to update the spaCy dependency because I was running into dependency conflicts otherwise using the current pyproject toml file.