-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Automate NLTK datapackage punkt_tab
download
#803
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #803 +/- ##
==========================================
- Coverage 99.65% 99.63% -0.02%
==========================================
Files 93 93
Lines 6889 6909 +20
==========================================
+ Hits 6865 6884 +19
- Misses 24 25 +1 ☔ View full report in Codecov by Sentry. |
@CodiumAI-Agent /review |
PR Reviewer Guide 🔍(Review updated until commit f01d351)
|
Persistent review updated to latest commit c047aea |
Persistent review updated to latest commit f01d351 |
|
One line is missing test coverage, but that line "should never be run" and the involved logic is very simple. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Based on the update-dependencies-v1.2 branch (PR #796) because it upgrades NLTK to a version that needs the
punkt_tab
package (notpunkt
anymore): nltk/nltk#3283)The download is performed automatically when instantiating an Analyzer object, if the
punkt_tab
data is not found. TheREADME.md
has been adjusted for this (i.e. removed the instructions to runpython -m nltk.downloader punkt_tab
). NB: Adjust Wiki pages too on release.In Dockerfile this data download step is retained, so when using the Dockerimage everything is ready without download needs (otherwise the data would be re-downloaded every time Annif is used in a new container, because the NLTK data storage is not mounted to a volume).
Also in the CI/CD pipeline there is still this explicit download step.
BTW, now when modifying this, it would be a good time to consider changing the location of the NLTK data storage, which is discussed here. However, the current
~/nltk_data/
is not bad, I think.