Refactoring of ocrd_tesserocr common functionality into core #268
Codecov Report
@@ Coverage Diff @@
## master #268 +/- ##
==========================================
- Coverage 98.07% 92.99% -5.08%
==========================================
Files 30 30
Lines 1350 1485 +135
Branches 268 287 +19
==========================================
+ Hits 1324 1381 +57
- Misses 15 92 +77
- Partials 11 12 +1
Continue to review full report at Codecov.
Looks good. I see a few possible improvements...
Wait a minute! Where is When you add this, please be sure to apply OCR-D/ocrd_tesserocr#68 as well!
Co-Authored-By: Robert Sachunsky <[email protected]>
It's there now. The rest of common.py has been turned into workspace methods. More tests would be wise, but the ocrd_tesserocr test suite passes.
Conflicts: CHANGELOG.md ocrd/requirements.txt ocrd_modelfactory/requirements.txt tests/test_utils.py
Looks very good already. I am glad you adopted the Workspace method option. I will start a PR afterwards with test cases for the 3 new methods as well, if you like. (Based on assets, if that is not asking too much.)
(Please also see unresolved comments from last time.)
Yes,
Conflicts: CHANGELOG.md ocrd/ocrd/workspace.py ocrd_utils/ocrd_utils/__init__.py tests/test_utils.py
Becoming less and less certain of this. @kba what do you think, is this too much?
Wrt. cropping vs. cutting (vs. segmenting?): Using the term cropping for localizing a page's border was a bad choice right from the start, because it mixes the intellectual process of finding the borders with the physical process of separating the OCR-relevant from the irrelevant parts of the actual image. Using cutting does not improve things IMHO. The more I think about it, the more meaningful the term (page-level) segmentation seems to me, because this is what cropping does right now: it localizes the page segment on an image file. We could then use cropping as it is intended.
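To make the distinction concrete, here is a minimal sketch that keeps the two steps apart. It is not OCR-D code: `localize_page`, `crop`, the toy image, and the border polygon are all hypothetical names and data chosen for illustration. The first function only describes where the page lies (segmentation), while the second physically cuts pixels out of the image (cropping).

```python
from typing import List, Tuple

Point = Tuple[int, int]              # (x, y) pixel coordinate
BBox = Tuple[int, int, int, int]     # (left, top, right, bottom), right/bottom exclusive


def localize_page(border: List[Point]) -> BBox:
    """Segmentation (hypothetical helper): report where the page lies.

    Only coordinates are computed; no pixel data is read or changed.
    """
    xs = [x for x, _ in border]
    ys = [y for _, y in border]
    return min(xs), min(ys), max(xs), max(ys)


def crop(image: List[List[int]], bbox: BBox) -> List[List[int]]:
    """Cropping (hypothetical helper): physically extract the pixels inside bbox."""
    left, top, right, bottom = bbox
    return [row[left:right] for row in image[top:bottom]]


# Toy 6x5 "image" whose pixel value encodes its position (10 * y + x).
image = [[10 * y + x for x in range(6)] for y in range(5)]

# Assumed page border polygon, as a segmentation step might annotate it.
border = [(1, 1), (4, 1), (4, 3), (1, 3)]

bbox = localize_page(border)   # annotation only
page = crop(image, bbox)       # derived image
```

In this sketch the result of `localize_page` could be stored as an annotation on its own, and `crop` would only run when a derived image is actually needed.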
We should not let the terminological discussion slow us down.
This reverts commit f1772ce.
I second that, let's discuss in #289.
Sorry @kba, I forgot to finalize my last review! Please fix in the next PR...
Start implementing OCR-D/ocrd_tesserocr#49