This issue was moved to a discussion.
cropping vs. cutting vs. segmenting #289
Comments
This sounds very convincing to me. Except for one problem: (correct me if I am wrong, but) page segmentation usually refers to finding regions, not the border. It would make more sense to call that region segmentation, just as line segmentation creates lines (so page segmentation would indeed be free for what we used to call cropping), but I have never heard that.
That's actually what we (@cneud, @kba and I) agreed on: to prefix segmentation with the result and not with the level of operation (i.e. segment image into X). You are absolutely right that page segmentation usually refers to segmentation of the page. But I prefer principled and sound solutions over traditions. 😁
It is definitely a stumbling point for newcomers and users, but I am skeptical whether researchers can easily be convinced to adopt that change in terminology. (At the least, page segmentation would have to be disambiguated verbosely for a while.) Another established term is page frame detection. This already distinguishes itself from the physical operation (of cropping / cutting). So it might be a compromise (and a smaller deviation from tradition) to use cropping only as an image operation (not a workflow step) in OCR-D, and consistently use page frame detection for the process of finding
It is a pity that the PAGE element is called |
You mean instead of segmentation?
But that (new) principle could still not be applied for page segmentation (in the new sense):
Yeah! That's why I propose a completely new wording:
I.e. foregoing the new principle.
I see. But the last 2 steps (region and line segmentation) do not actually detect any borders (i.e. outer limits) of regions and lines; rather, they define those very regions and lines. IMHO we have no good reason to drop the term segmentation itself at this point. Also, we should probably not concern ourselves much with the names of components or processors here – as these need to accommodate other considerations (like using imperative verb forms instead of abstract nouns, e.g. That being said, I don't find the existing naming scheme of ocrd_tesserocr all that bad – although I wouldn't mind a slight change like so:
@bertsky Is there still something to do from this discussion?
Hard to summarise, even harder to reach an agreement at this point. We have:
We need to accommodate:
I'm afraid we cannot re-invent the wheel here, or just ignore existing terminology in the academic literature or in the field. I suggest sticking to |
In the docstrings, cropping currently refers to tasks that could be better described as segmenting (finding regions) or cutting (doing the actual image manipulation). This came up in #268, but finding the right terminology should not prevent a merge.
We should also extend the glossary.
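To make the distinction concrete, here is a minimal sketch in plain Python. It is not taken from ocrd_tesserocr or any OCR-D module: the names `detect_frame` and `cut`, the list-of-rows image representation, and the "non-zero means content" rule are all assumptions made purely for illustration. The point is only that one step returns coordinates (segmenting / detecting) while the other returns pixels (cutting).

```python
from typing import List, Tuple

# Hypothetical toy types: a grayscale image as rows of ints, 0 = background.
Image = List[List[int]]
# (left, top, right, bottom), with right and bottom exclusive.
Box = Tuple[int, int, int, int]


def detect_frame(image: Image) -> Box:
    """Segmenting/detecting: find WHERE the content is; pixels are untouched."""
    rows = [y for y, row in enumerate(image) if any(row)]
    cols = [x for row in image for x, value in enumerate(row) if value]
    if not rows:
        # No content found: return an empty box.
        return (0, 0, 0, 0)
    return (min(cols), min(rows), max(cols) + 1, max(rows) + 1)


def cut(image: Image, box: Box) -> Image:
    """Cutting: the actual image manipulation, given known coordinates."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


page = [
    [0, 0, 0, 0, 0],
    [0, 7, 8, 0, 0],
    [0, 9, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

frame = detect_frame(page)   # coordinates only
cropped = cut(page, frame)   # derived image
```

In this sketch the result of the first step could be stored as an annotation without ever producing a new image, which is why lumping both steps under the single word cropping is ambiguous.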
Here are the pertinent comments on the terms:
@bertsky:
@wrznr: