workspace_validator: allow skipping pcgtsid check #1066
Makes sense to have this check optional since we cannot always enforce it (though we should, as you have started, ensure that OCR-D processors handle this correctly).
Minor note: Perhaps just `pcgtsid` instead of `mets_fileid_page_pcgtsid`, which is easier to type?
It's not clear what is meant by just |
- let `download=True` only download URLs to a temporary location on demand (like processors do), not convert them to local refs (like `workspace find --download` would)
- instead of adding notices to the validation report for remote files not downloaded, just add a warning to the logger
@kba in 6991166 I also changed the behaviour of |
But actually IMHO the workspace validator should be rewritten even more: It does not make sense to iterate over the same workspace for each PAGE XML test independently. Instead, |
see #1070 – to be discussed
OK, let's merge this now and continue the discussion on improving the validator (which has not been substantially updated in too long a time) in #1070.
I thought we do it the other way round, hence #1070 was against this PR branch, not master. Since AFAIK one cannot re-target PRs, I'll close and open a new one against master.
Oh, I did not realize, sry about that.
This is necessary as some of our processors do not use `set_pcGtsId(file.ID)`, IINM currently:
- `ocrd-keraslm-rate` (was already correct)
- `ocrd-fileformat-transform` (hard to enforce in submodule transform, only for PAGE output)
- `ocrd-im6convert` (does not apply: only image output)
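For readers unfamiliar with the check being made optional here, the following is a minimal sketch of the idea, not the validator's actual code: it compares the `pcGtsId` attribute of a PAGE document's root element with the METS file ID and lets the caller skip that comparison. The function name `check_pcgtsid`, the `skip` parameter, the sample document, and the use of the short name `pcgtsid` as the skip key are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample PAGE document; only the root attribute matters here.
PAGE_XML = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15" pcGtsId="OCR-D-SEG_0001">
  <Page imageFilename="0001.png" imageWidth="100" imageHeight="100"/>
</PcGts>
"""


def check_pcgtsid(page_xml, file_id, skip=()):
    """Return a list of error messages (empty if consistent or skipped).

    `skip` is a hypothetical collection of check names to bypass; the
    name "pcgtsid" follows the shorter spelling suggested in the review.
    """
    if "pcgtsid" in skip:
        return []
    root = ET.fromstring(page_xml)
    # pcGtsId is an (un-namespaced) attribute on the PcGts root element.
    pcgtsid = root.get("pcGtsId")
    if pcgtsid != file_id:
        return ["pcGtsId '%s' does not match METS file ID '%s'" % (pcgtsid, file_id)]
    return []


if __name__ == "__main__":
    print(check_pcgtsid(PAGE_XML, "OCR-D-SEG_0001"))
    print(check_pcgtsid(PAGE_XML, "OCR-D-OCR_0001"))
    print(check_pcgtsid(PAGE_XML, "OCR-D-OCR_0001", skip=("pcgtsid",)))
```

A processor that calls `set_pcGtsId(file.ID)` on its output would pass such a check by construction; for the others, skipping is the only way to get a clean report.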