Usually, I don't remember the exact name of the schema to validate against, e.g.
ocr-validate page-2019-07-15 input.xml
is hard to remember. On the other hand, it is usually easy to detect the exact version by inspecting the first lines with the stylesheet definition.
Thus, I suggest simplifying the validation, e.g. such that we can also use
ocr-validate page input.xml
which will then check whether the input file is valid against the stylesheet given at the beginning. Even
ocr-validate input.xml
could work for XML files, with some simple guessing for the others (HTML -> hOCR, JSON -> GCV).
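The guessing step could look roughly like the sketch below. It is only an illustration and not part of `ocr-validate`: the function name `guess_format` and the returned labels are hypothetical, and the rules are exactly the coarse ones named above (JSON -> GCV, HTML -> hOCR, anything else that looks like markup -> generic XML).

```python
import codecs
from pathlib import Path


def guess_format(path):
    """Guess a coarse format family from the first bytes of a file.

    Hypothetical helper: returns 'gcv', 'hocr', 'xml', or None.
    """
    head = Path(path).read_bytes()[:2048]
    # Drop a UTF-8 byte order mark, then leading whitespace.
    if head.startswith(codecs.BOM_UTF8):
        head = head[len(codecs.BOM_UTF8):]
    head = head.lstrip()

    # JSON documents start with an object or an array.
    if head.startswith((b"{", b"[")):
        return "gcv"

    # hOCR is HTML, so look for an HTML doctype or root element.
    lowered = head.lower()
    if b"<!doctype html" in lowered or b"<html" in lowered:
        return "hocr"

    # Any other markup is treated as XML; the exact schema still
    # has to be read from the root element.
    if head.startswith(b"<"):
        return "xml"
    return None
```

A guess like this only narrows down the family. For XML input, the concrete schema version would still have to come from the document itself.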
I am not yet sure whether it is afterwards still useful to have the option to specify the exact stylesheet instead of simply any PAGE version, i.e. whether these simplifications should supplement the old invocations rather than replace them.
> whether it is afterwards still useful to have the option to specify the exact stylesheet instead of simply any PAGE version
I would keep that option and make the automation optional. Note that such automation requires reading and parsing the XML twice: once for the schema detection and once for the actual validation. For bulk processing, this should be avoidable.
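One way to keep the detection pass cheap is to stop at the root element instead of parsing the whole file. The sketch below uses only the standard library and is not taken from `ocr-validate`. The function name `detect_page_schema` is hypothetical, and it assumes that the PAGE version is the last path segment of the root namespace and maps directly onto schema names of the form `page-2019-07-15`.

```python
import io
import xml.etree.ElementTree as ET

# Namespace prefix used by PAGE documents; the version follows it.
PAGE_NS_PREFIX = "http://schema.primaresearch.org/PAGE/gts/pagecontent/"


def detect_page_schema(source):
    """Return a schema name such as 'page-2019-07-15', or None.

    `source` is a binary file object. Only the root start tag is
    inspected. Malformed XML raises xml.etree.ElementTree.ParseError.
    """
    for _event, elem in ET.iterparse(source, events=("start",)):
        # The first 'start' event is the root element. ElementTree
        # writes namespaced tags as '{namespace}localname'.
        tag = elem.tag
        if tag.startswith("{"):
            namespace = tag[1:].split("}", 1)[0]
            if namespace.startswith(PAGE_NS_PREFIX):
                return "page-" + namespace[len(PAGE_NS_PREFIX):]
        # Root seen but not a PAGE namespace: stop without reading on.
        return None
    return None


sample = io.BytesIO(
    b'<?xml version="1.0"?>'
    b'<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">'
    b"<Page/></PcGts>"
)
detected = detect_page_schema(sample)
```

The file is still opened twice, but only its beginning is parsed for detection. For bulk runs, passing the exact schema name explicitly skips the detection step entirely.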