An issue in the context of FLAT rendering ucto output, and document/paragraph view.
In screenshot 1, note the extra whitespace around punctuation marks (docx converted by piereling, then ucto in lama).
Screenshot 2 shows the same problem, plus missing whitespace between tokens (ABBYY XML converted by FoLiA-abby, then ucto in lama).
This is indeed a bug in FLAT. ucto puts a space="no" attribute on a word in the FoLiA when there should be no space between that word and the next one (with respect to the untokenised original). For some reason FLAT interprets it the other way around, removing the space between the word and the previous one instead of the next one. I wonder whether this occurs in all scenarios, as I hadn't seen it before. Will investigate and fix!
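To make the difference concrete, here is a minimal sketch of the two interpretations. It is not FLAT's actual code: the `(text, space)` tuples and both function names are hypothetical stand-ins, with `space=False` playing the role of space="no" on a word.

```python
def detokenise(tokens):
    """Intended reading: space=False means no space AFTER this token."""
    parts = []
    for text, space in tokens:
        parts.append(text)
        if space:
            parts.append(" ")
    # Drop the trailing space left by the last token, if any.
    return "".join(parts).rstrip()


def detokenise_inverted(tokens):
    """Inverted reading: space=False drops the space BEFORE this token."""
    parts = []
    for i, (text, space) in enumerate(tokens):
        if i > 0 and space:
            parts.append(" ")
        parts.append(text)
    return "".join(parts)


# Hypothetical tokenisation of "Hello, world." (assumed, not taken from the attached files).
tokens = [("Hello", False), (",", True), ("world", False), (".", True)]

print(detokenise(tokens))           # Hello, world.
print(detokenise_inverted(tokens))  # Hello ,world .
```

The inverted reading produces both symptoms at once: a stray space before punctuation and a missing space between tokens.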
@pirolen: By the way, the first document you provided does not exhibit the same problem as the second. In the first document these spaces are actually encoded that way in the FoLiA document itself. This may be caused by either pandoc (docx->rst) or rst2folia.
I also attach the corresponding XML files.
pers_verz_test.ucto.folia.xml.txt
b1_2_kap4_pp298-1.png.ucto.folia.xml.txt