Adding multiple text content elements in FLAT #188

Closed
pirolen opened this issue Sep 22, 2023 · 10 comments

@pirolen

pirolen commented Sep 22, 2023

I wonder if FLAT supports adding multiple text content elements (https://folia.readthedocs.io/en/latest/text_annotation.html#text-annotation).
I would have a use case for it: there are several versions of a historical text: source text, a later version of the text, and their normalized orthography, as well as OCR.

Bringing together all these layers (on the token level) programmatically is not truly possible.
(I tried a bit but the file got big and FLAT had a gateway error).

The idea would be that users can enter the words for the different textclass layers via FLAT.

So I prepared a one-liner based on the example in the documentation (<t>Hello. This is a sentence. Bye!</t> etc.); please see the attachment. It is not yet tokenized, to keep it simple, and I also thought that in this use case the users could enter bigger chunks of text (e.g. sentences).

This toy file renders OK in FLAT, and I can change between the three different text contents using the Selector.
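
For readers without the attachment, here is a minimal sketch of what such a paragraph with several text content layers might look like, and how the layers can be told apart by class. The snippet is an assumption modelled on the documentation example, not the attached file itself; only the "ocroutput" text is taken from later in this thread, and the helper name `texts_by_class` is hypothetical. It uses Python's standard XML parser rather than a FoLiA library.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Assumed toy paragraph, modelled on the documentation example (not the attached file).
SNIPPET = f"""
<p xmlns="{FOLIA_NS}" xml:id="example.p.1">
  <t>Hello. This is a sentence. Bye!</t>
  <t class="original">Hello. This iz a sentence. Bye!</t>
  <t class="ocroutput">Hell0 Th15 iz a sentence, Bye1</t>
</p>
""".strip()


def texts_by_class(element):
    """Map each text class to the text of the element's direct <t> children."""
    layers = {}
    for t in element.findall(f"{{{FOLIA_NS}}}t"):
        # A <t> without a class attribute belongs to the default class "current".
        layers[t.get("class", "current")] = "".join(t.itertext())
    return layers


paragraph = ET.fromstring(SNIPPET)
layers = texts_by_class(paragraph)
print(paragraph.get(XML_ID))
for textclass, text in layers.items():
    print(f"{textclass}: {text}")
```

Each class here corresponds to one of the layers that can be switched between in the Selector.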

  1. But if I try to add a correction to the content in one of the layers (which is thus untokenized), all other layers seem to get modified too (screenshot). And selecting the view of the other layers is not working well anymore (either shows nothing or shows the layer that was corrected). Maybe adding a correction in this way is not allowed?
[Image: Screen Shot 2023-09-22 at 22 49 07]
  2. I did not seem to get how to make FLAT accept a fully newly entered text, tied to an existing one -- either as correction or as new text content. Please advise if this would be possible (I tried the string-annotation, but entering a word throws an error, cf. screenshot). I am also somewhat unsure how to declare the annotation set that would enable this action.
[Image: Screen Shot 2023-09-22 at 23 18 08]
  3. Furthermore, FLAT is not able to revert to any of the previous document versions; it throws an error (screenshot).
[Image: Screen Shot 2023-09-22 at 22 49 58]

Many thanks if you have the chance for looking into this. I use the docker FLAT.

bbb2.folia.xml.txt

@pirolen
Author

pirolen commented Sep 23, 2023

I did a bit more exploration, trying to see what happens if I have tokenized text.
I ran ucto on the above XML file and could tokenize one layer of text content, but not further ones ("ucto:Difficult to tokenize 'bbb2.ocr-ucto.folia.xml' again, already processed by ucto before!").
So perhaps this use case is not viable?

Nevertheless, I attach a screenshot of trying to add new text content, e.g. to the token 'Sentence'. What is one supposed to enter in the dialog box? E.g. add a feature, where class is the text itself, plus a value for the feature subset, which needs to be declared in the set definition?

[Image: Screen Shot 2023-09-23 at 11 56 40]

@proycon proycon self-assigned this Sep 26, 2023
@proycon
Owner

proycon commented Sep 26, 2023

> I wonder if FLAT supports adding multiple text content elements

Good question. I think it's a bit of a grey area where things quickly become
unsupported, and as you found out, things quickly become buggy. You're touching
or even crossing the limits of what FLAT is currently capable of,
unfortunately.

> 1. But if I try to add a correction to the content in one of the layers
>    (which is thus untokenized), all other layers seem to get modified too
>    (screenshot). And selecting the view of the other layers is not working
>    well anymore (either shows nothing or shows the layer that was corrected).
>    Maybe adding a correction in this way is not allowed?

I think corrections only work on the token level; applying them on higher
levels has never really been properly implemented in FLAT or even in FoLiA
itself (even though it would technically be allowed).

So edits on sentences should always be direct (D), or (N) if you want to add
text with a new text class, but definitely not corrections (C). However, there
indeed seems to be a bug here: changing the text content for one layer changes
them all.

> 2. I did not seem to get how to make FLAT accept a fully newly entered text,
>    tied to an existing one -- either as correction or as new text content.

This too seems to be a clear bug in FLAT, and I reproduced it: adding a new "Text" only
shows a field to select the text class (from the default set annotation), but the
expected field for the actual text never shows.

> Nevertheless, I attach a screenshot of trying to add new text content, e.g.
> to the token 'Sentence'. What is one supposed to enter in the dialog box?
> E.g. add a feature, where class is the text itself, plus a value for the
> feature subset, which needs to be declared in the set definition?

No, there should have been a text field. Adding a Feature like in the
screenshot is definitely not what you want here. But I can't blame you for
trying since the text field is missing and things are confusing enough ;)

> (I tried the string-annotation, but
> entering a word throws an error, cf. screenshot).

No, don't use string-annotation: support for adding string annotation
has never been implemented, and it's not what you need here anyway.

> I am also somewhat unsure how to declare the annotation set that would enable this action.

You'd have to load a document with <text-annotation set=".."> explicitly
set to your custom set. I don't think the interface allows adding a text
annotation set.
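
As a rough illustration of where that declaration lives, the sketch below reads the declared text annotation set from an abbreviated document skeleton. The skeleton is an assumption and not a complete or validated FoLiA document, the set URL is a placeholder, and the helper name `declared_text_sets` is hypothetical.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"

# Abbreviated skeleton for illustration only; the set URL is a placeholder.
DOCUMENT = f"""
<FoLiA xmlns="{FOLIA_NS}" xml:id="example">
  <metadata>
    <annotations>
      <text-annotation set="https://example.org/my-text-classes.foliaset.ttl"/>
    </annotations>
  </metadata>
  <text xml:id="example.text"/>
</FoLiA>
""".strip()


def declared_text_sets(root):
    """Return the set attribute of every text-annotation declaration (None if unset)."""
    path = "/".join(
        f"{{{FOLIA_NS}}}{name}"
        for name in ("metadata", "annotations", "text-annotation")
    )
    return [declaration.get("set") for declaration in root.findall(path)]


root = ET.fromstring(DOCUMENT)
print(declared_text_sets(root))
```

A declaration without a `set` attribute would show up as `None` here, which is the case where the default set applies.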

As to your over-arching question "Please advise if this would be possible".
Currently things seem too broken for this to work, I think if both bugs were
solved it would be possible to add/edit text content with multiple classes, but
with the following constraints:

  • only in direct edit mode, no correction mode
  • editing sentences will only work when there is no underlying tokenisation (so it won't work on the ucto files),
    otherwise it introduces text consistency problems that FLAT can't handle. The reverse also holds, editing words would only work if there is not text on higher levels.
  • editing text content does not take into account any of the hyphenated breaks or any other markup that may be in it (it would get stripped away entirely). Markup elements can only be visualised in FLAT (to an extent, not edited).
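
The second constraint can be checked on a document before opening it for sentence-level editing. The sketch below is only a rule of thumb derived from the wording above, not a check that FLAT itself performs; the function names and the two toy sentences are hypothetical.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"


def tag(name):
    """Qualify a FoLiA element name with the FoLiA namespace."""
    return f"{{{FOLIA_NS}}}{name}"


def has_tokens(element):
    """True if any <w> (word) element occurs anywhere below this element."""
    return element.find(f".//{tag('w')}") is not None


def has_own_text(element):
    """True if the element carries at least one direct <t> child."""
    return element.find(tag("t")) is not None


def sentence_text_editable(sentence):
    """Assumed rule: sentence text is safe to edit only if no tokens lie beneath it."""
    return has_own_text(sentence) and not has_tokens(sentence)


# Toy sentences for illustration: one untokenised, one with <w> children.
UNTOKENISED = f'<s xmlns="{FOLIA_NS}"><t>This is a sentence.</t></s>'
TOKENISED = (
    f'<s xmlns="{FOLIA_NS}"><t>Bye!</t>'
    "<w><t>Bye</t></w><w><t>!</t></w></s>"
)

untokenised_ok = sentence_text_editable(ET.fromstring(UNTOKENISED))
tokenised_ok = sentence_text_editable(ET.fromstring(TOKENISED))
print(untokenised_ok, tokenised_ok)
```

A file that has been through ucto would fall into the second category, since it carries `<w>` elements below the sentences.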

So I kind of wonder if fixing these bugs will bring FLAT into a state that
makes it useful enough for your use case. If not, then it may not be worth the
effort anymore to try to fix them. What do you think?

I should add that the future of FLAT is very uncertain at this point, it's a
fairly old and sufficiently complex codebase, and there are only a few users.
FLAT is maintained and funded as part of the CLARIAH project (WP3), but that
entire project is coming to an end this year, which will most likely put FLAT
in End-of-Life/Deprecated status unless there's interest in a revival from
another project (but I myself am even a bit skeptical whether that's still worth it).

@proycon proycon added the bug label Sep 26, 2023
@pirolen
Author

pirolen commented Sep 26, 2023

Ah, I see, very sad to hear that FLAT may become deprecated, since it is such a great support for enriching FoLiA documents. Do/will people in your projects use another annotation environment?

Depending on your capacities, if some of the things are debuggable, we would be happy to use FLAT further.
We could also try if a developer here could contribute to the software. What do you think?

@proycon
Owner

proycon commented Sep 26, 2023

> Depending on your capacities, if some of the things are debuggable, we would be happy to use FLAT further.

I can definitely look into the two bugs you found if that's enough for your use-case, but I do wonder if the constraints I mentioned are not too limiting?

> We could also try if a developer here could contribute to the software. What do you think?

Contributions are of course always welcome, but the code-base isn't the most accessible I'm afraid, so it will be difficult.
Ideally, the front-end code needs a proper rewrite (which I already suggested in #135 in 2018; the code is almost 10 years old now), but that's a huge project and not going to happen anymore.

> Do/will people in your projects use another annotation environment?

My own preference has shifted to more lightweight solutions, whereas FLAT is a very comprehensive environment that tries to accommodate most of FoLiA (and FoLiA itself is quite comprehensive). This was by design and FLAT's greatest strength, but also its greatest weakness probably as things get complex quickly (as we notice in this issue) and FLAT is not an easily reusable component in other contexts, it's by definition married to FoLiA.

In the field (my view is limited though), I've seen simple solutions built on libraries like Recogito-JS, usually specific for a certain annotation task in a project. There's https://github.com/zenml-io/awesome-open-data-annotation which tries to keep a nice list of manual annotation tools (FLAT's in there too).

@pirolen
Author

pirolen commented Sep 26, 2023

Thank you very much. I am going to restrict the use case to FLAT's capabilities then, after debugging, and am going to ask a developer here to look into the frontend upgrade mentioned in the related issue.

@pirolen
Author

pirolen commented Sep 26, 2023

P.S. Are FoLiA and its tools going to be maintained after CLARIAH ends?

@proycon
Owner

proycon commented Sep 26, 2023

FoLiA is indeed funded from CLARIAH as well, so the same problem applies. I'm trying to at least ensure some limited funding for continued maintenance & support (excluding large feature developments) of FoLiA, Frog, ucto, to ensure basic continuity, but all that is unclear still. We're happy to also have @kosloot actively involved in his free retirement time, that of course also helps a lot! But continuity of research software needs proper attention and funding from projects or institutes in order to be really sustainable, and that's often difficult unfortunately.

Btw, I've also been working on other annotation solutions (STAM) where transition from FoLiA is explicitly included (but that too is in the scope of CLARIAH).

@pirolen
Author

pirolen commented Sep 26, 2023

OMG... I hope that inland funding continues for the awesome infrastructure of you guys, otherwise we could come up with an international solution? :-)

@proycon
Owner

proycon commented Feb 7, 2024

Some internal notekeeping on the debugging for this issue:

Bug 1 is caused by the FQL query being too broad:

USE flat/bbb2 PROCESSOR name "flat" type manual IN $FLAT_PROCESSOR IN $FOLIADOCSERVE_PROCESSOR EDIT t OF https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/text.foliaset.ttl WITH text "Hell0 Th15 iz a sentence, Bye1" datetime now confidence NONE textclass "ocroutput" FOR ID example.p.1 FORMAT flat RETURN target

The correct query (tested to work) needs a WHERE clause on textclass:

USE flat/bbb2 PROCESSOR name "flat" type manual IN $FLAT_PROCESSOR IN $FOLIADOCSERVE_PROCESSOR EDIT t OF https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/text.foliaset.ttl WHERE textclass = "ocroutput" WITH text "Hell0 Th15 iz a sentence, Bye1" datetime now confidence NONE textclass "ocroutput" FOR ID example.p.1 FORMAT flat RETURN target
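
The difference between the two queries is easier to see when the query is assembled from its parts. The following is a sketch only: it is not FLAT's actual code, the function and parameter names are hypothetical, and the `$FLAT_PROCESSOR` and `$FOLIADOCSERVE_PROCESSOR` placeholders are left literal as in the queries above. It does not attempt to escape quotes, since the escaping rules are not covered in this thread.

```python
TEXT_SET = (
    "https://raw.githubusercontent.com/proycon/folia/master/"
    "setdefinitions/text.foliaset.ttl"
)


def build_text_edit_query(doc, element_id, textclass, new_text, text_set=TEXT_SET):
    """Assemble a text edit query restricted to a single text class."""
    if '"' in new_text or '"' in textclass:
        # Quote escaping is out of scope for this sketch, so refuse such input.
        raise ValueError("double quotes are not handled in this sketch")
    return (
        f"USE {doc} "
        'PROCESSOR name "flat" type manual '
        "IN $FLAT_PROCESSOR IN $FOLIADOCSERVE_PROCESSOR "
        f"EDIT t OF {text_set} "
        # The WHERE clause is the fix: without it, every <t> on the element matches.
        f'WHERE textclass = "{textclass}" '
        f'WITH text "{new_text}" datetime now confidence NONE '
        f'textclass "{textclass}" '
        f"FOR ID {element_id} FORMAT flat RETURN target"
    )


query = build_text_edit_query(
    doc="flat/bbb2",
    element_id="example.p.1",
    textclass="ocroutput",
    new_text="Hell0 Th15 iz a sentence, Bye1",
)
print(query)
```

Note that the text class appears twice: once in the WHERE clause to select the existing element, and once in the WITH clause as the class of the new text.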

Working on a fix now...

proycon added a commit that referenced this issue Feb 7, 2024
This is bug 1 of #188
This edited all layers regardless of textclass, now it correctly applies
only to the selected one.
proycon added a commit that referenced this issue Feb 7, 2024
This fixes the second bug of this issue.
@proycon
Owner

proycon commented Feb 7, 2024

The two bugs should now be resolved in flat v0.11.4
