Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test for style inheritance #2

Open
wants to merge 23 commits into
base: ocrd-page-patch-inheritance
Choose a base branch
from

Conversation

kba
Copy link
Collaborator

@kba kba commented Dec 7, 2021

For some reason I could not push to the branch behind OCR-D#700.

I added a test to verify the functionality. I was a bit confused at first because I assumed inheritance was built into the getters as well but then I realized that it's only added to the build method, so only works after serialiazing and parsing again. One thing that bothers me slightly is that this means that the inheritance is made explicit after serializing the results again, i.e. you end up with a lot of duplicated primaryLanguage, TextStyle etc. - is that the intended effect?

@bertsky
Copy link
Owner

bertsky commented Dec 20, 2021

I was a bit confused at first because I assumed inheritance was built into the getters as well but then I realized that it's only added to the build method, so only works after serialiazing and parsing again. One thing that bothers me slightly is that this means that the inheritance is made explicit after serializing the results again, i.e. you end up with a lot of duplicated primaryLanguage, TextStyle etc. - is that the intended effect?

Oops, no – not at all. I was worried about that effect as well, but IIRC did not see it in my tests and thereby felt safe.

However, I don't even understand why you say this only works after serializing and reparsing. The build method gets called recursively after (initial) parsing. Sometimes the order of recursion does not fit our inheritance model, but we already compensate for that. So all the class members should be filled via inheritance after build. And that should be visible in the getters as well.

But indeed, now looking at the export routines, I fail to see how this is supposed to not make the expanded inheritance persistent...

@bertsky
Copy link
Owner

bertsky commented Dec 20, 2021

However, I don't even understand why you say this only works after serializing and reparsing. The build method gets called recursively after (initial) parsing. Sometimes the order of recursion does not fit our inheritance model, but we already compensate for that. So all the class members should be filled via inheritance after build. And that should be visible in the getters as well.

Oh, now I get it. What you meant is from the other side – what if I modify an object myself, how can I make the inheritance visible immediately? (So this actually applies to the setters – but since generateDS does not produce a strict setter API without direct access, and lazy programmers (=all programmers) therefore do not use setters, we cannot rely on that mechanism for patching.)

I have no simple answer for that. But I would argue that it's not a real issue.

But indeed, now looking at the export routines, I fail to see how this is supposed to not make the expanded inheritance persistent...

Do you have an answer to that perhaps?

@kba
Copy link
Collaborator Author

kba commented Dec 20, 2021

However, I don't even understand why you say this only works after serializing and reparsing. The build method gets called recursively after (initial) parsing.

True, but it does not get called when modifying the tree, e.g. adding elements to it, cf. https://github.com/bertsky/core/pull/2/files#diff-b4a75f642caae6a97596456f2ea64e086f91798216555f54e4b5e63b535a679aR358

@kba
Copy link
Collaborator Author

kba commented Dec 20, 2021

But indeed, now looking at the export routines, I fail to see how this is supposed to not make the expanded inheritance persistent...

Do you have an answer to that perhaps?

No, sorry, besides implementing the behavior completely dynamic, i.e. in the get_TextStyle() getters or overriding the primaryLanguage attributes with a @property wrapper (which is a lot of effort and while it saves the load-time performance hit of traversing ancestors, it adds it to runtime usage).

@bertsky
Copy link
Owner

bertsky commented Dec 20, 2021

No, sorry, besides implementing the behavior completely dynamic, i.e. in the get_TextStyle() getters or overriding the primaryLanguage attributes with a @property wrapper (which is a lot of effort and while it saves the load-time performance hit of traversing ancestors, it adds it to runtime usage).

Yes, indeed. In principle in our patch we could introduce new hidden attributes that allow differentiating implicit and explicit members, but I would rather not go there.

But we could attack the problem from the other side: (maximally) unexpanding inheritance during export. This could still deviate from the input (making explicit but redundant settings disappear). It would likely cost some performance as well.

Maybe we should make this behaviour optional with a kwarg, say parse(inFileName, silence=False, print_warnings=True, inheritance=None) with inheritance

  • None for the behaviour of the current master
  • 'expand' for the behaviour of this PR as it is (i.e. expansion during build)
  • 'expand-unexpand' for the additional maximal unexpansion during export

@kba
Copy link
Collaborator Author

kba commented Dec 21, 2021

But we could attack the problem from the other side: (maximally) unexpanding inheritance during export. This could still deviate from the input (making explicit but redundant settings disappear). It would likely cost some performance as well.

That's a great idea and it not only adresses the issue of accessing style inheritance but also improves the consistency of existing PAGE-XML. We obviously must document this well but I think the expand-unexpand behavior should even be the default with none and expand as options when the feature is not required by the processor at hand or the performance loss is substantial (newspaper pages or other many-regioned pages).

@bertsky
Copy link
Owner

bertsky commented Dec 21, 2021

Ok, if you think so. But let's hear some other perspectives in the Tech Call.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants