Better handling of Functions in evaluation #94
Comments
Note that, unlike punctuation, the decision of whether a word should be a Function is nontrivial. So if we simply ignored all Function units the way we ignore Punctuation units, each Function vs. non-Function difference between prediction and reference could result in mismatched spans at several levels. Hence it would be better, when computing precision, to ignore only those Function units in the reference that would prevent ancestor units from matching, and vice versa for recall. (The algorithm would need to be worked out. E.g., suppose we have [A [F x] [D [F y] [E [F z] [C c]] ] ]. A and D have identical spans modulo F's, as do E and C. So this could effectively mean there are more unary edges if the other analysis chooses to put x, y, and z elsewhere. Would that make the scorer too lenient under the current policy that any category match is sufficient to count the span as correct? Also, when trying to match large units with many F descendants, do we face a combinatorial search space in deciding whether to include each F? Hopefully this is not a problem in practice: units tend to be nested, so a failure of a unit to match implies that its parent units with more non-F descendants will not match either, assuming proper nesting on both sides.) Function units themselves would not count toward the score (i.e., they are excluded from the list of matches even if present in both analyses).
I take that back. On further reflection, if PRIMARY edges strictly form a tree (and no token can belong to multiple units), then it doesn't matter whether the tree is projective: the matching can be done bottom-up, once for precision and again for recall. I suppose a chart could be formed by reordering the tokens so that the graph being scored is projective. If a unit matches, that can be taken into account when checking whether its parent unit matches (so the work of determining that certain F's SHOULD be included doesn't have to be repeated).
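One way to read "bottom-up" here, as a rough sketch only: compute each unit's yield in a single post-order pass, so that a parent reuses the yields already built for its children. The nested-tuple tree encoding and the name `unit_yields` are assumptions made for illustration and are not taken from the scorer. Because yields are sets of token positions, a discontiguous (non-projective) unit needs no special treatment.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical encoding: a unit is (category, children), where each child is
# either another unit or an int token position. PRIMARY edges form a tree.
Unit = Tuple[str, FrozenSet[int]]


def unit_yields(root: tuple) -> List[Unit]:
    """Return (category, yield) for every unit, children before parents."""
    result: List[Unit] = []

    def visit(node: tuple) -> FrozenSet[int]:
        category, children = node
        positions = set()
        for child in children:
            if isinstance(child, int):
                positions.add(child)        # terminal: a token position
            else:
                positions |= visit(child)   # reuse the child's finished yield
        span = frozenset(positions)
        result.append((category, span))
        return span

    visit(root)
    return result


# A non-projective tree: unit D covers tokens 0 and 2, skipping token 1.
tree = ("H", [("D", [0, 2]), ("P", [1]), 3])
yields = unit_yields(tree)
```

Since children are emitted before their parents, a matcher walking this list in order always knows the outcome for a unit's children before it reaches the unit itself.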
Suppose we had SYS: [C [F the] [P party]]
Now consider: SYS: [C [E the] [P party]]
Functions are currently moved to the root if they are common to the prediction and the gold (see #91 (comment)), but a better approach would be "soft" matching of yields that allows excluding Functions asymmetrically: when calculating precision, we should allow omitting Functions from the gold; and when calculating recall, from the prediction.
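A minimal sketch of one possible reading of this asymmetric matching, not the scorer's actual implementation. It assumes each unit is reduced to a hypothetical `(category, yield)` pair, that Function units carry the category `"F"`, and that two yields "softly" match when they differ only in positions that the other analysis marks as Function. The names `function_positions` and `soft_matches` are invented for illustration.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# Hypothetical encoding: a unit is (category, token positions in its yield).
Unit = Tuple[str, FrozenSet[int]]


def function_positions(units: Iterable[Unit]) -> Set[int]:
    """Token positions covered by Function ("F") units in one analysis."""
    positions: Set[int] = set()
    for category, span in units:
        if category == "F":
            positions |= span
    return positions


def soft_matches(units: List[Unit], others: List[Unit]) -> Tuple[int, int]:
    """Return (matched, total) over the non-Function units in `units`.

    A unit matches if `others` has a unit of the same category whose yield
    differs from it only in positions that are Function in `others`.
    """
    ignorable = function_positions(others)
    scored = [(c, s) for c, s in units if c != "F"]  # F units are not scored
    matched = sum(
        any(c == oc and (s ^ ospan) <= ignorable for oc, ospan in others)
        for c, s in scored
    )
    return matched, len(scored)


# Tokens: 0 = "the", 1 = "party", 2 = "ended" (made-up example).
gold = [("H", frozenset({0, 1, 2})), ("A", frozenset({0, 1})),
        ("F", frozenset({0})), ("C", frozenset({1})), ("P", frozenset({2}))]
pred = [("H", frozenset({0, 1, 2})), ("F", frozenset({0})),
        ("A", frozenset({1})), ("P", frozenset({2}))]

precision = soft_matches(pred, gold)  # Functions omitted from the gold
recall = soft_matches(gold, pred)     # Functions omitted from the prediction
```

In this example the predicted A unit lacks "the", which the gold attaches inside A as a Function; strict yield matching would reject it, while the soft match accepts it for both precision and recall. The gold C unit still counts against recall because the prediction has no C unit at all.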