-LRB- and -RRB- in Form and Lemma #1
Comments
I am happy to create a pull request with this change, if you agree.
This has been discussed before, although I don't remember whether it was ever formally raised as an issue. Personally, I am all in favor of getting rid of the old -LRB- and -RRB- in the interest of uniformity across languages, but I remember that the English team were a little reluctant because most (?) English tokenizers actually output -LRB- and -RRB- rather than ( and ). Perhaps it is time to settle this once and for all.
I think this is the 3rd time this issue has been raised. If this should be once and for all, then I cannot miss the opportunity to say 👍 for getting rid of those.
The previous issue about this was opened at the docs repository, here: UniversalDependencies/docs#148 There seems to be consensus that the escaping should be dropped; it just has not happened yet. Perhaps the only question here is whether the UD_English team are prepared to receive the update via a pull request. If they have made other changes in the meantime that have not been pushed, it may be difficult to merge. @manning @ngiordani @tdozat could you please comment on this?
To move things forward, I created a pull request. I am happy to rebase it at any time, so if/when we are ready to merge it, I will update it.
Thanks for the pull request, @foxik, but we are actually not directly editing the files that we are pushing to the public GitHub repository, so this would get overwritten again with the next release. We'll discuss this in our meeting on Friday, but I think we should be able to make this change for the next release.
@sebschu Thanks for clarifying, I will close the pull request.
Agreed. We will change this to use ( and ). I think we can do this for the version 1.2 release.
I understand. This is a trivial change, so performing it yourselves would probably be easier even if you used this repository as a primary source. Thanks for doing this in the 1.2 release.
Notwithstanding that there are still validation errors to be addressed, this is fixed in our version 1.2 release candidate.
* Fixes #88
* Implemented using the following DepEdit script:
```
#no special adverbial status for WH adverb subordinations
lemma=/^(when|how|where|while|why|whenever|wherever)$/&func=/mark/&head=/(.*)/&pos=/ADV/ none #1:pos=SCONJ
lemma=/^(when|how|where|while|why|whenever|wherever)$/&func=/advmod/&head=/(.*)/ none #1:func=mark;#1:pos=SCONJ;#1:head2=$2:mark
#exception for WH adverbs in questions, identified by question mark and not being an advcl
func!=/advcl/;lemma=/^(when|how|where|while|why|whenever|wherever)$/&pos=/SCONJ/&func=/mark/&head=/(.*)/;text=/^(\?+)!?$/ #1>#2;#2.*#3 #2:func=advmod;#2:pos=ADV;#2:head2=$2:advmod
#exception for 'why not'
func=/root/;lemma=/why/&func=/mark/&head=/(.*)/;lemma=/not/\t#1>#2;#2.#3\t#2:func=advmod;#2:pos=ADV;#2:head2=$1:advmod
#exception for do support
func=/root/;lemma=/^(why|how|when|where)$/&func=/mark/&head=/(.*)/;lemma=/do/\t#1>#2;#2.#3\t#2:func=advmod;#2:pos=ADV;#2:head2=$2:advmod
```
-LRB- and -RRB- are currently used instead of "(" and ")" in Form, and -lrb- and -rrb- instead of "(" and ")" in Lemma. I think plain "(" and ")" should be used. In UD 1.1, English is the only language which does not use plain "(" and ")" in forms.
I understand that -LRB- and -RRB- (and in the past -LCB-, -RCB-, -LSB-, -RSB-) are used in Penn Treebank formats, because "(" and ")" have a special meaning in that format. But since "(" and ")" are not special in CoNLL-U, I believe using "(" and ")" instead of -LRB- and -RRB- would be better.
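For illustration, the proposed change could be sketched roughly as below. This is only a minimal sketch, not the script anyone in this thread actually used: the function name `unescape_brackets` and the `PTB_BRACKETS` mapping are hypothetical, and it assumes tab-separated CoNLL-U token lines with Form in the second column and Lemma in the third. All other columns are left untouched, since the proposal only concerns Form and Lemma.

```python
# Hypothetical helper: the names and the mapping are illustrative assumptions.
PTB_BRACKETS = {
    "-LRB-": "(", "-RRB-": ")",   # escapes seen in Form
    "-lrb-": "(", "-rrb-": ")",   # lowercased escapes seen in Lemma
}

def unescape_brackets(conllu_text: str) -> str:
    """Replace PTB bracket escapes in the Form and Lemma columns only."""
    out = []
    for line in conllu_text.split("\n"):
        # Comment lines and blank sentence separators pass through unchanged.
        if not line or line.startswith("#"):
            out.append(line)
            continue
        cols = line.split("\t")
        if len(cols) >= 3:
            cols[1] = PTB_BRACKETS.get(cols[1], cols[1])  # Form
            cols[2] = PTB_BRACKETS.get(cols[2], cols[2])  # Lemma
        out.append("\t".join(cols))
    return "\n".join(out)
```

Only whole-field matches are replaced, so a token that merely contains the string -LRB- somewhere inside it would be left alone.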