-LRB- and -RRB- in Form and Lemma #1

foxik · 2015-10-13T12:05:05Z

-LRB- and -RRB- are currently used instead of "(" and ")" in Form, and -lrb- and -rrb- instead of "(" and ")" in Lemma. I think plain "(" and ")" should be used. In UD 1.1, English is the only language that does not use plain "(" and ")" in forms.

I understand that -LRB- and -RRB- (and in the past -LCB-, -RCB-, -LSB-, -RSB-) are used in Penn Treebank formats because "(" and ")" have special meaning in that format. But since "(" and ")" are not special in CoNLL-U, I believe using "(" and ")" instead of -LRB- and -RRB- would be better.
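For illustration, the proposed change amounts to rewriting two columns of each token line. Below is a minimal Python sketch, assuming tab-separated, 10-column CoNLL-U token lines. The names `PTB_BRACKETS` and `unescape_conllu_line` are hypothetical, and this is not the script the UD_English team used.

```python
# Hypothetical helper for illustration only; not the UD_English conversion script.
# Maps Penn Treebank bracket escapes to the plain characters they stand for.
PTB_BRACKETS = {
    "-LRB-": "(", "-RRB-": ")",
    "-LSB-": "[", "-RSB-": "]",
    "-LCB-": "{", "-RCB-": "}",
}


def unescape_conllu_line(line: str) -> str:
    """Replace PTB bracket escapes in the FORM and LEMMA columns of one line."""
    if not line.strip() or line.startswith("#"):
        return line  # blank lines and comment lines pass through unchanged
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        return line  # not a regular token line; leave it alone
    for i in (1, 2):  # column 1 is FORM, column 2 is LEMMA (0-based)
        # upper() lets the lowercase lemma variants (-lrb-, -rrb-) match too
        cols[i] = PTB_BRACKETS.get(cols[i].upper(), cols[i])
    newline = "\n" if line.endswith("\n") else ""
    return "\t".join(cols) + newline
```

The sketch only touches Form and Lemma, the two columns this issue is about; any other column that happens to contain -LRB- (for example a PTB-style tag) is left as it is.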

foxik · 2015-10-13T12:05:34Z

I am happy to create a pull request with this change, if you agree.

jnivre · 2015-10-13T15:16:56Z

This has been discussed before, although I don't remember whether it was ever formally raised as an issue. Personally, I am all in favor of getting rid of the old -LRB- and -RRB- in the interest of uniformity across languages, but I remember that the English were a little reluctant because most (?) English tokenizers actually output -LRB- and -RRB- rather than ( and ). Perhaps it is time to settle this once and for all.

fginter · 2015-10-13T15:22:14Z

I think this is the third time this issue has been raised. If this is to be settled once and for all, then I cannot miss the opportunity to say 👍 for getting rid of those.

dan-zeman · 2015-10-14T06:50:06Z

The previous issue about this was opened at the docs repository, here: UniversalDependencies/docs#148

There seems to be consensus that the escaping should be dropped, it just has not happened yet. Perhaps the only question here is whether the UD_English team are prepared to receive the update via a pull request. If they have done other changes in the meantime, which have not been pushed, it may be difficult to merge. @manning @ngiordani @tdozat could you please comment on this.

foxik · 2015-10-14T07:02:03Z

To move things forward, I created a pull request. I am happy to rebase it at any time, so if/when we are ready to merge it, I will update it.

sebschu · 2015-10-14T17:59:57Z

Thanks for the pull request, @foxik, but we are actually not directly editing the files that we are pushing to the public GitHub repository, so this would get overwritten again with the next release. We'll discuss this in our meeting on Friday but I think we should be able to make this change for the next release.

foxik · 2015-10-16T14:54:02Z

@sebschu Thanks for clarifying, I will close the pull request.

manning · 2015-10-16T19:48:46Z

Agreed. We will change this to use ( and ). I think we can do this for the version 1.2 release.
Thanks for offering pull requests, @foxik, but since we annotate article-specific files and then concatenate them to produce the released UD files, I think in practice it would be easier for us to just do this ourselves....

foxik · 2015-10-19T09:58:39Z

I understand. This is a trivial change, so doing it yourselves would probably be easier even if you used this repository as the primary source.

Thanks for doing this in the 1.2 release.

manning · 2015-10-28T03:44:18Z

Notwithstanding that there are still validation errors to be addressed, this is fixed in our version 1.2 release candidate.

* Fixes #88
* Implemented using the following DepEdit script:

```
#no special adverbial status for WH adverb subordinations
lemma=/^(when|how|where|while|why|whenever|wherever)$/&func=/mark/&head=/(.*)/&pos=/ADV/ none #1:pos=SCONJ
lemma=/^(when|how|where|while|why|whenever|wherever)$/&func=/advmod/&head=/(.*)/ none #1:func=mark;#1:pos=SCONJ;#1:head2=$2:mark
#exception for WH adverbs in questions, identified by question mark and not being an advcl
func!=/advcl/;lemma=/^(when|how|where|while|why|whenever|wherever)$/&pos=/SCONJ/&func=/mark/&head=/(.*)/;text=/^(\?+)!?$/ #1>#2;#2.*#3 #2:func=advmod;#2:pos=ADV;#2:head2=$2:advmod
#exception for 'why not'
func=/root/;lemma=/why/&func=/mark/&head=/(.*)/;lemma=/not/\t#1>#2;#2.#3\t#2:func=advmod;#2:pos=ADV;#2:head2=$1:advmod
#exception for do support
func=/root/;lemma=/^(why|how|when|where)$/&func=/mark/&head=/(.*)/;lemma=/do/\t#1>#2;#2.#3\t#2:func=advmod;#2:pos=ADV;#2:head2=$2:advmod
```

foxik mentioned this issue Oct 14, 2015

Remove Penn Treebank escaping of brackets... #2

Closed

foxik mentioned this issue Oct 16, 2015

Add SpaceAfter=No feature. #4

Closed

manning added the enhancement label Oct 16, 2015

manning added this to the Release 1.2 internal data freeze milestone Oct 16, 2015

manning closed this as completed Oct 28, 2015

adarshp mentioned this issue Mar 29, 2018

Parentheses escaped as -LRB-, -RRB- clulab/eidos#235

Closed

danyaljj mentioned this issue Sep 29, 2021

Task 933 Style Transfer (Simplify Sentences) allenai/natural-instructions#348

Merged

hoangthangta mentioned this issue Feb 26, 2023

What are -lrb- and -rrb- in texts used for? KaijuML/parent#2

Closed
