-LRB- and -RRB- in Form and Lemma #1

Closed
foxik opened this issue Oct 13, 2015 · 10 comments

@foxik
Member

foxik commented Oct 13, 2015

Currently, -LRB- and -RRB- are used instead of "(" and ")" in Form, and -lrb- and -rrb- instead of "(" and ")" in Lemma. I think plain "(" and ")" should be used. In UD 1.1, English is the only language which does not use plain "(" and ")" in forms.

I understand that -LRB- and -RRB- (and, in the past, -LCB-, -RCB-, -LSB-, -RSB-) are used in Penn Treebank formats, because "(" and ")" have special meaning in that format. But since "(" and ")" are not special in CoNLL-U, I believe using "(" and ")" instead of -LRB- and -RRB- would be better.
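
The proposed change amounts to a substitution in two columns of each token line. A minimal sketch in Python, assuming the standard 10-column tab-separated CoNLL-U layout and that the escape always fills the whole field; `PTB_ESCAPES` and `unescape_brackets` are hypothetical names chosen for illustration, not part of any UD tool:

```python
# Hypothetical mapping from Penn Treebank bracket escapes to plain brackets.
PTB_ESCAPES = {
    "-LRB-": "(", "-RRB-": ")",   # as found in FORM
    "-lrb-": "(", "-rrb-": ")",   # lowercased variants found in LEMMA
}


def unescape_brackets(conllu_text: str) -> str:
    """Replace bracket escapes in FORM (column 2) and LEMMA (column 3).

    Assumption: a field is rewritten only when it is exactly an escape
    token; all other columns, including XPOS, are left untouched.
    """
    out = []
    for line in conllu_text.split("\n"):
        # Comment lines and blank sentence separators pass through unchanged.
        if line and not line.startswith("#"):
            cols = line.split("\t")
            if len(cols) == 10:
                cols[1] = PTB_ESCAPES.get(cols[1], cols[1])  # FORM
                cols[2] = PTB_ESCAPES.get(cols[2], cols[2])  # LEMMA
                line = "\t".join(cols)
        out.append(line)
    return "\n".join(out)
```

Restricting the rewrite to FORM and LEMMA keeps a Penn Treebank tag such as -LRB- in the XPOS column intact, since that is a tag name rather than an escaped token.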

@foxik
Member Author

foxik commented Oct 13, 2015

I am happy to create a pull request with this change, if you agree.

@jnivre

jnivre commented Oct 13, 2015

This has been discussed before, although I don't remember whether it was ever formally raised as an issue. Personally, I am all in favor of getting rid of the old -LRB- and -RRB- in the interest of uniformity across languages, but I remember that the English were a little reluctant because most (?) English tokenizers actually output -LRB- and -RRB- rather than ( and ). Perhaps it is time to settle this once and for all.

@fginter
Member

fginter commented Oct 13, 2015

I think this is the 3rd time this issue is raised. If this should be once and for all, then I cannot miss the opportunity to say 👍 for getting rid of those.

@dan-zeman
Member

The previous issue about this was opened at the docs repository, here: UniversalDependencies/docs#148

There seems to be consensus that the escaping should be dropped; it just has not happened yet. Perhaps the only question here is whether the UD_English team are prepared to receive the update via a pull request. If they have made other changes in the meantime that have not been pushed, it may be difficult to merge. @manning @ngiordani @tdozat, could you please comment on this?

@foxik
Member Author

foxik commented Oct 14, 2015

To move things forward, I created a pull request. I am happy to rebase it at any time, so if/when we are ready to merge it, I will update it.

@sebschu
Member

sebschu commented Oct 14, 2015

Thanks for the pull request, @foxik, but we are actually not directly editing the files that we are pushing to the public GitHub repository, so this would get overwritten again with the next release. We'll discuss this in our meeting on Friday but I think we should be able to make this change for the next release.

@foxik
Member Author

foxik commented Oct 16, 2015

@sebschu Thanks for clarifying, I will close the pull request.

@manning
Contributor

manning commented Oct 16, 2015

Agreed. We will change this to use ( and ). I think we can do this for the version 1.2 release.
Thanks for offering pull requests, @foxik, but in practice, since we annotate article-specific files and then concatenate them to produce the released UD files, I think it would be easier for us to just do this ourselves.

@foxik
Member Author

foxik commented Oct 19, 2015

I understand. This is a trivial change, so doing it yourselves would probably be easier even if you used this repository as the primary source.

Thanks for doing this in the 1.2 release.

@manning
Contributor

manning commented Oct 28, 2015

Notwithstanding that there are still validation errors to be addressed, this is fixed in our version 1.2 release candidate.

@manning manning closed this as completed Oct 28, 2015
amir-zeldes added a commit that referenced this issue Jun 9, 2020
  * Fixes #88
  * Implemented using the following DepEdit script:

```
#no special adverbial status for WH adverb subordinations
lemma=/^(when|how|where|while|why|whenever|wherever)$/&func=/mark/&head=/(.*)/&pos=/ADV/	none	#1:pos=SCONJ
lemma=/^(when|how|where|while|why|whenever|wherever)$/&func=/advmod/&head=/(.*)/	none	#1:func=mark;#1:pos=SCONJ;#1:head2=$2:mark

#exception for WH adverbs in questions, identified by question mark and not being an advcl
func!=/advcl/;lemma=/^(when|how|where|while|why|whenever|wherever)$/&pos=/SCONJ/&func=/mark/&head=/(.*)/;text=/^(\?+)!?$/	#1>#2;#2.*#3	#2:func=advmod;#2:pos=ADV;#2:head2=$2:advmod

#exception for 'why not'
func=/root/;lemma=/why/&func=/mark/&head=/(.*)/;lemma=/not/	#1>#2;#2.#3	#2:func=advmod;#2:pos=ADV;#2:head2=$1:advmod

#exception for do support
func=/root/;lemma=/^(why|how|when|where)$/&func=/mark/&head=/(.*)/;lemma=/do/	#1>#2;#2.#3	#2:func=advmod;#2:pos=ADV;#2:head2=$2:advmod
```