Support "SpaceAfter=No" for Untokenized Text #6

KoichiYasuoka · 2020-02-21T08:12:20Z

Support "SpaceAfter=No" for [Untokenized Text](https://universaldependencies.org/format.html#untokenized-text)

BramVanroy · 2020-02-21T08:32:08Z

We should probably check whether the .whitespace_ attribute is available with stanfordnlp, because otherwise this might break the module.
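A defensive check along these lines could look roughly like the sketch below. The helper name `misc_field` and the `SimpleNamespace` stand-ins are hypothetical and not part of the PR; the fallback (treat a token without `whitespace_` as followed by a space) is an assumption.

```python
from types import SimpleNamespace


def misc_field(token):
    """Return the CoNLL-U MISC value for a token-like object.

    Assumption: a token without a `whitespace_` attribute is treated as
    being followed by a space, so no SpaceAfter=No is emitted for it.
    """
    # getattr with a default avoids an AttributeError when the attribute is missing
    trailing = getattr(token, "whitespace_", " ")
    return "_" if trailing else "SpaceAfter=No"


# Stand-in objects (hypothetical, not real spaCy tokens)
with_space = SimpleNamespace(whitespace_=" ")
no_space = SimpleNamespace(whitespace_="")
missing = SimpleNamespace()

print(misc_field(with_space), misc_field(no_space), misc_field(missing))
```

Falling back to "_" keeps the output valid CoNLL-U even when the attribute is absent, at the cost of losing the spacing information for that token.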

KoichiYasuoka · 2020-02-21T14:48:14Z

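import stanfordnlp
from spacy_stanfordnlp import StanfordNLPLanguage

# Wrap a StanfordNLP English pipeline as a spaCy language object
en = StanfordNLPLanguage(stanfordnlp.Pipeline(lang="en"))
s = en("Yes, it's on-going.")
for t in s:
  # CoNLL-U columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
  head = 0 if t.head == t else t.head.i + 1
  misc = "_" if t.whitespace_ else "SpaceAfter=No"
  print("\t".join([str(t.i + 1), t.orth_, t.lemma_, t.pos_, t.tag_, "_", str(head), t.dep_, "_", misc]))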

The script above worked well on my Linux (Debian) machine and produced the CoNLL-U output below.

1	Yes	yes	INTJ	UH	_	7	discourse	_	SpaceAfter=No
2	,	,	PUNCT	,	_	7	punct	_	_
3	it	it	PRON	PRP	_	7	nsubj	_	SpaceAfter=No
4	's	be	AUX	VBZ	_	7	cop	_	_
5	on	on	ADV	RB	_	7	advmod	_	SpaceAfter=No
6	-	-	PUNCT	HYPH	_	7	punct	_	SpaceAfter=No
7	going	go	VERB	VBG	_	0	root	_	SpaceAfter=No
8	.	.	PUNCT	.	_	7	punct	_	SpaceAfter=No
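As a sanity check, the MISC column should be enough to rebuild the original untokenized text. The sketch below is only an illustration: the function name `detokenize` is hypothetical, and it assumes plain ten-column token lines with no multiword-token ranges or empty nodes.

```python
def detokenize(conllu_lines):
    """Rebuild surface text from CoNLL-U token lines using SpaceAfter=No."""
    pieces = []
    for line in conllu_lines:
        line = line.rstrip("\n")
        # Skip blank lines and sentence-level comments
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        form, misc = cols[1], cols[9]
        pieces.append(form)
        # MISC may hold several |-separated entries; add a space unless told not to
        if "SpaceAfter=No" not in misc.split("|"):
            pieces.append(" ")
    return "".join(pieces).rstrip()


sample = [
    "1\tYes\tyes\tINTJ\tUH\t_\t7\tdiscourse\t_\tSpaceAfter=No",
    "2\t,\t,\tPUNCT\t,\t_\t7\tpunct\t_\t_",
    "3\tit\tit\tPRON\tPRP\t_\t7\tnsubj\t_\tSpaceAfter=No",
    "4\t's\tbe\tAUX\tVBZ\t_\t7\tcop\t_\t_",
    "5\ton\ton\tADV\tRB\t_\t7\tadvmod\t_\tSpaceAfter=No",
    "6\t-\t-\tPUNCT\tHYPH\t_\t7\tpunct\t_\tSpaceAfter=No",
    "7\tgoing\tgo\tVERB\tVBG\t_\t0\troot\t_\tSpaceAfter=No",
    "8\t.\t.\tPUNCT\t.\t_\t7\tpunct\t_\tSpaceAfter=No",
]

print(detokenize(sample))  # Yes, it's on-going.
```

With the rows above, this round-trips back to the input sentence passed to the pipeline.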

BramVanroy · 2020-04-28T08:16:49Z

LGTM, thanks!

Update __init__.py

1c24127

BramVanroy changed the title ~~Update __init__.py~~ Support "SpaceAfter=No" for Untokenized Text Feb 21, 2020

BramVanroy merged commit c954aa0 into BramVanroy:master Apr 28, 2020
