The pySBD spaCy pipeline component uses a token-based approach: it sets is_sent_start to True or False on each token, depending on the Spans obtained from pySBD character offsets. Each Span is created with the doc.char_span method from the offsets of a slice doc.text[start:end]; that slice is a sentence span, and its first Token needs is_sent_start set to True. However, if the character indices don't map to a valid span, doc.char_span returns None. Hence the sentence output of pySBD alone and of pySBD + spaCy can differ.
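To make the failure mode concrete, here is a minimal pure-Python sketch of strict offset-to-token alignment. It is only a stand-in for the behaviour described above, not spaCy's implementation: the whitespace tokenizer, the helper names `token_offsets` and `strict_char_span`, and the example offsets are all assumptions made for illustration.

```python
import re

def token_offsets(text):
    """Return (start, end) character offsets of whitespace-delimited tokens.
    Assumption: a toy tokenizer, far simpler than spaCy's."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]

def strict_char_span(text, start, end):
    """Return the token index range (first, last_exclusive) covering
    text[start:end], or None when either offset is not on a token boundary."""
    tokens = token_offsets(text)
    starts = {s: i for i, (s, _) in enumerate(tokens)}
    ends = {e: i for i, (_, e) in enumerate(tokens)}
    if start not in starts or end not in ends:
        return None
    return starts[start], ends[end] + 1

text = "Hello world. Bye now."
aligned = strict_char_span(text, 0, 12)     # slice is "Hello world."
misaligned = strict_char_span(text, 0, 13)  # slice is "Hello world. " (trailing space)
print(aligned, misaligned)
```

In this sketch the second call fails because offset 13 is not the end of any token, so no sentence start can be recorded for that span.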
The inability to get a Span object from pySBD character offsets can be tackled by deconstructing the Doc object, the way the PKSHATechnology-Research/camphr authors have done in get_doc_char_span, which uses destruct_token.
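One way to tolerate misaligned offsets is sketched below. Note that this is not camphr's get_doc_char_span or destruct_token, whose internals are not shown here; it swaps in a simpler technique that expands each character range to every token it overlaps. It is similar in spirit to the "expand" alignment mode that recent spaCy versions offer on doc.char_span. The whitespace tokenizer and the names `token_offsets`, `expand_char_span`, and `sentence_starts` are hypothetical.

```python
import re

def token_offsets(text):
    """Return (start, end) character offsets of whitespace-delimited tokens.
    Assumption: a toy tokenizer standing in for a real one."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]

def expand_char_span(text, start, end):
    """Return the token index range (first, last_exclusive) of all tokens
    that overlap text[start:end], or None if no token overlaps."""
    overlapping = [
        i for i, (s, e) in enumerate(token_offsets(text))
        if s < end and e > start
    ]
    if not overlapping:
        return None
    return overlapping[0], overlapping[-1] + 1

def sentence_starts(text, char_spans):
    """Return one boolean per token: True where a sentence span begins."""
    flags = [False] * len(token_offsets(text))
    for start, end in char_spans:
        span = expand_char_span(text, start, end)
        if span is not None:
            flags[span[0]] = True
    return flags

text = "Hello world. Bye now."
# Hypothetical sentence offsets; the first one includes a trailing space.
flags = sentence_starts(text, [(0, 13), (13, 21)])
print(flags)
```

The trade-off of expanding is that an offset falling in the middle of a token pulls the whole token into the span, so the flagged sentence start can sit slightly before the true boundary.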