Release for spaCy v3
This release makes `spacy_conll` compatible with spaCy's new v3 release. On top of that, some improvements were made to make the project easier to maintain.
- [general] Breaking change: spaCy v3 required (closes #8)
- [init_parser] Breaking change: in all cases, `is_tokenized` now disables sentence segmentation
- [init_parser] Breaking change: no more default values for parser or model anywhere. Note that spaCy no longer works with short-hand codes such as `en`; you have to provide the full model name, e.g. `en_core_web_sm`
- [init_parser] Improvement: models are automatically downloaded for Stanza and UDPipe
- [cli] Reworked the position of the CLI script in the directory structure, as well as its arguments. Run `parse-as-conll -h` for more information.
- [conllparser] Made the `ConllParser` class available as a utility to easily create a wrapper for a spaCy-like parser, which can return the parsed CoNLL output of a given file or text
- [conllparser,cli] Improvements to the usability of `n_process`: the tool will try to figure out whether multiprocessing is available for your platform and, if not, tell you so. Such a priori error messages can be disabled with `ignore_pipe_errors`, both on the command line and in `ConllParser`'s parse methods
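Because short-hand codes such as `en` are no longer accepted for spaCy models, scripts written against older versions may need updating. Below is a minimal, stdlib-only sketch of a pre-flight check you could run in your own code before initialising a parser. The helper `check_model_name`, the backend string `"spacy"`, and the "full names contain an underscore" heuristic are all assumptions for illustration; none of this is part of `spacy_conll` or spaCy.

```python
def check_model_name(model: str, parser: str) -> str:
    """Hypothetical pre-flight check (not part of spacy_conll).

    Rejects bare language codes such as "en" when the spaCy backend is
    used, because spaCy v3 no longer resolves short-hand model names.
    """
    if parser != "spacy":
        # Assumption: other backends (e.g. Stanza, UDPipe) are not checked here.
        return model
    # Heuristic assumption: full spaCy package names such as "en_core_web_sm"
    # contain underscores, while bare codes such as "en" do not.
    if "_" not in model:
        raise ValueError(
            f"{model!r} looks like a short-hand code; spaCy v3 needs a full "
            "model name (e.g. 'en_core_web_sm')."
        )
    return model


if __name__ == "__main__":
    print(check_model_name("en_core_web_sm", "spacy"))
    try:
        check_model_name("en", "spacy")
    except ValueError as err:
        print(err)
```

The heuristic is deliberately loose: it only catches the most common migration mistake and does not verify that the named package is actually installed.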
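The `n_process` change describes an a priori check: decide up front whether multiprocessing can be used, and let the user silence that check. The sketch below shows one way such a check could look using only the standard library. It is not `spacy_conll`'s implementation. The functions `multiprocessing_supported` and `resolve_n_process` are hypothetical, the parameter name `ignore_pipe_errors` merely mirrors the option named above, and treating "fork is available" as "multiprocessing is supported" is an assumption made for illustration.

```python
import multiprocessing
import os
from typing import Optional


def multiprocessing_supported() -> bool:
    # Assumption for this sketch: only platforms that offer the "fork"
    # start method count as supported (Windows only offers "spawn").
    return "fork" in multiprocessing.get_all_start_methods()


def resolve_n_process(
    n_process: int,
    ignore_pipe_errors: bool = False,
    supported: Optional[bool] = None,
) -> int:
    """Hypothetical a priori check before passing n_process to nlp.pipe()."""
    if n_process == -1:
        # Follows spaCy's Language.pipe convention: -1 means "use all CPUs".
        n_process = os.cpu_count() or 1
    if n_process < 1:
        raise ValueError("n_process must be -1 or a positive integer")
    if n_process == 1:
        return 1  # a single process needs no platform check
    if supported is None:
        supported = multiprocessing_supported()
    if not supported and not ignore_pipe_errors:
        raise RuntimeError(
            "Multiprocessing does not seem to be available on this platform; "
            "use n_process=1, or set ignore_pipe_errors=True to try anyway."
        )
    return n_process


if __name__ == "__main__":
    print(resolve_n_process(1))
    print(resolve_n_process(2, ignore_pipe_errors=True))
```

The optional `supported` argument exists only so the check can be exercised deterministically; in normal use it would be left as `None` so the platform is inspected at call time.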