better Cyrillic text tokenization

@uhermjakob uhermjakob released this 30 Nov 08:11

New version 1.3.7:

  • Better handling of Cyrillic text, especially hyphenated tokens.
  • Better handling of some em- and en-dashes, and of the replacement character (U+FFFD) at the beginning or end of a token.