Wittawat Jitkrittum edited this page Mar 7, 2015 · 1 revision

Version 0.1

Initial release. Featuring:

  • Basic command-line interface. Outputs TCCs in the form "tcc1|tcc2|...|lasttcc|".
  • Support for ignore tags ..... Text wrapped in these tags is not tokenized by JTCC.
  • Consecutive spaces are grouped into one TCC.
  • Consecutive Arabic digits are grouped into one TCC (likewise for Thai digits).
  • Consecutive English letters are grouped into one TCC.
  • Newline characters do not form a TCC of their own; instead, consecutive newlines are grouped with the next TCC.
  • A delimiter (|) is placed at the end of each TCC, including the last one.
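The run-grouping rules and the output format listed above can be illustrated with a short sketch. It is not JTCC's own code: the function names `split_clusters` and `format_clusters` are hypothetical. It assumes that "spaces" means the ASCII space only, that "English letters" means A-Z and a-z, that Thai digits are U+0E50 to U+0E59, and that only "\n" counts as a newline. Every other character, including Thai letters, is emitted as a one-character cluster. This is a placeholder for the real Thai character cluster rules, which the sketch does not implement. Trailing newlines with no following cluster are attached to the previous cluster, which is also an assumption.

```python
import re

# Run rules from the release notes (ASSUMED character classes, see lead-in).
_RUNS = re.compile(
    r" +"                 # consecutive spaces -> one cluster
    r"|[0-9]+"            # consecutive Arabic digits -> one cluster
    r"|[\u0E50-\u0E59]+"  # consecutive Thai digits -> one cluster
    r"|[A-Za-z]+"         # consecutive English letters -> one cluster
)


def split_clusters(text):
    """Hypothetical helper: split text using only the run-grouping rules.

    Characters not covered by a run rule become one-character clusters,
    a PLACEHOLDER for JTCC's actual Thai character cluster rules.
    """
    clusters = []
    pending_newlines = ""
    i = 0
    while i < len(text):
        if text[i] == "\n":
            # Newlines never form a cluster on their own; carry them
            # forward so they are prepended to the next cluster.
            pending_newlines += "\n"
            i += 1
            continue
        m = _RUNS.match(text, i)
        end = m.end() if m else i + 1
        clusters.append(pending_newlines + text[i:end])
        pending_newlines = ""
        i = end
    if pending_newlines:
        # ASSUMPTION: trailing newlines join the last cluster (or stand
        # alone if the input contained nothing else).
        if clusters:
            clusters[-1] += pending_newlines
        else:
            clusters.append(pending_newlines)
    return clusters


def format_clusters(clusters):
    """Hypothetical helper: place a "|" delimiter after every cluster."""
    return "".join(c + "|" for c in clusters)
```

For example, `format_clusters(split_clusters("abc  123\n\nxyz"))` yields `"abc|  |123|\n\nxyz|"`. The two newlines are grouped with the following `xyz` cluster, and a delimiter follows the last cluster, matching the output form described in the first item.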