Changes
Wittawat Jitkrittum edited this page Mar 7, 2015 · 1 revision
Initial release. Featuring:
- Supports a basic command-line interface. Outputs TCCs in the form "tcc1|tcc2|...|lasttcc|".
- Supports ignore tags ..... Text wrapped in the tag is not tokenized by JTCC.
- Consecutive spaces are grouped into one TCC.
- Consecutive Arabic digits are grouped into one TCC (the same applies to Thai digits).
- Consecutive English letters are grouped into one TCC.
- Newline characters do not form a TCC of their own. Instead, consecutive newlines are grouped with the next TCC.
- A delimiter is placed at the end of each TCC.
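
The run-grouping rules in the list above can be illustrated with a small sketch. This is not JTCC's code: the function name `group_runs` and the regular expression are hypothetical, actual Thai character clustering is not implemented (any other character simply becomes its own cluster), Arabic and Thai digit runs are assumed to form separate clusters, ignore tags are left out, and trailing newlines with no following cluster are assumed to form a final cluster of their own.

```python
import re

# Hypothetical simplification of the grouping rules listed above.
# Real Thai character clustering is NOT implemented here.
_CLUSTER = re.compile(
    r"\n*"                  # leading newlines attach to the next cluster
    r"(?:"
    r" +"                   # consecutive spaces
    r"|[0-9]+"              # consecutive Arabic digits
    r"|[\u0E50-\u0E59]+"    # consecutive Thai digits (assumed separate from Arabic)
    r"|[A-Za-z]+"           # consecutive English letters
    r"|[^\n]"               # fallback: any other single character
    r")"
    r"|\n+"                 # assumption: trailing newlines form their own cluster
)


def group_runs(text: str, delimiter: str = "|") -> str:
    """Split text into clusters and place a delimiter after each one."""
    clusters = _CLUSTER.findall(text)
    return "".join(cluster + delimiter for cluster in clusters)


if __name__ == "__main__":
    # repr() makes the spaces and newlines inside clusters visible.
    print(repr(group_runs("abc  123\n\nxyz")))
```

For the input `"abc  123\n\nxyz"` this sketch returns `"abc|  |123|\n\nxyz|"`: the two spaces form one cluster, and the two newlines are attached to the cluster `xyz` that follows them.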