A C++ package for character or word n-gram analysis. It uses a ternary search tree instead of a hash table for faster n-gram frequency counting. Words are converted to unique IDs and encoded as more compact base-256 integers. It is a partial implementation of Dr. Vlado Keselj's Text-Ngrams 1.6, a very flexible n-gram package in Perl.
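
To illustrate the idea of counting n-gram frequencies in a ternary search tree, here is a minimal sketch. It is not taken from this package's source: the class name `TernaryCounter` and its `add` and `count` methods are hypothetical, and it assumes a C++14 compiler (for `std::make_unique`). Each node holds one character and three links (smaller, equal, larger), and a counter is incremented at the node where a key ends.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical illustration of a ternary search tree used as a counter.
// Names and layout are assumptions, not this package's actual classes.
class TernaryCounter {
    struct Node {
        char split;                        // character stored at this node
        std::size_t count = 0;             // > 0 only where some key ends
        std::unique_ptr<Node> lo, eq, hi;  // smaller / next char / larger
        explicit Node(char c) : split(c) {}
    };
    std::unique_ptr<Node> root_;

public:
    // Increment the frequency of `key`. Empty keys are ignored.
    void add(const std::string& key) {
        if (key.empty()) return;
        std::unique_ptr<Node>* link = &root_;
        std::size_t i = 0;
        while (true) {
            if (!*link) *link = std::make_unique<Node>(key[i]);
            Node* n = link->get();
            if (key[i] < n->split) {
                link = &n->lo;
            } else if (key[i] > n->split) {
                link = &n->hi;
            } else if (i + 1 == key.size()) {
                ++n->count;                // last character matched: count it
                return;
            } else {
                link = &n->eq;             // matched: move to next character
                ++i;
            }
        }
    }

    // Return how many times `key` was added (0 if never seen).
    std::size_t count(const std::string& key) const {
        if (key.empty()) return 0;
        const Node* n = root_.get();
        std::size_t i = 0;
        while (n) {
            if (key[i] < n->split) {
                n = n->lo.get();
            } else if (key[i] > n->split) {
                n = n->hi.get();
            } else if (i + 1 == key.size()) {
                return n->count;
            } else {
                n = n->eq.get();
                ++i;
            }
        }
        return 0;
    }
};
```

With this sketch, calling `add("the")` twice and then `count("the")` gives 2, while `count("th")` gives 0 because no key ended at that node. Keys that share a prefix share nodes, which is one reason this structure suits n-gram counting.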

jerry2yu/ngrams

More information about Text-Ngrams is available at http://users.cs.dal.ca/~vlado/srcperl/Ngrams/Ngrams.html

How to use it:

  1. Download and save the source code.

  2. $ make

  3. $ ngrams --type=word --n=3 --in= sample.txt

    or

    $ ngrams --type=character -n=3 --in= sample.txt

    or

    for byte n-grams, e.g., to get n-grams from a binary file:

    $ ngrams --type=byte -n=3 --in= sample.txt

That's it.
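
For readers new to n-grams, the sketch below shows what "character" and "word" n-grams with n=3 mean in general. It only illustrates the concept: the helper names `char_ngrams` and `word_ngrams` are hypothetical, words are assumed to be separated by whitespace, and the tokenization and output format of the `ngrams` program itself may differ.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: every contiguous run of n characters (bytes) in text.
std::vector<std::string> char_ngrams(const std::string& text, std::size_t n) {
    std::vector<std::string> out;
    if (n == 0 || text.size() < n) return out;
    for (std::size_t i = 0; i + n <= text.size(); ++i)
        out.push_back(text.substr(i, n));
    return out;
}

// Hypothetical helper: every run of n consecutive whitespace-separated
// words, joined by a single space.
std::vector<std::string> word_ngrams(const std::string& text, std::size_t n) {
    std::istringstream in(text);
    std::vector<std::string> words;
    for (std::string w; in >> w;) words.push_back(w);

    std::vector<std::string> out;
    if (n == 0 || words.size() < n) return out;
    for (std::size_t i = 0; i + n <= words.size(); ++i) {
        std::string gram = words[i];
        for (std::size_t j = 1; j < n; ++j) gram += ' ' + words[i + j];
        out.push_back(gram);
    }
    return out;
}
```

For example, `char_ngrams("abcd", 3)` yields `abc` and `bcd`, and `word_ngrams("to be or not", 3)` yields `to be or` and `be or not`. Counting how often each such string occurs gives the frequency table that n-gram analysis works with.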

If you find a bug or have a suggestion, please email me at [email protected]. Thanks.

Zheyuan Yu. Feb 18, 2006
