hawaiian-corpus - Data from a corpus of written Hawaiian

This repository contains data derived from a corpus of texts written in the Hawaiian language (ʻŌlelo Hawaiʻi). The data include frequency lists, stopwords, and lists of the most common n-grams. The text in the corpus was obtained from Ulukau, the Hawaiian Electronic Library.

The corpus contains a total of 10.7 million words and is restricted to modern (post-20th century), non-scriptural text. An overview of corpus statistics, including the most common words and n-grams, can be seen here.
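
For readers who want to build similar lists from their own text, here is a minimal sketch of how word frequencies and n-gram counts can be computed in Python. It is not the method used to produce the files in this repository. The tokenizer rule, the helper names `tokenize` and `ngram_counts`, and the sample sentence are all assumptions made for illustration. The sketch normalizes text to NFC so that vowels with kahakō are single characters, and treats the ʻokina (U+02BB) as part of a word.

```python
import re
import unicodedata
from collections import Counter

# Assumed token rule: runs of word characters, with the ʻokina (U+02BB)
# listed explicitly so it is always kept inside a word.
TOKEN_RE = re.compile(r"[\wʻ]+")


def tokenize(text):
    """Lowercase and split text into word tokens (hypothetical helper)."""
    # NFC composes a vowel plus combining macron into one character,
    # so a word with kahakō is not split at the combining mark.
    text = unicodedata.normalize("NFC", text).lower()
    return TOKEN_RE.findall(text)


def ngram_counts(tokens, n):
    """Count n-grams as tuples of n consecutive tokens (hypothetical helper)."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


# Made-up sample text, not taken from the corpus.
sample = "Aloha kāua. He ʻōlelo Hawaiʻi kēia. Aloha kāua!"

tokens = tokenize(sample)
word_freq = Counter(tokens)
bigrams = ngram_counts(tokens, 2)

print(word_freq.most_common(3))
print(bigrams.most_common(3))
```

Note that this simple version counts n-grams across sentence boundaries. Splitting the text into sentences first and counting within each one would avoid that.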

Data

Files included in this repository:

License

CC0.