sugali

This is a legacy repository for a language identification project covering many (many) languages, built for the software project course "NLP projects for low-resource languages".

The final technical report is available at http://www.coli.uni-saarland.de/courses/cl4lrl-swp/data/SugaliPoster.pdf

Description

Given a string of text in an arbitrary language, can we train a system to recognize what language the text is written in? The project uses four sources of data: the Universal Declaration of Human Rights, Wikipedia, ODIN, and some portions of the data available from Omniglot. The resulting system covers well over 1000 languages.
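
To make the task concrete, here is a minimal sketch of one common approach to language identification: character trigram profiles compared by cosine similarity. It is an illustration only and is not taken from the sugali code; the function names (`char_ngrams`, `cosine`, `train`, `identify`), the language codes, and the three-sentence toy training set are all assumptions made for this example.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in lowercased, space-padded text."""
    padded = " " * (n - 1) + text.lower() + " " * (n - 1)
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def train(samples, n=3):
    """Build one n-gram profile per language from (language, text) pairs."""
    profiles = {}
    for lang, text in samples:
        profiles.setdefault(lang, Counter()).update(char_ngrams(text, n))
    return profiles


def identify(text, profiles, n=3):
    """Return the language whose profile is most similar to the text."""
    query = char_ngrams(text, n)
    return max(profiles, key=lambda lang: cosine(query, profiles[lang]))


# Toy training data (hypothetical): one sentence per language.
samples = [
    ("eng", "All human beings are born free and equal in dignity and rights."),
    ("deu", "Alle Menschen sind frei und gleich an Würde und Rechten geboren."),
    ("fra", "Tous les êtres humains naissent libres et égaux en dignité et en droits."),
]
profiles = train(samples)
print(identify("human rights and dignity", profiles))
```

With only one sentence per language the profiles are very sparse, so this sketch is only reliable on text that resembles the training sentences. A usable system needs far more data per language.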

As a spin-off, we also produced the SeedLing corpus, with data from over 1000 languages. The corpus is freely available in the SeedLing GitHub repository. The reference paper for the corpus is at https://www.aclweb.org/anthology/W14-2211/

Credits

Susanne Fertmann
Guy Emerson
Liling Tan
Alexis Palmer
Michaela Regneri

Cite

If you need to refer to the poster or the code, feel free to cite:

@misc{sugali,
  author = {Susanne Fertmann and Guy Emerson and Liling Tan},
  title = {Language Identification for Low-Resource Languages},
  year = {2014}, 
  url = "https://github.com/alvations/sugali/",
  institution = {Saarland University, Germany},
  note = "Technical Report for NLP projects for low-resource languages. Saarland, Germany"
}
