
For when you want fastText language identification, but you also want to believe the answers


LuminosoInsight/lumi_language_id


lumi_language_id

Utilities for reliable-enough language detection.

This package wraps fastText's "lid.176" language-detection model with a second classifier that is trained to produce more reliable probability estimates. It also cleans the input text before detection, so that the result is not thrown off by punctuation, digits, or emoji.
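To illustrate what "text cleaning" of this kind can look like, here is a minimal sketch using only the standard library. It is an assumption for illustration, not the package's actual cleaning code: the function name `clean_text` is hypothetical, and the rule used here is "keep Unicode letters and combining marks, replace everything else with a space".

```python
import re
import unicodedata


def clean_text(text: str) -> str:
    """Hypothetical cleaning step (not lumi_language_id's own implementation).

    Keeps characters whose Unicode category is a letter (L*) or a combining
    mark (M*). Punctuation (P*), digits (N*), symbols such as most emoji (S*)
    and everything else are replaced by spaces; whitespace is then collapsed.
    """
    kept = []
    for ch in text:
        # Combining marks must be kept for scripts such as Devanagari or Thai
        if unicodedata.category(ch)[0] in ("L", "M"):
            kept.append(ch)
        else:
            kept.append(" ")
    # Collapse runs of whitespace left behind by removed characters
    return re.sub(r"\s+", " ", "".join(kept)).strip()
```

For example, `clean_text("aquí hay 3 palabras!!! 😀")` returns `"aquí hay palabras"`, so the digit, the exclamation marks, and the emoji never reach the language model.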

Example:

```python
>>> from lumi_language_id.tuned import TunedLanguageIdentifier
>>> lid = TunedLanguageIdentifier.load()
>>> lang, _prob = lid.detect_language("these are words")
>>> lang
'en'

>>> lang, _prob = lid.detect_language("aquí hay algunas palabras")
>>> lang
'es'
```
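Because `detect_language` returns a probability alongside the language code, a caller can decline to trust low-confidence answers. The sketch below assumes the second element of the returned pair is a probability between 0 and 1; the helper name `detect_or_unknown`, the 0.5 default threshold, and the `"und"` ("undetermined") fallback are illustrative choices, not part of lumi_language_id.

```python
from typing import Protocol, Tuple


class LanguageDetector(Protocol):
    # Same shape as the example above: detect_language(text) -> (code, probability)
    def detect_language(self, text: str) -> Tuple[str, float]: ...


def detect_or_unknown(lid: LanguageDetector, text: str, threshold: float = 0.5) -> str:
    """Hypothetical helper: return the language code, or 'und' below the threshold."""
    lang, prob = lid.detect_language(text)
    # Only accept the answer when the estimated probability is high enough
    return lang if prob >= threshold else "und"
```

With the identifier loaded as above, `detect_or_unknown(lid, "these are words")` would return `'en'` if the model's probability for English is at least 0.5. The right threshold depends on how costly a wrong label is in your application.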
