A language detection tool based on a pretrained fastText model.
Numbers, punctuation, and repeated whitespace are removed before the text is passed to the language detector.
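The cleaning step described above could look roughly like the following. This is only a sketch, not the library's actual code: the helper name `clean_text` and the exact character classes (ASCII punctuation from `string.punctuation`, digits matched by `\d`) are assumptions made for illustration.

```python
import re
import string

# Hypothetical patterns; the library's real preprocessing may differ.
_DIGITS = re.compile(r"\d+")
_PUNCT = re.compile("[" + re.escape(string.punctuation) + "]")
_SPACES = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Drop digits and ASCII punctuation, then collapse runs of whitespace."""
    text = _DIGITS.sub(" ", text)   # numbers become a single space
    text = _PUNCT.sub(" ", text)    # punctuation becomes a single space
    return _SPACES.sub(" ", text).strip()


print(clean_text("Where   is my mother?? 123"))
# Where is my mother
```

Replacing removed characters with a space rather than an empty string keeps words such as `a,b` from being glued together.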
from fastlang import FastLangDetect
detector = FastLangDetect()
detector.detect('Where is my mother?')
# {'en': 0.996435284614563}
detector.detect('Where is my mother?', k=3)
# {'en': 0.996435284614563, 'th': 0.0005820714286528528, 'bn': 0.0005180443404242396}
As the examples demonstrate, the k parameter specifies how many labels to return, each with its associated probability.
The output can also be controlled with the threshold parameter, which filters results by probability.
detector.detect('Where is my mother?', k=3, threshold=0.5)
# {'en': 0.996435284614563}
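To make the interaction between k and threshold concrete, here is a minimal sketch of the selection logic on a plain dictionary of probabilities. The function `filter_predictions` is hypothetical and not part of the library, and it assumes that a label is kept when its probability is greater than or equal to the threshold.

```python
def filter_predictions(probs, k=1, threshold=0.0):
    """Keep the k most probable labels, then drop those below threshold.

    Hypothetical helper: the >= comparison is an assumption.
    """
    top = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    return {label: p for label, p in top if p >= threshold}


scores = {
    'en': 0.996435284614563,
    'th': 0.0005820714286528528,
    'bn': 0.0005180443404242396,
}
print(filter_predictions(scores, k=3, threshold=0.5))
# {'en': 0.996435284614563}
```

With a high threshold the result may contain fewer than k labels, or none at all.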
Labels are ISO 639-1 codes. To look up the corresponding language name, use iso_codes:
from fastlang import iso_codes
iso_codes['en']
# 'English'
The language detector also accepts a list of strings.
from fastlang import FastLangDetect
detector = FastLangDetect()
detector.detect(['Where is my mother?', 'pies i kot na drodze.'])
# [{'en': 0.996435284614563}, {'sl': 0.6256219148635864}]
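When working with batch results, it is often convenient to reduce each dictionary to its single most probable label. The snippet below is a sketch that operates on the output shown above. The helper `top_label` is hypothetical, and treating an empty dictionary as "no prediction" is an assumption about what a strict threshold might produce.

```python
def top_label(prediction):
    """Return the most probable label, or None for an empty prediction."""
    if not prediction:
        return None
    return max(prediction, key=prediction.get)


# Batch output copied from the example above.
results = [{'en': 0.996435284614563}, {'sl': 0.6256219148635864}]
labels = [top_label(r) for r in results]
print(labels)
# ['en', 'sl']
```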
All 176 model labels can be retrieved with the get_labels() method.
detector.get_labels()
To include the associated frequencies, pass include_freq=True to the get_labels method.
pip install .
- A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of Tricks for Efficient Text Classification
- A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, T. Mikolov, FastText.zip: Compressing text classification models