fast-lang

Language detection tool based on fastText pretrained model.

Text preprocessing

Numbers, punctuation, and repeated whitespace are removed from the input text before it is passed to the language detector.
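The cleaning step described above can be approximated as follows. This is only a sketch: the helper name `preprocess` is hypothetical, and it assumes "numbers" means the digits 0-9 and "punctuation" means the ASCII set in `string.punctuation`. The package's actual implementation may differ.

```python
import re
import string

# Hypothetical patterns; the package may define "punctuation" differently.
_DIGITS_AND_PUNCT = re.compile(f"[{re.escape(string.punctuation)}0-9]")
_WHITESPACE_RUN = re.compile(r"\s+")


def preprocess(text: str) -> str:
    """Remove ASCII digits and punctuation, then collapse whitespace runs."""
    without_symbols = _DIGITS_AND_PUNCT.sub("", text)
    return _WHITESPACE_RUN.sub(" ", without_symbols).strip()


print(preprocess("Where  is my mother?  Call 555-1234!"))
# Where is my mother Call
```

Removing the symbols outright (rather than replacing them with spaces) joins tokens such as "don't" into "dont"; whether the package does the same is an assumption here.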

Examples

from fastlang import FastLangDetect

detector = FastLangDetect()

detector.detect('Where is my mother?') 
# {'en': 0.996435284614563}

detector.detect('Where is my mother?', k=3)
# {'en': 0.996435284614563, 'th': 0.0005820714286528528, 'bn': 0.0005180443404242396}

As the examples demonstrate, the k parameter sets how many labels are returned along with their probabilities. The threshold parameter filters the output further, dropping labels whose probability falls below the given value.

detector.detect('Where is my mother?', k=3, threshold=0.5)
# {'en': 0.996435284614563}
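To make the interaction between k and threshold concrete, here is a minimal sketch of the selection logic applied to the probabilities shown above. The function `select_labels` is a hypothetical stand-in, not part of fastlang, and treating the threshold as inclusive is an assumption.

```python
def select_labels(probabilities, k=1, threshold=0.0):
    """Keep the k most probable labels, then drop those below threshold."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    # Assumption: a label exactly at the threshold is kept.
    return {label: prob for label, prob in ranked[:k] if prob >= threshold}


# Probabilities copied from the k=3 example above.
scores = {
    'en': 0.996435284614563,
    'th': 0.0005820714286528528,
    'bn': 0.0005180443404242396,
}

print(select_labels(scores, k=3, threshold=0.5))
# {'en': 0.996435284614563}
```

With threshold left at 0.0 the same call returns all three labels, so k acts as an upper bound on the result size and threshold can only shrink it.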

Labels are ISO 639-1 codes. To look up the language name for a code, use iso_codes:

from fastlang import iso_codes

iso_codes['en']
# 'English'

The detector also accepts a list of strings and returns one result per string.

from fastlang import FastLangDetect

detector = FastLangDetect()

detector.detect(['Where is my mother?', 'pies i kot na drodze.'])
# [{'en': 0.996435284614563}, {'sl': 0.6256219148635864}]
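In the batch output above, the second sentence gets a much lower probability (about 0.63) than the first. One way to handle such results downstream is to accept a label only when its probability is high enough. The helper `best_guess` and the 0.9 cutoff below are illustrative assumptions, not part of fastlang.

```python
def best_guess(result, min_confidence=0.9):
    """Return the most probable label, or None if it is not confident enough."""
    if not result:
        return None
    label = max(result, key=result.get)
    return label if result[label] >= min_confidence else None


# Batch output copied from the example above.
results = [{'en': 0.996435284614563}, {'sl': 0.6256219148635864}]

print([best_guess(r) for r in results])
# ['en', None]
```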

All 176 model labels are available through the get_labels() method.

detector.get_labels()

To include the associated frequencies, pass include_freq=True to get_labels.

Installation

pip install .

References

A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of Tricks for Efficient Text Classification
A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, T. Mikolov, FastText.zip: Compressing text classification models
