The aim of this project is to provide high-quality language detection for all of the web's languages. Wikipedia serves as the proxy for the set of web languages. Currently, we support 156 languages that have their own Wikipedia editions.
The main function is text-langs, which returns 2 values:
- an alist mapping each language to its probability (languages are represented by their ISO 639-1 codes)
- a vector of tokens with their inferred languages
WILD> (text-langs "це тест")
((:UK . 0.5000003) (:RU . 0.4999998))
#(<це - UK:1.00> <тест - RU:1.00>)
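The first returned value can be post-processed like any other alist. The following is a minimal sketch, not part of the library: `best-lang` and `*sample-langs*` are hypothetical names introduced here, and the sample data is copied from the REPL output above so that the snippet runs without loading the project. It assumes only the alist shape shown in that output.

```lisp
;; Sample data copied from the REPL output above. In real use, this alist
;; would be the first value returned by TEXT-LANGS.
(defparameter *sample-langs* '((:uk . 0.5000003) (:ru . 0.4999998)))

(defun best-lang (lang-probs)
  "Return the most probable language in LANG-PROBS and its probability.
Scans the whole alist, so it does not rely on the entries being sorted.
Returns NIL and NIL for an empty alist."
  (let ((best (first lang-probs)))
    (dolist (entry (rest lang-probs))
      (when (> (cdr entry) (cdr best))
        (setf best entry)))
    (values (car best) (cdr best))))

;; Print the winning language and its probability for the sample data.
(multiple-value-bind (lang prob) (best-lang *sample-langs*)
  (format t "~a ~,2f~%" lang prob))

;; With the library loaded, and assuming TEXT-LANGS returns its two results
;; as multiple values (as the REPL output suggests), the call might look like:
;; (multiple-value-bind (langs tokens) (text-langs "це тест")
;;   (declare (ignorable tokens))
;;   (best-lang langs))
```

Scanning the whole alist instead of taking its first element avoids depending on the ordering of the result, which the text above does not state explicitly.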
- Install SBCL
- Install Quicklisp
- Clone the project with Git
$ cd wiki-lang-detect; sbcl --load run.lisp
docker build -t wiki-lang-detect:latest .
docker run -it -p 5000:5000 wiki-lang-detect:latest
curl -X POST -H "Content-Type: application/json" -d '{"text": "Несе Галя"}' http://localhost:5000/detect | jq '.'
Alternatively, you can use a prebuilt Docker image, which is maintained outside of this repository:
docker run -it -p 5000:5000 chaliy/wiki-lang-detect:latest