| 1 | 1 | # SLT.KIT |
| 2 | -Spoken Language Translation System |
| 2 | + |
| 3 | +This repository contains a Spoken Language Translation (SLT) system that translates the output of an Automatic Speech Recognition (ASR) system. The pipeline first applies a monolingual translation system that adds punctuation marks to the ASR output and restores its casing. A machine translation system then translates the result. The repository can be used to train such systems, and pre-trained systems are also available. Both training and translation run inside the Docker container. |
| 4 | + |
| 5 | +The system uses the following software: |
| 6 | +* [OpenNMT-py](https://github.com/OpenNMT/OpenNMT-py) |
| 7 | +* [Moses](http://www.statmt.org/moses/) |
| 8 | +* [Subword NMT](https://github.com/rsennrich/subword-nmt) |
| 9 | +* [Translation Edit Rate (TER)](http://www.cs.umd.edu/%7Esnover/tercom/) |
| 10 | +* [BEER](https://github.com/stanojevic/beer) |
| 11 | +* [CharacTER](https://github.com/rwth-i6/CharacTER) |
| 12 | + |
| 13 | + |
| 14 | +Requirements: |
| 15 | +* [Docker](https://www.docker.com/) |
| 16 | + |
| 17 | +## Installation ## |
| 18 | + |
| 19 | +```bash |
| 20 | + git clone https://github.com/jniehues-kit/SLT.KIT.git |
| 21 | + cd SLT.KIT |
| 22 | + docker build -t slt.kit -f Dockerfile.ST-Baseline . |
| 23 | +``` |
| 24 | + |
| 25 | +## Run ## |
| 26 | + |
| 27 | + |
| 28 | +* Start the Docker container and set the language pair (e.g. source language English (en) and target language German (de)); `$gpuid` is the GPU to expose to the container |
| 29 | + |
| 30 | + |
| 31 | +```bash |
| 32 | + docker run -ti --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=$gpuid slt.kit |
| 33 | + export sl=en |
| 34 | + export tl=de |
| 35 | +``` |
| 36 | + |
| 37 | +* Within the Docker container |
| 38 | + |
| 39 | + 1. Train a model or download a pre-trained model |
| 40 | + * Train a model |
| 41 | + |
| 42 | +```bash |
| 43 | + /opt/SLT.KIT/systems/${model}/Train.sh |
| 44 | +``` |
| 45 | + |
| 46 | + * Download a pre-trained model |
| 47 | + |
| 48 | + |
| 49 | +```bash |
| 50 | + /opt/SLT.KIT/systems/${model}/Download.sh |
| 51 | +``` |
| 52 | + |
| 53 | + |
| 54 | + |
| 55 | + 2. Translate a test set |
| 56 | + |
| 57 | +```bash |
| 58 | + /opt/SLT.KIT/systems/${model}/Translate.sh $testset |
| 59 | +``` |
| 60 | + |
| 61 | + |
| 62 | + |
| 63 | + |
| 64 | +where $model can be: |
| 65 | +* smallTED |
| 66 | + |
| 67 | +and $testset can be: |
| 68 | +* dev2010 |
| 69 | +* tst2010 |
| 70 | +* tst2013 |
| 71 | +* tst2014 |
| 72 | +* tst2015 |
| 73 | + |
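To translate every listed test set in one pass, the `Translate.sh` call above can be wrapped in a loop. The following is a minimal sketch, not part of the repository: the function name `translate_all`, its optional `runner` argument, and the `SLT_ROOT` override are assumptions introduced here for illustration. Only the `/opt/SLT.KIT/systems/${model}/Translate.sh` path, the model name, and the test set names come from this README.

```bash
# Sketch only: translate_all, runner and SLT_ROOT are hypothetical helpers,
# not something SLT.KIT ships.

model="${model:-smallTED}"             # the only model listed in this README
slt_root="${SLT_ROOT:-/opt/SLT.KIT}"   # install path used inside the container
testsets=(dev2010 tst2010 tst2013 tst2014 tst2015)

# Run Translate.sh once per test set.
# Optional first argument: a command to prefix each call with,
# e.g. "echo" to print the commands instead of executing them.
translate_all() {
    local runner="${1:-}"
    local testset
    for testset in "${testsets[@]}"; do
        $runner "${slt_root}/systems/${model}/Translate.sh" "$testset"
    done
}
```

Calling `translate_all echo` prints the five commands without running them, which is a cheap way to check the paths; calling `translate_all` with no argument inside the container would execute them in order.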
| 74 | + |
| 75 | +## Models ## |
| 76 | + |
| 77 | + |
| 78 | +### smallTED ### |
| 79 | + |
| 80 | +SLT model trained only on the [TED corpus](https://wit3.fbk.eu/). |
| 81 | + |
| 82 | +Performance: |
| 83 | + |
| 84 | +| SET | BLEU | TER | BEER | CharacTER | BLEU(ci) | TER(ci) | |
| 85 | +| --- | ---- | --- | ---- | --------- | -------- | ------- | |
| 86 | +| dev2010 | 14.46 | 70.98 | 46.61 | 83.77 | 15.42 | 69.00 | |
| 87 | +| dev2010 (manual Transcript) | 23.45 | 56.74 | 54.44 | 56.77 | 25.03 | 55.17 | |
| 88 | +| tst2010 | 10.41 | 76.53 | 36.15 | 318.59 | 11.04 | 74.96 | |
| 89 | +| tst2010 (manual Transcript) | 24.81 | 55.66 | 53.34 | 55.85 | 26.41 | 54.04 | |
| 90 | +| tst2013 | 13.91 | 71.71 | 44.54 | 80.07 | 14.81 | 69.60 | |
| 91 | +| tst2013 (manual Transcript) | 26.05 | 54.27 | 54.34 | 54.22 | 27.49 | 52.98 | |
| 92 | +| tst2014 | 13.24 | 72.34 | 43.78 | 83.44 | 14.03 | 70.57 | |
| 93 | +| tst2014 (manual Transcript) | 22.31 | 58.36 | 51.85 | 57.66 | 23.18 | 57.44 | |
| 94 | +| tst2015 | 13.03 | 83.20 | 43.66 | 74.03 | 13.75 | 81.30 | |
| 95 | +| tst2015 (manual Transcript) | 25.07 | 57.76 | 53.10 | 54.77 | 26.06 | 56.81 | |
| 96 | + |
| 97 | + |