Optimize alignments (#703)
* Add fast Moses tokenizer

* Tokenize corpus and remap alignments

* Use moses tokenizer in taskcluster

* Add tests for index mapping

* Add packages to build fast Moses tokenizer

* Fix an issue with LD_LIBRARY_PATH for fast moses tokenizer

* Rename tokenization function

* Rename chunking parameter

* Relock poetry

* Rerun linter
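
The "Tokenize corpus and remap alignments" and "Add tests for index mapping" items can be illustrated with a minimal sketch. It is not the code from this commit: the function names `map_indices` and `remap_alignments` are hypothetical, and it assumes the tokenizer only splits whitespace-separated words into smaller pieces without altering or escaping characters. Alignments are assumed to be in the common `i-j` (Pharaoh) format, computed on the tokenized text and mapped back to whitespace-word indices.

```python
def map_indices(tokenized: str, original: str) -> dict[int, int]:
    """Map each token index in `tokenized` to the index of the
    whitespace-separated word in `original` that contains it.

    Assumption: the tokenizer only inserts spaces; characters are unchanged.
    """
    tokens = tokenized.split()
    mapping: dict[int, int] = {}
    tok_idx = 0
    for word_idx, word in enumerate(original.split()):
        remaining = word
        # Consume tokens until the whole original word is covered.
        while remaining:
            if tok_idx >= len(tokens) or not remaining.startswith(tokens[tok_idx]):
                raise ValueError(f"tokens do not match original word {word!r}")
            mapping[tok_idx] = word_idx
            remaining = remaining[len(tokens[tok_idx]):]
            tok_idx += 1
    return mapping


def remap_alignments(alignments: str, src_map: dict[int, int], trg_map: dict[int, int]) -> str:
    """Rewrite 'i-j' pairs from token indices to word indices, dropping duplicates."""
    remapped = []
    for pair in alignments.split():
        src, trg = pair.split("-")
        remapped.append((src_map[int(src)], trg_map[int(trg)]))
    # dict.fromkeys removes duplicates while keeping first-seen order.
    unique = dict.fromkeys(remapped)
    return " ".join(f"{s}-{t}" for s, t in unique)


src_map = map_indices("Hello , world !", "Hello, world!")
trg_map = map_indices("Hallo , Welt !", "Hallo, Welt!")
print(remap_alignments("0-0 1-1 2-2 3-3", src_map, trg_map))
```

Several token-level pairs can collapse onto the same word-level pair, which is why the sketch removes duplicates after remapping.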
eu9ene authored Jun 27, 2024
1 parent 43967d0 commit 1ba10cf
Showing 13 changed files with 819 additions and 482 deletions.
4 changes: 4 additions & 0 deletions docker/Dockerfile
@@ -27,6 +27,10 @@ RUN apt-get update -qq \
libopenblas-dev \
openssl \
libssl-dev \
pkg-config \
libre2-dev \
libglib2.0-dev \
python3-pybind11 \
&& apt-get clean

RUN mkdir /builds/worker/tools && \