GaLAHaD Taggers Dockerized provides a unified interface for linguistic annotation taggers, so that they can be added to GaLAHaD or run on their own. Taggers are containerized and expose an API for tagging documents. Documents are queued and, once tagged, sent to a callback server.
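Conceptually, each tagger behaves like a worker that drains a queue. The following is a hypothetical in-process model of the "queue, tag, send to callback" flow. It is not the service's actual implementation. The names run_tagger, tag and callback are illustrative, and in the real service the callback is a request to the callback server.

```python
import queue
import threading
from typing import Callable, Optional


def run_tagger(docs: "queue.Queue[Optional[str]]",
               tag: Callable[[str], str],
               callback: Callable[[str], None]) -> None:
    """Hypothetical worker loop: take documents in FIFO order, tag them, report results."""
    while True:
        doc = docs.get()
        try:
            if doc is None:  # sentinel value: stop the worker
                return
            callback(tag(doc))  # stands in for the HTTP call to the callback server
        finally:
            docs.task_done()


if __name__ == "__main__":
    jobs: "queue.Queue[Optional[str]]" = queue.Queue()
    results: list = []
    # str.upper plays the role of a (toy) tagger here
    worker = threading.Thread(target=run_tagger, args=(jobs, str.upper, results.append))
    worker.start()
    for text in ("first document", "second document"):
        jobs.put(text)
    jobs.put(None)
    worker.join()
    print(results)  # ['FIRST DOCUMENT', 'SECOND DOCUMENT']
```

Because a single worker consumes the queue, documents are tagged one at a time and in submission order.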
- galahad
- galahad-train-battery
- galahad-taggers-dockerized [you are here]
- galahad-corpus-data
- int-pie
- int-huggingface-tagger [to be released]
This deployment architecture was developed for the GaLAHaD project, but it can also be run in standalone mode.
This repository is for tagger models that have already been trained and are ready for production use. To train models, see galahad-train-battery.
Clone this repository and its submodules.
git clone --recurse-submodules https://github.com/INL/galahad-taggers-dockerized
If you have Docker and Docker Compose installed and have access to the public Docker Hub namespace instituutnederlandsetaal, you can run
docker compose up
to start all taggers locally. Each tagger is then available on localhost at its devport, which is the host port forwarded in its ports entry in docker-compose.yml.
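You can also list the devports with a small script instead of reading docker-compose.yml by hand. This is a sketch and is not part of the repository. It assumes the JSON printed by docker compose config --format json, where each entry in a service's ports list has a published host port. It also assumes single ports rather than port ranges. The helper names devports and load_compose_config are hypothetical.

```python
import json
import subprocess


def devports(config: dict) -> dict:
    """Map each service name to the host ports (devports) it publishes."""
    result = {}
    for name, service in config.get("services", {}).items():
        ports = []
        for port in service.get("ports", []):
            published = port.get("published")
            if published:  # skip ports that are not forwarded to the host
                ports.append(int(published))  # assumes a single port, not a range
        result[name] = ports
    return result


def load_compose_config(project_dir: str = ".") -> dict:
    """Ask Docker Compose for the fully resolved configuration as JSON."""
    proc = subprocess.run(
        ["docker", "compose", "config", "--format", "json"],
        cwd=project_dir, capture_output=True, text=True, check=True,
    )
    return json.loads(proc.stdout)
```

Usage: run print(devports(load_compose_config())) from the repository root. Services without forwarded ports map to an empty list.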
Alternatively, you can start only specific taggers:
docker compose up [SPECIFIC_TAGGER_1] [SPECIFIC_TAGGER_2]
To build the Docker images, see buildall.sh.
If you want to connect the taggers to a callback endpoint outside of the Docker network, such as a server on your host machine, you can specify a .env.dev file like
CALLBACK_SERVER=http://host.docker.internal:8010/internal/jobs
and pass it instead of the default .env file:
docker compose --env-file .env.dev up
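When running in standalone mode, a throwaway endpoint that records what the taggers post can be useful for testing. This is a minimal sketch using only the Python standard library. It assumes the taggers send results as HTTP POST requests with a Content-Length header. The exact payload format is not specified here, so the body is stored as raw bytes. CallbackHandler, RECEIVED and serve are hypothetical names. Port 8010 matches the CALLBACK_SERVER example above.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

RECEIVED: list = []  # (path, raw body) of every callback received


class CallbackHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in callback server: store whatever is POSTed and reply 200."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        RECEIVED.append((self.path, self.rfile.read(length)))
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()


def serve(port: int = 8010) -> ThreadingHTTPServer:
    # Listen on all interfaces so containers can reach it via host.docker.internal.
    return ThreadingHTTPServer(("0.0.0.0", port), CallbackHandler)
```

Usage: call serve().serve_forever() in one terminal, then start the taggers with the .env.dev file.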
To create your own tagger, use the base tagger as a starting point and override process.py. That is, start your Dockerfile with:
FROM instituutnederlandsetaal/taggers-dockerized-base:$tag
COPY --link process.py /
Then implement the process() function and, optionally, the init() function, following the template in base/process.py.
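A toy process.py might look like the following. This is a sketch only. It assumes that process() receives an input path and an output path and that init() takes no arguments, so check base/process.py for the actual signatures. The lexicon stored in MODEL is a hypothetical placeholder for a real model. The output is a TSV with the token, lemma and pos columns.

```python
MODEL = None  # hypothetical: filled once by init(), reused by every process() call


def init() -> None:
    """Load expensive resources once at startup (here: a toy lexicon)."""
    global MODEL
    MODEL = {"the": ("the", "DET")}


def process(in_file: str, out_file: str) -> None:
    """Toy tagger: whitespace tokenisation plus lexicon lookup, 'X' as fallback POS."""
    with open(in_file, encoding="utf-8") as src:
        tokens = src.read().split()
    lexicon = MODEL or {}
    with open(out_file, "w", encoding="utf-8") as dst:
        dst.write("token\tlemma\tpos\n")  # header row with the required columns
        for token in tokens:
            lemma, pos = lexicon.get(token.lower(), (token.lower(), "X"))
            dst.write(f"{token}\t{lemma}\t{pos}\n")
```

Loading the model in init() rather than in process() means the cost is paid once per container, not once per document.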
The in_file points to a plain-text file. Currently, your tagger is expected to produce TSV output. The TSV must start with a header row that contains at least the columns 'token', 'lemma' and 'pos', in any order.
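To check a tagger's output against this requirement, you could use a small validator like the following. It is a sketch and assumes exact, lowercase column names. The name missing_columns is hypothetical.

```python
import csv

REQUIRED_COLUMNS = {"token", "lemma", "pos"}


def missing_columns(tsv_path: str) -> set:
    """Return the required columns absent from the TSV header (empty set means OK)."""
    with open(tsv_path, encoding="utf-8", newline="") as f:
        header = next(csv.reader(f, delimiter="\t"), [])  # first row only
    return REQUIRED_COLUMNS - {name.strip() for name in header}
```

Extra columns and any column order are accepted, and an empty file reports all three columns as missing.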
- Define your tagger as a service in a Docker Compose file, say your-tagger-dockerized.yml (you can use docker-compose.yml as guidance).
- Make sure the tagger is in the taggers-network network. In your-tagger-dockerized.yml, declare it as an external network and add your tagger to it:
    services:
      my-tagger:
        ...
        ports:
          - 8091:8080 # optional devport
        networks:
          - taggers-network
    networks:
      taggers-network:
        external: true
- Launch your tagger:
    docker compose -f your-tagger-dockerized.yml up
  Add the optional -d flag to run it in detached mode.
If you specified a devport, your tagger is now available on localhost at that port.
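If you launched the tagger in detached mode, it may take a moment before the devport accepts connections. The following helper polls the port until it opens or a timeout expires. It is a sketch: wait_for_port is a hypothetical name, and an open TCP port only shows that the container is listening, not that the tagger API is ready to tag.

```python
import socket
import time


def wait_for_port(port: int, host: str = "localhost", timeout: float = 30.0) -> bool:
    """Poll until something accepts TCP connections on host:port, or give up after timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:  # refused, unreachable or timed out: retry until the deadline
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.5)
```

Usage: with the example compose file, wait_for_port(8091) returns True once the tagger container is listening.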
All that is left is to add a YAML metadata file to the server/data/taggers/ folder of GaLAHaD. See the GaLAHaD repository for more details.