galahad-taggers-dockerized

GaLAHaD Taggers Dockerized provides a unified interface for linguistic annotation taggers, which can be added to GaLAHaD or run on their own. Taggers are containerized and expose an API for tagging documents. Documents are queued and, once tagged, sent to a callback server.

GaLAHaD-related Repositories

galahad
galahad-train-battery
galahad-taggers-dockerized [you are here]
galahad-corpus-data
int-pie
int-huggingface-tagger [to be released]

This deployment architecture was developed for the GaLAHaD project, but can also be run in standalone mode.

This repository covers tagger models that have already been trained and are ready to be used in production. To train models, see galahad-train-battery.

Quick start

Clone this repository and its submodules.

git clone --recurse-submodules https://github.com/INL/galahad-taggers-dockerized

Pull builds from Docker Hub

If you have docker and docker compose installed and can access the public instituutnederlandsetaal images on Docker Hub, clone this repository and run

docker compose up

to run the taggers locally. Each tagger is available on localhost at the port equal to its devport. (You can find the devport by looking at the port-forwards defined in docker-compose.yml.)

Alternatively you can start specific taggers with:

docker compose up [SPECIFIC_TAGGER_1] [SPECIFIC_TAGGER_2]

Local builds

Build the docker images: see buildall.sh.

Connecting to a Galahad-like endpoint

If you want to connect the taggers to an endpoint outside of a docker network, you can create a .env.dev file such as

CALLBACK_SERVER=http://host.docker.internal:8010/internal/jobs

and use it instead of the default .env file:

docker compose --env-file .env.dev up

Creating your own tagger

To create your own tagger, use the base tagger as a starting point and replace its process.py. That is, start your Dockerfile with:

FROM instituutnederlandsetaal/taggers-dockerized-base:$tag
COPY --link process.py /

Then fill in the process() and, optionally, init() functions in your copy of base/process.py. The in_file points to a plain-text file. Currently, your tagger is expected to produce TSV as output. The output TSV must contain a header with at least the columns 'token', 'lemma' and 'pos', in any order.
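As a rough illustration of the output contract, here is a minimal sketch of a process.py. The signature process(in_file, out_file), the whitespace tokenization, and the placeholder "UNK" tag are assumptions made for this example; check base/process.py for the actual signature. The has_required_columns() helper is hypothetical and only shows how the header requirement could be checked.

```python
REQUIRED_COLUMNS = ("token", "lemma", "pos")


def init():
    """Load models or other resources once at start-up (nothing to load here)."""


def process(in_file, out_file):
    """Write one TSV row per whitespace-separated token of in_file.

    Assumed signature: both arguments are file paths. The real base tagger
    may pass different arguments.
    """
    with open(in_file, encoding="utf-8") as src, \
            open(out_file, "w", encoding="utf-8") as dst:
        # The header must name at least token, lemma and pos (order is free).
        dst.write("\t".join(REQUIRED_COLUMNS) + "\n")
        for line in src:
            for token in line.split():
                # Placeholder annotation: lowercased form as lemma, "UNK" as tag.
                dst.write(f"{token}\t{token.lower()}\tUNK\n")


def has_required_columns(tsv_path):
    """Return True if the TSV header contains all required columns."""
    with open(tsv_path, encoding="utf-8") as f:
        header = f.readline().rstrip("\n").split("\t")
    return set(REQUIRED_COLUMNS) <= set(header)
```

A real tagger would replace the placeholder row with the output of its model and may add further columns after the required ones.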

Running your own tagger

Define your tagger as a service in a docker compose file, say your-tagger-dockerized.yml. (You can use docker-compose.yml as guidance.)

Make sure the tagger is in the taggers-network network. your-tagger-dockerized.yml should specify it as an external network and add your tagger to it:

services:
 my-tagger:
  ...
  ports:
   - 8091:8080 # optional devport
  networks:
   - taggers-network
networks:
 taggers-network:
  external: true

Launch your tagger
```
docker compose -f your-tagger-dockerized.yml up
```
(Add the optional -d flag to run in detached mode.)

If you specified a devport, your tagger is now reachable on localhost at that port.
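To confirm that the container is listening, a plain TCP connection check is enough. The sketch below is a hypothetical helper, not part of this repository, and it makes no assumptions about the tagger's API routes; it only tests whether the port accepts connections.

```python
import socket


def port_is_open(port, host="localhost", timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False
```

With the example compose file above, port_is_open(8091) should return True once the tagger has started.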

Make Galahad aware of your tagger

All that is left is to add a YAML metadata file to the server/data/taggers/ folder of Galahad. See the Galahad repository for more details.

