sylvainloiseau/igtcorpus

Utilities for converting interlinear glossed text (IGT) corpora between the following formats:

  • EMELD (Cathy Bow, Baden Hughes and Steven Bird (2003), "Towards a general model of interlinear text", Proceedings of the EMELD Workshop 2003; available online). Used in particular by SIL FLEx.
  • CoNLL
  • ELAN (see the ELAN website), in a specific configuration that can be adapted to others
  • a JSON representation of EMELD

Installation

pip install git+https://github.com/sylvainloiseau/igtcorpus.git#egg=igtcorpus

Usage

Command line interface:

$ igtc -i input.xml -o output.json -f emeld -t json -l tww -m en
$ # igtc --output=/Users/sloiseau/Downloads/conll --input=/Users/sloiseau/Downloads/2014T1.xml --fromformat=emeld --toformat=conll -l tww -m en

For the full list of options, see the built-in help:

$ igtc -h
usage: igtc [-h] [--verbose] --output OUTPUT --input INPUT --fromformat {json,emeld,elan} --toformat {json,emeld,conll}

Utilities for converting between interlinear glossed texts formats.

optional arguments:
  -h, --help            show this help message and exit
  --verbose, -v         output detailled information
  --output OUTPUT, -o OUTPUT
                        output file
  --input INPUT, -i INPUT
                        input file
  --fromformat {json,emeld,elan}, -f {json,emeld,elan}
                        input file format
  --toformat {json,emeld,conll}, -t {json,emeld,conll}
                        output file format
  --olanguage OLANGUAGE, -l OLANGUAGE
                        Object language
  --mlanguage MLANGUAGE, -m MLANGUAGE
                        Meta language
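When many files have to be converted, it can help to assemble the igtc command line from a script. The following is a minimal sketch, not part of igtcorpus: `build_igtc_command` is a hypothetical helper, the format choices are copied from the help text above, and it assumes that `--olanguage` and `--mlanguage` may be omitted because the help lists them under optional arguments.

```python
import shlex

# Format choices copied from the `igtc -h` output.
INPUT_FORMATS = ("json", "emeld", "elan")
OUTPUT_FORMATS = ("json", "emeld", "conll")


def build_igtc_command(input_path, output_path, from_format, to_format,
                       olanguage=None, mlanguage=None, verbose=False):
    """Return the argument list for one igtc call (hypothetical helper)."""
    if from_format not in INPUT_FORMATS:
        raise ValueError(f"unsupported input format: {from_format!r}")
    if to_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output format: {to_format!r}")
    argv = ["igtc",
            "--input", str(input_path),
            "--output", str(output_path),
            "--fromformat", from_format,
            "--toformat", to_format]
    # Assumption: the language options can be left out.
    if olanguage is not None:
        argv += ["--olanguage", olanguage]
    if mlanguage is not None:
        argv += ["--mlanguage", mlanguage]
    if verbose:
        argv.append("--verbose")
    return argv


argv = build_igtc_command("input.xml", "output.json", "emeld", "json",
                          olanguage="tww", mlanguage="en")
# Print the command as a shell-quoted string for inspection.
print(shlex.join(argv))
```

The returned list can be passed to `subprocess.run(argv, check=True)` once igtc is installed and on the PATH. Validating the formats before launching the process gives an early error for unsupported pairs, for example CoNLL as an input format.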

API

from igtcorpus.elan import ElanCorpoAfr
from igtcorpus.igt import Corpus
from igtcorpus.emeld import Emeld
from igtcorpus.json import EmeldJson

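# Read a corpus from one of the supported input formats
# (each call below is an alternative; use the one matching your file):
# - EAF (ELAN) file
corpus = ElanCorpoAfr.read("tests/data/BEJ_MV_CONV_01_RICH.EAF")
# - EMELD XML document
corpus = Emeld.read("tests/data/test.emeld.xml")
# - JSON representation of EMELD
corpus = EmeldJson.read("tests/data/tiny.json")

# Write the corpus that was read:
# - as EMELD XML
Emeld.write(corpus, "corpus.emeld")
# - as JSON
EmeldJson.write(corpus, "corpus.json")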
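The read/write pairs above can be combined to convert a whole directory. The sketch below is an illustration under stated assumptions, not part of igtcorpus: `convert_directory` is a hypothetical helper that takes the reader and writer as parameters (for instance `Emeld.read` and `EmeldJson.write`), and it assumes both accept file paths as strings, as in the example above.

```python
from pathlib import Path


def convert_directory(src_dir, dst_dir, read, write,
                      pattern="*.xml", suffix=".json"):
    """Convert every file in src_dir matching pattern; return the written paths.

    read(path) must return a corpus and write(corpus, path) must save it.
    Both are supplied by the caller, so any reader/writer pair can be used.
    """
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    # Sort for a deterministic processing order.
    for src in sorted(Path(src_dir).glob(pattern)):
        target = dst / (src.stem + suffix)
        corpus = read(str(src))
        write(corpus, str(target))
        written.append(target)
    return written
```

With igtcorpus installed, a call such as `convert_directory("emeld/", "json/", Emeld.read, EmeldJson.write)` would convert each EMELD file in a (hypothetical) `emeld/` directory to JSON. Passing the functions in keeps the helper independent of any one format pair.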
