defining dtd

MAE user's guide

Defining an Annotation Task

Creating an annotation task for MAE is fairly straightforward. The input format is a simplified form of DTD (Document Type Definition), the format used for specifying XML elements/tags. There are three main parts of task creation: the task name, the tag names, and the tag attributes. DTDs are plain text files with a .dtd extension; you can create them in any text editor. For the full DTD specification, refer to web resources such as Wikipedia. Note that the current version of MAE does not support the full functionality of DTD declarations.

Task Name

The task name is defined with the !ENTITY tag. If you want to create a task called “myTask”, write an !ENTITY line containing the keyword name followed by the task name in double quotes. The line would look like this:

    <!ENTITY name "myTask">

This simply provides the name of the root element in the annotation XML files MAE outputs. While you are revising your annotation specification, it is useful to treat the task name as a form of version control by recording in it which version of the specification the DTD defines. If you go through several DTD versions while iterating on myTask, putting the version in the task name (e.g., myTask_v1.0, myTask_v1.1, ...) makes it easy to determine which version a document was annotated under.

Imagine you revise your specification but maintain only a single DTD file as it evolves. It could then be a major headache (for both you and your annotators) to determine which documents were covered by which iteration of your DTD. In fact, this is not only a management nightmare but also a real technical problem, since MAE assumes that all annotation files sharing the same task name also share the same tag structure. Since DTDs are simply text files, you can put them in your version control system of choice, and you are encouraged to do so.
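The following minimal Python sketch illustrates the version check this convention enables. It assumes only what is stated above: the task name comes from the `<!ENTITY name "...">` line and becomes the root element of annotation files. The helper names `task_name_from_dtd` and `annotated_with` are hypothetical and are not part of MAE.

```python
import re
import xml.etree.ElementTree as ET

# Matches the task-name declaration, e.g. <!ENTITY name "myTask_v1.1">
TASK_NAME_RE = re.compile(r'<!ENTITY\s+name\s+"([^"]*)"\s*>')


def task_name_from_dtd(dtd_text: str) -> str:
    """Return the task name declared in a DTD."""
    match = TASK_NAME_RE.search(dtd_text)
    if match is None:
        raise ValueError('no <!ENTITY name "..."> declaration found')
    return match.group(1)


def annotated_with(dtd_text: str, xml_text: str) -> bool:
    """True if the annotation file's root element matches the DTD's task name."""
    root = ET.fromstring(xml_text)
    return root.tag == task_name_from_dtd(dtd_text)


dtd = '<!ENTITY name "myTask_v1.1">\n<!ELEMENT Verb ( #PCDATA ) >'
print(annotated_with(dtd, "<myTask_v1.1><TAGS/></myTask_v1.1>"))  # True
print(annotated_with(dtd, "<myTask_v1.0><TAGS/></myTask_v1.0>"))  # False
```

With versioned task names, a file whose root element is myTask_v1.0 is immediately recognizable as belonging to a different DTD iteration than myTask_v1.1.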

Tag Elements

Tag elements (defined by !ELEMENT) declare the names and types of the tags used in your annotation task. MAE recognizes two types of tags: extent tags (tags used to label spans of text in the document) and link tags (tags that identify a relationship between extent tags). Note that you cannot have two tags with the same name, even if they are of different types.

To define an extent tag for your task, the line in your DTD will look like this:

    <!ELEMENT ExtentTagName ( #PCDATA ) >

while a link tag will look like this:

    <!ELEMENT LinkTagName EMPTY >

( #PCDATA ) indicates that ExtentTagName tags will be associated with some span of text in the document, while EMPTY indicates that LinkTagName tags will be used to link other tags to one another and will not be associated with a particular span of text. When MAE reads in a task definition, it assigns each extent tag a color, which is used to highlight that tag's instances in the document being annotated. Since link tags do not consume text spans, there is nothing to highlight, so link tags are not given colors.
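The two declaration forms can be sketched in Python, along with the rule that tag names must be unique across both types. This is a hypothetical helper that handles only the two simplified forms shown above. It is not MAE's parser.

```python
import re

# Extent tag: <!ELEMENT Name ( #PCDATA ) >   Link tag: <!ELEMENT Name EMPTY >
ELEMENT_RE = re.compile(r"<!ELEMENT\s+(\S+)\s+(\(\s*#PCDATA\s*\)|EMPTY)\s*>")


def classify_tags(dtd_text: str) -> dict:
    """Map each declared tag name to "extent" or "link"."""
    tags = {}
    for name, content in ELEMENT_RE.findall(dtd_text):
        if name in tags:
            # Tag names must be unique, even across extent and link tags.
            raise ValueError(f"duplicate tag name: {name}")
        tags[name] = "link" if content == "EMPTY" else "extent"
    return tags


dtd = "<!ELEMENT Verb ( #PCDATA ) >\n<!ELEMENT TLink EMPTY >"
tags = classify_tags(dtd)
print(tags)  # {'Verb': 'extent', 'TLink': 'link'}
# Only extent tags would be assigned a highlight color.
print([name for name, kind in tags.items() if kind == "extent"])  # ['Verb']
```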

Attributes

Attributes (defined by !ATTLIST) contain the information associated with each tag. Some attributes are pre-defined by MAE. Extent tags will always have id, spans, and text attributes (older versions of MAE use start and end instead of spans), even if they are not defined in the DTD. Link tags will always have an id and at least two arguments, named from and to by default. MAE expands each argument into two attributes, namely xxxId and xxxText. Thus, if a link tag definition contains a from argument as well as a fromId attribute, MAE cannot read in the DTD. It will try to expand the argument from into fromId and fromText, which conflicts with your own fromId attribute. Choose attribute names carefully when defining link tags.
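The argument expansion and the resulting name clash can be sketched in Python. The function `expand_link_attributes` is hypothetical. It only mirrors the xxxId/xxxText rule described above, and the attribute order it returns is an illustrative choice.

```python
def expand_link_attributes(declared, arguments=("from", "to")):
    """List the attributes of a link tag after argument expansion.

    Each argument name becomes two attributes, <arg>Id and <arg>Text.
    A declared attribute that collides with an expanded name is an error.
    """
    derived = []
    for arg in arguments:
        derived += [arg + "Id", arg + "Text"]
    clashes = sorted(set(derived) & set(declared))
    if clashes:
        raise ValueError(f"attributes clash with argument expansion: {clashes}")
    # id is always present; an explicit id declaration is not duplicated.
    return ["id"] + derived + [a for a in declared if a != "id"]


print(expand_link_attributes(["relType"]))
# ['id', 'fromId', 'fromText', 'toId', 'toText', 'relType']
try:
    expand_link_attributes(["fromId"])
except ValueError as err:
    print(err)  # attributes clash with argument expansion: ['fromId']
```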

To define an attribute, give the name of the element it is associated with, followed by the name of the attribute and its type, like so:

    <!ATTLIST TagName attribute1 ( YES | NO ) #IMPLIED >
    <!ATTLIST TagName attribute2 CDATA #IMPLIED >

We will discuss attribute types later in this document. Before that, let's go over the details of the special attributes pre-defined in MAE.

`id` Attributes

Every tag needs an identifier so that it can be distinguished and referred to. Thus MAE gives an id attribute to all tags defined in the task. When annotators create tags, each tag is given a numerical id combined with a tag-specific prefix. By default, MAE uses the first letter of the tag name. For example, a tag called Verb will have the ids V1, V2, V3, etc. If you have a second tag that starts with V, MAE will also take the next letter as its prefix, and so on. If you want to specify your own prefix, you can explicitly define the id attribute and add a prefix field to it, like so:

    <!ATTLIST Verb id ID prefix="VB" #REQUIRED >

It is highly recommended that you choose your tag names and prefixes carefully, so as to make the distinctions clear to yourself and your annotators. Relying on MAE to prefix your tag identifiers automatically is especially discouraged if you have multiple tag types whose names begin with the same character.
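One way to read the default prefixing rule is sketched below in Python. As an illustration only, it assumes that tags are processed in declaration order and that each tag takes the shortest leading substring of its name not yet claimed. MAE's actual tie-breaking may differ. `assign_prefixes` and `make_ids` are hypothetical names.

```python
def assign_prefixes(tag_names, explicit=None):
    """Give each tag a unique id prefix.

    Explicit prefixes (from prefix="..." on an id attribute) win. Other tags
    take the shortest leading substring of their name not already used.
    """
    explicit = dict(explicit or {})
    used = set(explicit.values())
    prefixes = {}
    for name in tag_names:
        if name in explicit:
            prefixes[name] = explicit[name]
            continue
        for length in range(1, len(name) + 1):
            candidate = name[:length]
            if candidate not in used:
                break
        else:
            raise ValueError(f"cannot derive a unique prefix for {name}")
        used.add(candidate)
        prefixes[name] = candidate
    return prefixes


def make_ids(prefix, count):
    """Ids for the first `count` instances of a tag, e.g. V1, V2, V3."""
    return [f"{prefix}{n}" for n in range(1, count + 1)]


print(assign_prefixes(["Verb", "Vowel", "Noun"]))  # {'Verb': 'V', 'Vowel': 'Vo', 'Noun': 'N'}
print(assign_prefixes(["Verb", "Vowel"], {"Verb": "VB"}))  # {'Verb': 'VB', 'Vowel': 'V'}
print(make_ids("V", 3))  # ['V1', 'V2', 'V3']
```

The second call shows why explicit prefixes are safer. Under this sketch's assumption, whichever tag is left without an explicit prefix silently claims the bare first letter.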

`spans` Attribute

As previously mentioned, all extent tags have an attribute called spans, which denotes the character offsets the tag is anchored to. However, an extent tag can be "non-consuming" (NC). For instance, when you annotate hidden or omitted entities in the text, there are no character offsets the tag can be anchored to. By default, MAE will not allow a tag to be non-consuming. If you define an extent tag's spans attribute as #IMPLIED, as opposed to #REQUIRED, MAE will allow that tag to be non-consuming. For example:

    <!ATTLIST Tag1 spans #IMPLIED >

Tag1, as specified, is allowed to be non-consuming. If you do not want to allow a tag to be non-consuming, it is not necessary to mention the spans attribute in the DTD at all, though you may explicitly define the spans attribute as #REQUIRED for the sake of clarity.
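The following Python sketch shows the rule in action: a tag may be non-consuming only if its spans attribute is declared #IMPLIED. Two choices here are assumptions made for illustration, not MAE's on-disk format. Spans are represented as a list of (start, end) offset pairs, and an empty list means non-consuming. The function names are hypothetical.

```python
import re

# Matches declarations like: <!ATTLIST Tag1 spans #IMPLIED >
SPANS_IMPLIED_RE = re.compile(r"<!ATTLIST\s+(\S+)\s+spans\s+#IMPLIED\s*>")


def nc_allowed_tags(dtd_text):
    """Names of extent tags whose spans attribute is declared #IMPLIED."""
    return set(SPANS_IMPLIED_RE.findall(dtd_text))


def check_spans(tag, spans, nc_allowed):
    """Validate a tag instance's spans, given as (start, end) pairs.

    An empty list means the instance is non-consuming. Offsets are assumed
    to be end-exclusive, so each pair needs start < end.
    """
    if not spans and tag not in nc_allowed:
        raise ValueError(f"{tag} must be anchored to text (spans is #REQUIRED)")
    for start, end in spans:
        if not 0 <= start < end:
            raise ValueError(f"bad span ({start}, {end}) on {tag}")


nc = nc_allowed_tags("<!ATTLIST Tag1 spans #IMPLIED >")
check_spans("Tag1", [], nc)          # OK: Tag1 may be non-consuming
check_spans("Tag2", [(0, 4)], nc)    # OK: Tag2 is anchored to text
try:
    check_spans("Tag2", [], nc)
except ValueError as err:
    print(err)  # Tag2 must be anchored to text (spans is #REQUIRED)
```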

Older versions of MAE used start and end attributes instead of a single spans attribute. Current MAE still supports the legacy format, but old versions (before 0.10) cannot load annotation files generated by newer versions of MAE.

Example: Using Non-consuming (NC) extent tags to annotate timelines

You can use NC extent tags to hack together a table that can serve as a timeline, as shown below.

    <!ELEMENT TemporalOrder (#PCDATA) >
    <!ATTLIST TemporalOrder spans #IMPLIED >
    <!ATTLIST TemporalOrder Character (Char1 | Char2 | Char3) #IMPLIED>
    <!ATTLIST TemporalOrder 1 CDATA #IMPLIED>
    <!ATTLIST TemporalOrder 2 CDATA #IMPLIED>
    <!ATTLIST TemporalOrder 3 CDATA #IMPLIED>
    <!ATTLIST TemporalOrder 4 CDATA #IMPLIED>

[Figure: Timeline in MAE]

The number of numbered attributes corresponds to the number of columns in the table. Annotators can later fill them in with text values, e.g., the codes of annotated event spans. MAE displays these values as tooltips.
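Because the number of numbered attributes sets the number of timeline columns, the declarations can be generated rather than typed by hand. Here is a minimal Python sketch using a hypothetical `timeline_dtd` helper that reproduces the declarations above:

```python
def timeline_dtd(tag, characters, columns):
    """DTD lines for a non-consuming timeline tag with `columns` numbered columns."""
    lines = [
        f"<!ELEMENT {tag} (#PCDATA) >",
        f"<!ATTLIST {tag} spans #IMPLIED >",  # allow non-consuming instances
        f"<!ATTLIST {tag} Character ({' | '.join(characters)}) #IMPLIED >",
    ]
    # One free-text attribute per timeline column: 1, 2, ..., columns
    lines += [f"<!ATTLIST {tag} {i} CDATA #IMPLIED >" for i in range(1, columns + 1)]
    return "\n".join(lines)


print(timeline_dtd("TemporalOrder", ["Char1", "Char2", "Char3"], 4))
```

Changing the `columns` argument widens or narrows the table without hand-editing each ATTLIST line.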

`argN` Attributes

Starting from v0.11, MAE supports link tags with an arbitrary number of arguments. Unless otherwise specified in the DTD, MAE assumes that a link tag has two default arguments, from and to (this was the only behavior before v0.11). To define three or more arguments for a particular link tag type, use argN attributes in the DTD. For example, to define a link tag with 4 arguments:

    <!ELEMENT LinkTagName EMPTY >
    <!ATTLIST LinkTagName arg0 IDREF #REQUIRED>
    <!ATTLIST LinkTagName arg1 IDREF #REQUIRED>
    <!ATTLIST LinkTagName arg2 IDREF #REQUIRED>
    <!ATTLIST LinkTagName arg3 IDREF #REQUIRED>

Note that argN attributes must start from 0 and be numbered consecutively.
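The numbering rule can be checked mechanically. This hypothetical Python helper collects the argN attribute names of a link tag. It falls back to the default from/to pair when there are none. It rejects gaps, duplicates, or fewer than two arguments, following the "at least two arguments" rule stated earlier.

```python
import re

# arg0, arg1, ... (no leading zeros)
ARG_RE = re.compile(r"arg(0|[1-9][0-9]*)")


def argument_slots(attribute_names):
    """Return a link tag's argument attributes in order.

    Falls back to the default from/to pair when no argN attribute is declared.
    """
    numbers = sorted(
        int(m.group(1)) for m in map(ARG_RE.fullmatch, attribute_names) if m
    )
    if not numbers:
        return ["from", "to"]
    if numbers != list(range(len(numbers))):
        raise ValueError(f"argN must run arg0..arg{len(numbers) - 1} with no gaps: {numbers}")
    if len(numbers) < 2:
        raise ValueError("a link tag needs at least two arguments")
    return [f"arg{n}" for n in numbers]


print(argument_slots(["arg0", "arg1", "arg2", "arg3"]))  # ['arg0', 'arg1', 'arg2', 'arg3']
print(argument_slots(["relType"]))                       # ['from', 'to']
try:
    argument_slots(["arg0", "arg2"])
except ValueError as err:
    print(err)
```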

You can also name each argument using the prefix field, like so:

    <!ELEMENT ARGUMENTS EMPTY >
    <!ATTLIST ARGUMENTS arg0 IDREF prefix="agent" #REQUIRED>
    <!ATTLIST ARGUMENTS arg1 IDREF prefix="patient" #REQUIRED>
    <!ATTLIST ARGUMENTS arg2 IDREF prefix="predicate" #REQUIRED>
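The following hypothetical Python sketch shows how the prefix values map argument slots to names. Two behaviors here are assumptions for illustration. First, an argument without a prefix keeps its argN name. Second, a named argument is expanded into <name>Id and <name>Text, following the xxxId/xxxText rule described earlier.

```python
import re

# Matches e.g. <!ATTLIST ARGUMENTS arg0 IDREF prefix="agent" #REQUIRED>
ARG_DECL_RE = re.compile(
    r'<!ATTLIST\s+(\S+)\s+(arg[0-9]+)\s+IDREF(?:\s+prefix="([^"]*)")?'
)


def argument_names(dtd_text, tag):
    """Map each argN of `tag` to its name: the prefix value, else argN itself."""
    names = {}
    for decl_tag, slot, prefix in ARG_DECL_RE.findall(dtd_text):
        if decl_tag == tag:
            names[slot] = prefix or slot
    return names


def expanded_attributes(names):
    """Each named argument becomes <name>Id and <name>Text attributes."""
    return [f"{name}{suffix}" for name in names.values() for suffix in ("Id", "Text")]


dtd = """<!ELEMENT ARGUMENTS EMPTY >
<!ATTLIST ARGUMENTS arg0 IDREF prefix="agent" #REQUIRED>
<!ATTLIST ARGUMENTS arg1 IDREF prefix="patient" #REQUIRED>
<!ATTLIST ARGUMENTS arg2 IDREF prefix="predicate" #REQUIRED>"""
names = argument_names(dtd, "ARGUMENTS")
print(names)  # {'arg0': 'agent', 'arg1': 'patient', 'arg2': 'predicate'}
print(expanded_attributes(names))
# ['agentId', 'agentText', 'patientId', 'patientText', 'predicateId', 'predicateText']
```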

Attribute Types

In MAE, an attribute value must be one of four types: CDATA, ID, IDREF, or a closed value set.

ID means the attribute works as the identifier for the tag. A tag can have only one ID type attribute, and in MAE this is always the pre-defined id attribute.
CDATA means the attribute has a free text value.
IDREF means the attribute references another tag, so its value must be the id of the referent. For example, the argument ids of a link tag are always IDREF.
Finally, MAE lets you define a set of options for an attribute value, rather than asking annotators to fill in their own values each time. To define a list of values, create the attribute and list the possible values in parentheses, delimited by |, like so:

    <!ATTLIST TagName attribute1 ( YES | NO | MAYBE ) #IMPLIED >
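The four value types translate into simple checks on an attribute value. In this hypothetical Python sketch, a closed value set is represented as a tuple of options, and `known_ids` stands for the ids of the other tags in the document. Neither representation comes from MAE.

```python
def parse_value_type(text):
    """Turn "CDATA", "ID", "IDREF", or "( A | B | ... )" into a value type."""
    text = text.strip()
    if text.startswith("(") and text.endswith(")"):
        # Closed value set: a tuple of the allowed options
        return tuple(option.strip() for option in text[1:-1].split("|"))
    if text in ("CDATA", "ID", "IDREF"):
        return text
    raise ValueError(f"unknown value type: {text}")


def check_value(value_type, value, known_ids=frozenset()):
    """Check one attribute value against its type.

    `known_ids` holds the ids of the other tags in the document.
    """
    if isinstance(value_type, tuple):
        return value in value_type       # must be one of the listed options
    if value_type == "CDATA":
        return True                      # free text
    if value_type == "IDREF":
        return value in known_ids        # must name an existing tag
    if value_type == "ID":
        return bool(value) and value not in known_ids  # non-empty and unique
    raise ValueError(f"unknown value type: {value_type}")


options = parse_value_type("( YES | NO | MAYBE )")
print(options)                                       # ('YES', 'NO', 'MAYBE')
print(check_value(options, "MAYBE"))                 # True
print(check_value("IDREF", "V1", {"V1", "N1"}))      # True
print(check_value("IDREF", "V9", {"V1", "N1"}))      # False
```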

Required-ness

You can make an attribute mandatory or optional using #REQUIRED or #IMPLIED, respectively. Annotators are expected to fill in all required attributes as they work. If an annotation file (XML) has unfilled required attributes, MAE will warn the user about them when it loads or saves the file.
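Here is a minimal Python sketch of the kind of check behind these warnings. It assumes that tag instances are plain dictionaries of attribute values and that an empty string counts as unfilled. Both helpers are hypothetical and only handle one ATTLIST declaration per line.

```python
import re

# Captures tag name, attribute name, and REQUIRED/IMPLIED from one ATTLIST line
ATTLIST_RE = re.compile(r"<!ATTLIST\s+(\S+)\s+(\S+)\s+.*?#(REQUIRED|IMPLIED)")


def required_attributes(dtd_text, tag):
    """Names of the attributes of `tag` declared #REQUIRED."""
    return [
        att
        for decl_tag, att, req in ATTLIST_RE.findall(dtd_text)
        if decl_tag == tag and req == "REQUIRED"
    ]


def unfilled(instances, required):
    """Yield (id, missing attribute names) for instances with empty required fields."""
    for instance in instances:
        missing = [att for att in required if not instance.get(att)]
        if missing:
            yield instance.get("id", "?"), missing


dtd = """<!ATTLIST Verb tense ( PAST | PRESENT ) #REQUIRED >
<!ATTLIST Verb note CDATA #IMPLIED >"""
required = required_attributes(dtd, "Verb")
verbs = [{"id": "V1", "tense": "PAST"}, {"id": "V2", "tense": ""}]
for tag_id, missing in unfilled(verbs, required):
    print(f"warning: {tag_id} is missing {missing}")  # warning: V2 is missing ['tense']
```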

Default Attribute Values

MAE allows you to set default values for any attribute by placing the desired value in quotes at the end of the attribute definition, like so:

    <!ATTLIST TagName attribute1 ( YES | NO ) #IMPLIED "YES">
    <!ATTLIST TagName attribute2 CDATA #IMPLIED "default">

Note that if an attribute defines a list of options but the default value does not appear in that list, MAE will not apply the default value when creating a new tag. Also remember that when an attribute has a default value, annotators might skip it by mistake. Especially for required or important attributes, we highly recommend not providing a default value, to reduce human error.
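The rule that an out-of-list default is not applied can be sketched as follows. The spec format, a mapping from attribute name to (value type, default), is a hypothetical representation chosen for this Python example.

```python
def initial_value(value_type, default):
    """Value pre-filled on a new tag; None means the field starts empty."""
    if default is None:
        return None
    if isinstance(value_type, tuple) and default not in value_type:
        return None  # a default outside the option list is not applied
    return default


def new_tag_values(specs):
    """specs maps attribute name -> (value type, default or None)."""
    return {name: initial_value(vtype, default) for name, (vtype, default) in specs.items()}


specs = {
    "attribute1": (("YES", "NO"), "YES"),    # default is a listed option
    "attribute2": ("CDATA", "default"),      # free-text default
    "attribute3": (("YES", "NO"), "MAYBE"),  # default not in the list
}
print(new_tag_values(specs))
# {'attribute1': 'YES', 'attribute2': 'default', 'attribute3': None}
```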

Putting All Together

Defining an attribute can involve some complexity, as there are many properties to specify:

    <!ATTLIST __TAG_NAME__ __ATT_NAME__ __VALUE_TYPE__ prefix="__PREFIX__" __REQUIRE__ __DEF_VALUE__ >

To summarize,

`__TAG_NAME__` : the name of the tag the attribute is associated with
`__ATT_NAME__` : the name of the attribute
`__VALUE_TYPE__` : one of ID, CDATA, IDREF, or ( X | Y | ... )
`prefix="__PREFIX__"` : optional; gives the id prefix (on an id attribute) or the argument name (on an argN attribute of a link tag)
`__REQUIRE__` : #REQUIRED or #IMPLIED
`__DEF_VALUE__` : optional; the default value, double-quoted
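To make the template concrete, the Python sketch below parses one ATTLIST line into the components listed above. The regular expression and the `AttributeDef` class are hypothetical. They cover only the simplified forms used in this guide, not the full DTD grammar.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple, Union

ATTLIST_RE = re.compile(
    r"<!ATTLIST\s+(?P<tag>\S+)\s+(?P<att>\S+)"
    r"(?:\s+(?P<vtype>\([^)]*\)|CDATA|IDREF|ID))?"    # value type (optional)
    r'(?:\s+prefix="(?P<prefix>[^"]*)")?'              # prefix (optional)
    r"\s+#(?P<req>REQUIRED|IMPLIED)"                   # required-ness
    r'(?:\s+"(?P<default>[^"]*)")?'                    # default value (optional)
    r"\s*>"
)


@dataclass
class AttributeDef:
    tag: str
    name: str
    value_type: Optional[Union[str, Tuple[str, ...]]]
    prefix: Optional[str]
    required: bool
    default: Optional[str]


def parse_attlist(line: str) -> AttributeDef:
    """Split one simplified ATTLIST declaration into its components."""
    m = ATTLIST_RE.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"unrecognized ATTLIST declaration: {line!r}")
    vtype = m["vtype"]
    if vtype is not None and vtype.startswith("("):
        # Closed value set becomes a tuple of options
        vtype = tuple(option.strip() for option in vtype[1:-1].split("|"))
    return AttributeDef(m["tag"], m["att"], vtype, m["prefix"],
                        m["req"] == "REQUIRED", m["default"])


print(parse_attlist('<!ATTLIST Verb id ID prefix="VB" #REQUIRED >'))
print(parse_attlist('<!ATTLIST TagName attribute1 ( YES | NO ) #IMPLIED "YES">'))
```

The optional groups mirror the optional parts of the template. For example, `spans` declarations typically omit the value type, and most attributes omit `prefix` and the default value.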

Full Spec Sample

Take a look at this [sample DTD](https://github.com/keighrim/mae-annotation/blob/master/samples/sampleTask.dtd). Pay attention to the comment lines in the sample, as we tried to show examples of each variant of the definitions.
