Inflectible is a flexible template engine with inflection. It can use correct word forms where other template engines can't.
<dependency>
<groupId>org.tendiwa</groupId>
<artifactId>inflectible</artifactId>
<version>0.2.0</version>
</dependency>
Many natural languages rely heavily on non-trivial rules of inflection. To construct texts in those languages with variable parts of a sentence, we can't always just concatenate strings: generally we have to know the grammatical structure of the sentence we're constructing, and we have to know how each word is spelled in a particular form. For example, in Russian, a typical noun can have up to a dozen forms that are written differently in different sentences, and there is no simple "cram-it-in-printf" rule for how those forms are derived from the dictionary form of a word.
In English it is not usually a problem. But even in English, sometimes just concatenating strings is not enough to produce a grammatically correct sentence.
Consider this example: we need to display a message that some cutting tool cuts paper well. With something like a printf function, we could use a template like this:
%s cuts paper well
We could pass "Knife" or "Razor", but if we pass "Scissors", it produces the grammatically incorrect sentence "Scissors cuts paper well". This is just the most basic example of how properly constructed sentences require the template engine to be aware of inflection rules.
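The problem is easy to reproduce with plain Java string formatting. The class and method names below are made up for this illustration:

```java
import java.util.List;

public class NaiveTemplate {
    // Fills the template by plain substitution; no grammar is consulted.
    static String fill(String tool) {
        return String.format("%s cuts paper well", tool);
    }

    public static void main(String[] args) {
        for (String tool : List.of("Knife", "Razor", "Scissors")) {
            System.out.println(fill(tool));
        }
        // The last line printed is "Scissors cuts paper well":
        // the verb does not agree with the plural subject.
    }
}
```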
Inflectible introduces two kinds of markup: vocabularies and templates.
In vocabularies, you put words of a language in all their various forms, and assign each form a grammatical meaning:
WOLF (Noun) {
wolf
wolves <Plur>
}
CHILD (Noun) {
child
children <Plur>
}
SCISSORS (Noun) <Plur> {
scissors
}
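One way to picture what such an entry means is a list of spellings, each tagged with a set of grammemes. The sketch below is a hypothetical in-memory model written for this README, not Inflectible's actual classes. It picks the most specific form whose grammemes are all requested and falls back to the dictionary form:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of a vocabulary entry; not Inflectible's real API.
final class Lexeme {
    // One spelling together with the grammemes written after it in the markup.
    private static final class Form {
        final String spelling;
        final Set<String> grammemes;

        Form(String spelling, Set<String> grammemes) {
            this.spelling = spelling;
            this.grammemes = grammemes;
        }
    }

    private final List<Form> forms = new ArrayList<>();

    // Registers a form; a form without grammemes is the dictionary form.
    Lexeme with(String spelling, String... grammemes) {
        forms.add(new Form(spelling, Set.of(grammemes)));
        return this;
    }

    // Returns the most specific form whose grammemes are all requested.
    String form(Set<String> requested) {
        Form best = null;
        for (Form candidate : forms) {
            boolean applicable = requested.containsAll(candidate.grammemes);
            if (applicable
                    && (best == null || candidate.grammemes.size() > best.grammemes.size())) {
                best = candidate;
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("No form matches " + requested);
        }
        return best.spelling;
    }
}

public class LexemeDemo {
    public static void main(String[] args) {
        Lexeme child = new Lexeme().with("child").with("children", "Plur");
        System.out.println(child.form(Set.of()));       // child
        System.out.println(child.form(Set.of("Plur"))); // children
    }
}
```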
In templatuaries, you put templates. Templates declare arguments and describe how those arguments are used to fill out the template:
actions.bite(subject, object) {
[Subject] (and [subject]<Plur> are well known for their painful bites!) is biting a [object].
}
In your application, you have classes that represent the same concepts that the words of a language represent. Those classes implement the Concept interface, which requires them to return the identifier of their lexeme:
class Wolf implements Concept {
@Override
public String identifier() {
return "WOLF";
}
}
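A single class can also stand for several lexemes by taking the identifier as a constructor argument. The sketch below assumes that Concept declares only the identifier() method shown above, so it redeclares the interface locally to stay self-contained. The Human class body is a guess at what such a class might look like:

```java
// Assumed shape of the Concept interface, inferred from the Wolf example.
interface Concept {
    String identifier();
}

// Hypothetical class: one Java type backing many lexemes (GIRL, BOY, ...).
final class Human implements Concept {
    private final String identifier;

    Human(String identifier) {
        this.identifier = identifier;
    }

    @Override
    public String identifier() {
        return identifier;
    }
}

public class ConceptDemo {
    public static void main(String[] args) {
        Concept girl = new Human("GIRL");
        System.out.println(girl.identifier()); // GIRL
    }
}
```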
With those classes, you construct a NativeSpeaker that knows how to speak a particular language using proper inflection rules, and ask him to fill out a particular template with particular concepts:
Wolf wolf = new Wolf();
Human girl = new Human("GIRL");
System.out.println(
    nativeSpeaker.fillOut("actions.bite", wolf, girl)
);
// -> Output: Wolf (and wolves are well known for their painful bites!) is biting a girl.
This may not seem very useful for English, but it makes a lot of sense in a language like Russian, where a lexeme for НОЖ (KNIFE) would look like this:
НОЖ (Сущ) <Муж Неодуш> {
нож
ножа <Ед Р>
ножу <Ед Д>
нож <Ед В>
ножом <Ед Т>
ноже <Ед П>
ножи <Мн И>
ножей <Мн Р>
ножам <Мн Д>
ножи <Мн В>
ножами <Мн Т>
ножах <Мн П>
}
There are 12 different forms the word НОЖ can assume under different grammatical meanings, so choosing the correct one can become crucial.
Of course, it would be a pain to type all these words manually in vocabulary markup. But the good news is that a machine can often guess with very high accuracy what a particular word form would be, if we know the persistent grammatical meaning of a word and its dictionary form. Inflectible can generate those word forms for you. All you need to write is:
НОЖ (Сущ) <Муж Неодуш> {
нож
...
}
That's the actual markup, and if the template engine sees it, it can automatically produce a lexeme equivalent to the previous tediously written example. It even supports suppletion, where some forms come from a different stem:
ЧЕЛОВЕК (Сущ) <Муж Одуш> {
человек
люди <Мн>
людьми <Мн Т>
...
}
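The idea behind a partially defined lexeme is that explicitly listed forms win and a generator only fills the gaps. The sketch below shows that precedence with a deliberately naive English plural rule as a stand-in generator. Both the rule and the names are assumptions for illustration, and they say nothing about how Inflectible's own generator works:

```java
import java.util.Map;

public class PartialLexeme {
    // Deliberately naive English pluralization, used only as a stand-in generator.
    static String guessPlural(String singular) {
        if (singular.endsWith("s") || singular.endsWith("x") || singular.endsWith("z")
                || singular.endsWith("ch") || singular.endsWith("sh")) {
            return singular + "es";
        }
        return singular + "s";
    }

    // Explicitly listed forms take precedence; the generator fills the gaps.
    static String plural(String singular, Map<String, String> explicitForms) {
        return explicitForms.getOrDefault("Plur", guessPlural(singular));
    }

    public static void main(String[] args) {
        System.out.println(plural("razor", Map.of()));                   // razors
        System.out.println(plural("child", Map.of("Plur", "children"))); // children
    }
}
```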
The goals for version 1.0.0 are:
- Full automated word form generation support for every part of speech in Russian and English;
- Flexible design that allows automating inflection in any flective language;
- Agreement with numbers (двух коней, два коня, две лошади, пять коней, один конь, миллион и двадцать один конь: Russian for two horses, two male horses, two female horses, five horses, one horse, a million and twenty-one horses. Just look at all the different endings);
- Phonetic "agreement" (the indefinite article "a"/"an" in English and "de/d'" in French depend not on the grammatical features of another word, but on its phonetic features);
- Complete basic vocabularies for English and Russian: built-in vocabularies with the most common words, such as articles, pronouns and numbers. It wouldn't make sense to ask every user of the template engine to compose or copy their own vocabulary for the most basic words of a language;
- Multipart templates for the cases when you want to split the result of filling a template into logical parts;
- IntelliJ IDEA plugin for markup editing;
- Maven plugin for generating explicit lexemes from partially defined ones at build time.
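To give a feel for the number agreement goal, here is a minimal sketch of the usual rule for choosing a Russian noun form after a cardinal number in the nominative case. It is not part of Inflectible. The class and method names are hypothetical, and oblique cases (such as двух коней above) are out of scope:

```java
public class RussianNumberAgreement {
    enum Category { ONE, FEW, MANY }

    // Picks the agreement category from the last two digits of the number.
    static Category category(long n) {
        long lastTwo = Math.abs(n % 100);
        long last = lastTwo % 10;
        if (lastTwo >= 11 && lastTwo <= 14) {
            return Category.MANY; // 11..14 always take the "many" form
        }
        if (last == 1) {
            return Category.ONE;
        }
        if (last >= 2 && last <= 4) {
            return Category.FEW;
        }
        return Category.MANY;
    }

    // Attaches the matching form of "конь" (nominative contexts only).
    static String horses(long n) {
        switch (category(n)) {
            case ONE:
                return n + " конь";
            case FEW:
                return n + " коня";
            default:
                return n + " коней";
        }
    }

    public static void main(String[] args) {
        for (long n : new long[] {1, 2, 5, 11, 1_000_021}) {
            System.out.println(horses(n));
        }
    }
}
```

Under this rule 1 000 021 falls into the same category as 1, which matches the "миллион и двадцать один конь" example in the list above.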