This script visualizes alignments between two sentences.
Machine translation and paraphrasing systems often rely on alignments, which can be produced automatically by tools such as GIZA++.
- Wondering if alignment can be useful for your applications (e.g. natural language processing systems)?
- Developing your own alignment algorithm, but having trouble debugging or visualizing its output?
- Have a new alignment algorithm that you want to write about, but drawing the diagram takes too long?
This script renders the alignment structure between two sentences as a picture, so that you can clearly see which words are aligned with which.
Input
- two sentences (e.g. one in English -- "I like machine translation .", and one in Chinese -- "我 喜欢 机器翻译 。")
- the alignment between them (e.g. the first English word is aligned with the first Chinese word, and so on)
Output
- A pretty PNG image visualizing the two sentences and the alignments between them
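To make the input concrete, here is a minimal sketch of how an alignment can be held as a list of (source index, target index) pairs and turned back into word pairs. The helper name `aligned_word_pairs` is hypothetical and not part of draw_alignment.py; the sketch assumes sentences are tokenized by whitespace and that indices start at 0, which matches the "I like machine translation" example used in this README.

```python
# -*- coding: utf-8 -*-

def aligned_word_pairs(src_sentence, trg_sentence, alignments):
    """Return (source word, target word) for each (src_index, trg_index) pair.

    Hypothetical helper for illustration; assumes whitespace tokenization
    and zero-based word indices.
    """
    src_words = src_sentence.split()
    trg_words = trg_sentence.split()
    return [(src_words[i], trg_words[j]) for i, j in alignments]


src = "I like machine translation ."
trg = u"我 喜欢 机器翻译 。"
# "machine" and "translation" both align to the single word 机器翻译.
alignment = [(0, 0), (1, 1), (2, 2), (3, 2), (4, 3)]

pairs = aligned_word_pairs(src, trg, alignment)
```

Listing the word pairs this way is a quick sanity check that the indices point at the words you expect before drawing anything.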
This script is written in Python 2. Once you have Python 2 and the libraries below installed, no further installation is needed: simply download the draw_alignment.py file, and you're good to go.
The two required libraries are:
- Google flags library -- python-gflags
- Python Imaging Library -- PIL
###Analyzing Alignments produced by GIZA++
GIZA++ is a great tool that produces word-level alignments for a list of sentences and their translations. We use it all the time in our research, but its alignment output is just a list of numbers, which is hard to interpret.
We typically feed GIZA++ two files: one containing sentences in one language, and the other containing their translations in another language. For example:
test/Eng.txt
the government should not limit the amount spent on the aged because this problem is becoming more and more prevalent in singapore .
an ageing population or what has been coined the " silver tsunami" is a phenomena faced by developed countries around the globe and singapore is no exception .
...
test/Chs.txt
, 因为 这个 问题 变 得 越来越 普遍 , 在 新加坡 , 政府 不 应该 限制 对 老年 花费 的 金额 。
“ 银发 海啸 ” 的 现象 所 面临 的 发达国家 在 全球 各地 和 新加坡 的 人口 老龄化 已经 创造 也 不 例外 。
...
GIZA++ is able to produce the alignments between sentences in the two files.
test/Align.txt
2-14 12-2 13-3 21-10 8-16 3-13 20-9 15-4 1-12 5-19 6-20 11-1 10-17 0-11 4-15 19-7 22-21 7-18 17-8 15-6
21-12 17-9 7-2 9-3 20-10 13-4 9-0 12-4 11-1 0-15 5-18 2-16 1-17 26-22 24-15 23-14 22-13 16-6 6-18 16-8 15-7 14-5 27-23 25-20
...
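Each line of the alignment file can be read as a list of index pairs. The sketch below is one way to parse such a line; the function name `parse_alignment_line` is hypothetical, and it assumes that each `i-j` entry means "word i of the first file is aligned with word j of the second file", counting from 0. That reading is consistent with the sample above, where `2-14` links "should" to "应该", but it is an assumption about the format rather than something taken from draw_alignment.py.

```python
def parse_alignment_line(line):
    """Parse a line such as '2-14 12-2' into [(2, 14), (12, 2)].

    Hypothetical helper; assumes whitespace-separated 'src-trg' entries
    with zero-based word indices.
    """
    pairs = []
    for token in line.split():
        src_index, trg_index = token.split("-")
        pairs.append((int(src_index), int(trg_index)))
    return pairs


# First few entries of the first line of test/Align.txt.
sample = "2-14 12-2 13-3 21-10"
links = parse_alignment_line(sample)
```

The resulting list has the same shape as the `alignments` argument shown later for DrawDirAlignToFile.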
This program helps you visualize these alignments. The following command draws the alignment of the second sentence pair (--sentence_id=1) into output.png.
python draw_alignment.py --src_sentences=test/Eng.txt --trg_sentences=test/Chs.txt --align_file=test/Align.txt --sentence_id=1 --output_image=output.png
In the above case, output.png will contain the following image
###Using the Package as an External Library
Want to debug your new alignment algorithm, but have trouble analyzing its output? You can call the DrawDirAlignToFile function to produce a picture of it.
Syntax: DrawDirAlignToFile(src_sentence, trg_sentence, alignments, output_imagefile)
The following code will generate the "I like machine translation" example.
import draw_alignment
draw_alignment.DrawDirAlignToFile("I like machine translation .", u"我 喜欢 机器翻译 。", [(0, 0), (1, 1), (2, 2), (3, 2), (4, 3)], "test.png")
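If your sentences and alignments live in files like the test/ examples, you may want to pull out one sentence pair before calling the library. The following is a rough sketch of that step, not code from draw_alignment.py: `read_line` and `load_example` are hypothetical helpers, and the sketch assumes UTF-8 files, one sentence per line, zero-based sentence IDs, and `src-trg` alignment entries.

```python
import io


def read_line(path, index):
    """Return line number `index` (zero-based) of a UTF-8 text file, stripped."""
    with io.open(path, encoding="utf-8") as handle:
        for current, line in enumerate(handle):
            if current == index:
                return line.strip()
    raise IndexError("%s has no line %d" % (path, index))


def load_example(src_path, trg_path, align_path, sentence_id):
    """Return (src_sentence, trg_sentence, alignments) for one sentence pair.

    Hypothetical helper; alignments come back as a list of
    (src_index, trg_index) tuples.
    """
    src_sentence = read_line(src_path, sentence_id)
    trg_sentence = read_line(trg_path, sentence_id)
    alignments = []
    for token in read_line(align_path, sentence_id).split():
        src_index, trg_index = token.split("-")
        alignments.append((int(src_index), int(trg_index)))
    return src_sentence, trg_sentence, alignments
```

The three returned values line up with the first three arguments of DrawDirAlignToFile, so something like `draw_alignment.DrawDirAlignToFile(src, trg, alignments, "output.png")` should then draw that pair, assuming the function accepts the strings as read here.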