<html xmlns:ng="http://docbook.org/docbook-ng"><head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>XLIFF 2 Extraction and Merging Best Practice, Version 1.0</title><meta name="generator" content="DocBook XSL Stylesheets V1.79.2"><meta name="description" content="This Informational Best Practice specification targets designers of XLIFF Extracting and Merging tools for content owners. It gathers common problems that are prone to appear when Extracting XLIFF Documents from HTML, generic XML, or MarkDown. This specification shows why some Extraction approaches will cause issues during an XLIFF Roundtrip. This best practice guidance provides better thought through alternatives and shows how to use many of advanced XLIFF features for lossless Localization roundtrip of HTML and XML based content."><meta name="keywords" content="XLIFF, Extraction, Merging, Best Practice"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div lang="en-US" class="article"><div class="titlepage"><div><div><h1 class="title"><a name="d5e1"></a>XLIFF 2 Extraction and Merging Best Practice, Version 1.0</h1></div><div><div class="authorgroup"><div class="editor"><h4 class="editedby">Edited by</h4><h3 class="editor">David Filip</h3><div class="affiliation"><span class="orgname">ADAPT Centre<br></span></div><code class="email"><<a class="email" href="mailto:[email protected]">[email protected]</a>></code></div><div class="editor"><h3 class="editor">Ján Husarčík</h3><div class="affiliation"><span class="orgname">Moravia<br></span></div><code class="email"><<a class="email" href="mailto:[email protected]">[email protected]</a>></code></div><div class="othercredit"><h3 class="othercredit">Rodolfo M. 
Raya</h3></div><div class="othercredit"><h3 class="othercredit">Andreas Galambos</h3></div></div></div><div><p class="releaseinfo"></p></div><div><p class="releaseinfo"></p></div><div><p class="releaseinfo"></p></div><div><p class="releaseinfo"></p></div><div><p class="releaseinfo"></p></div><div><p class="releaseinfo"></p></div><div><p class="releaseinfo">TAPICC T1/WG3</p></div><div><p class="copyright">Copyright © 2018 GALA TAPICC. All rights reserved.</p></div><div><div class="legalnotice"><a name="d5e35"></a><p class="legalnotice-title"><b>Additional artifacts</b></p><p>This prose specification is one component of a Work Product that also includes:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>Extraction and merging examples from <a class="link" href="https://galaglobal.github.io/TAPICC/T1/WG3/wd01/extraction_examples/" target="_top">https://galaglobal.github.io/TAPICC/T1/WG3/wd01/extraction_examples/</a>
</p><p>An unstable editorial version of the examples might exist at <a class="link" href="https://galaglobal.github.io/TAPICC/T1/WG3/extraction_examples/" target="_top">https://galaglobal.github.io/TAPICC/T1/WG3/extraction_examples/</a></p></li></ul></div></div></div><div><div class="legalnotice"><a name="d5e44"></a><p class="legalnotice-title"><b>Related work</b></p><p>This note provides an informative best practice for XLIFF 2 Specifications:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>XLIFF Version 2.1 [<span class="citation">[XLIFF-2.1]</span>]</p></li><li class="listitem"><p>XLIFF Version 2.0 [<span class="citation">[XLIFF-2.0]</span>]</p></li><li class="listitem"><p>ISO 21720:2017 [<span class="citation">[ISO XLIFF]</span>]</p></li></ul></div></div></div><div><div class="legalnotice"><a name="d5e65"></a><p class="legalnotice-title"><b>Status</b></p><p>This Informational Best Practice was last revised by TAPICC T1/WG3 or the TAPICC Steering
Committee on the above date. The level of approval is also listed above. Check the “Latest
version” location noted above for possible later revisions of this document.</p><p>Contributions to this deliverable or subsequent versions of this deliverable can be made
via the <a class="link" href="https://github.com/GALAglobal/TAPICC" target="_top">GALA TAPICC GitHub
Repository</a> subject to signing the <a class="link" href="https://www.gala-global.org/tapicc-legal-agreement" target="_top">TAPICC Legal
Agreement</a>.</p></div></div><div><div class="legalnotice"><a name="d5e71"></a><p class="legalnotice-title"><b>Citation format</b></p><p>When referencing this specification the following citation format should be used:</p><p>[<span class="citation">XLIFF-EM-BP</span>]</p><p><span class="emphasis"><em>XLIFF 2 Extraction and Merging Best Practice, Version 1.0</em></span>
Edited by David Filip and Ján Husarčík. 26 June 2018. Working Draft 01. https://galaglobal.github.io/TAPICC/T1/WG3/wd01/XLIFF-EM-BP-V1.0-wd01.html.
Latest version: N/A.html.</p></div></div><div><div class="legalnotice"><a name="d5e78"></a><p class="legalnotice-title"><b>Notices</b></p><p>Copyright © GALA TAPICC 2018. All rights reserved.</p><p> The Translation API Cases and Classes (TAPICC) initiative is a collaborative,
community-driven, open source project to advance API standards in the Localization industry.
The overall purpose of this project is to provide a metadata and API framework on which
users can base their integration, automation and interoperability efforts.</p><p>The usage of all deliverables of this initiative - including this specification - is
subject to open source license terms expressed in the BSD-3-Clause License and CC-BY 2.0
License, the declared applicable licenses when the project was chartered. </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>The 3-Clause BSD License (BSD-3 Clause): <a class="link" href="https://opensource.org/licenses/BSD-3-Clause" target="_top">https://opensource.org/licenses/BSD-3-Clause</a></p></li><li class="listitem"><p>Creative Commons Legal Code (CC-BY 2.0): <a class="link" href="https://creativecommons.org/licenses/by/2.0/legalcode" target="_top">https://creativecommons.org/licenses/by/2.0/legalcode</a></p></li></ul></div></div></div><div><p class="pubdate">26 June 2018</p></div><div><div class="abstract"><p class="title"><b>Abstract</b></p><p>This Informational Best Practice specification targets designers of XLIFF
<em class="firstterm">Extracting</em> and <em class="firstterm">Merging</em> tools for content
owners. It gathers common problems that are prone to appear when
<em class="firstterm">Extracting</em>
<em class="firstterm">XLIFF Documents</em> from HTML, generic XML, or Markdown. This
specification shows why some <em class="firstterm">Extraction</em> approaches will cause issues
during an <em class="firstterm">XLIFF Roundtrip</em>. This best practice guidance provides
better thought-out alternatives and shows how to use many of the advanced XLIFF features for a
lossless Localization roundtrip of HTML- and XML-based content.</p></div></div></div><hr></div><div class="toc"><p><b>Table of Contents</b></p><dl class="toc"><dt><span class="glossary"><a href="#Glossary">Terminology and Concepts</a></span></dt><dt><span class="section"><a href="#Intro">Introduction</a></span></dt><dt><span class="section"><a href="#Spec">Specification</a></span></dt><dd><dl><dt><span class="section"><a href="#XLIFF_Structure">XLIFF Structure</a></span></dt><dt><span class="section"><a href="#Inlines">Inline Codes</a></span></dt><dt><span class="section"><a href="#Target_content">Target Content in Extracted XLIFF</a></span></dt><dt><span class="section"><a href="#Hints">Editing and Context Hints</a></span></dt><dt><span class="section"><a href="#miscellaneous">Miscellaneous</a></span></dt><dt><span class="section"><a href="#validations">XLIFF Validations</a></span></dt></dl></dd><dt><span class="section"><a href="#Summary">Summary</a></span></dt><dt><span class="bibliography"><a href="#references">References</a></span></dt></dl></div><div class="glossary"><div class="titlepage"><div><div><h2 class="title"><a name="Glossary"></a>Terminology and Concepts</h2></div></div></div><p>Apart from terminology and concepts defined here, this specification makes heavy use of
terms defined in the XLIFF Standards [<span class="citation">[XLIFF-2.1]</span>] such as:
<em class="firstterm">Extractor</em>, <em class="firstterm">Merger</em>,
<em class="firstterm">Translation</em>, <em class="firstterm">XLIFF Document</em>,
<em class="firstterm">XLIFF-defined</em>, etc.</p><dl><dt><span class="glossterm">context hints</span></dt><dd class="glossdef"><p><em class="firstterm">XLIFF-defined</em> attributes on structural or inline elements
providing additional context, such as <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#disp" target="_top">disp</a></code> or <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#equiv" target="_top">equiv</a></code>. The attributes <code class="code">fs</code> and <code class="code">subFs</code> defined in the
<a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/csprd01/xliff-core-v2.1-csprd01.html#fs-mod" target="_top">XLIFF Format Style Module</a> are also considered <em class="firstterm">context
hints</em>.</p></dd><dt><span class="glossterm">inline codes</span></dt><dd class="glossdef"><p><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/os/xliff-core-v2.1-os.html#sc" target="_top"><code class="code"><sc/></code></a>/<a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/os/xliff-core-v2.1-os.html#ec" target="_top"><code class="code"><ec/></code></a> pairs, orphaned <a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/os/xliff-core-v2.1-os.html#sc" target="_top"><code class="code"><sc/></code></a> or <a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/os/xliff-core-v2.1-os.html#ec" target="_top"><code class="code"><ec/></code></a>, well-formed <a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/os/xliff-core-v2.1-os.html#pc" target="_top"><code class="code"><pc></code></a>, standalone <a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/os/xliff-core-v2.1-os.html#ph" target="_top"><code class="code"><ph/></code></a> and <a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/os/xliff-core-v2.1-os.html#cp" target="_top"><code class="code"><cp/></code></a> are <em class="firstterm">inline codes</em> used to represent
native format inline markup in <em class="firstterm">XLIFF Documents</em>.</p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="file:/C:/Program%20Files/Oxygen%20XML%20Editor%2019/frameworks/docbook/css/img/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p><em class="firstterm">Inline codes</em> can reference original data in the
<em class="firstterm">XLIFF Core</em>
<a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/os/xliff-core-v2.1-os.html#originaldata" target="_top"><originalData></a> standoff element. </p></td></tr></table></div></dd><dt><span class="glossterm">markers</span></dt><dd class="glossdef"><p><code class="code"><sm/></code>/<code class="code"><em/></code> pairs and
well-formed <code class="code"><mrk></code> are <em class="firstterm">XLIFF-defined</em> inline
marker elements designed for inline annotations of content with metadata. </p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="file:/C:/Program%20Files/Oxygen%20XML%20Editor%2019/frameworks/docbook/css/img/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p><em class="firstterm">Markers</em> are distinct from <em class="firstterm">inline
codes</em> (see the definition above). Markers can be combined with <em class="firstterm">Module</em>-based
or Extension-based standoff elements for rich metadata that would be complicated or
impossible to display inline.</p></td></tr></table></div></dd></dl></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="Intro"></a>Introduction</h2></div></div></div><p>This Informational Best Practice targets designers of XLIFF <em class="firstterm">Extracting</em> and
<em class="firstterm">Merging</em> Tools for content owners. <em class="firstterm">XLIFF
Roundtrip</em> designers of all kinds will benefit, whether they design their
<em class="firstterm">XLIFF Extractor/Merger</em> for corporate or blog use.</p><p><em class="firstterm">Extraction</em> and <em class="firstterm">Merging</em> behavior is out of
the normative scope of OASIS XLIFF Specifications. Although those specifications do provide
some guidance for <em class="firstterm">Extractor</em> and <em class="firstterm">Merger Agents</em>,
the XLIFF TC did not attempt to prescribe exactly how to use XLIFF to represent native content.
This is mostly because XLIFF is a native-format-agnostic Localization Interchange
Format.</p><p>This specification gathers common problems that are prone to appear when
<em class="firstterm">Extracting</em>
<em class="firstterm">XLIFF Documents</em> from HTML, generic XML, or Markdown. This specification
shows why some <em class="firstterm">Extraction</em> approaches will cause issues during an
<em class="firstterm">XLIFF Roundtrip</em>, issues often so severe that
<em class="firstterm">Merging</em> the target content back will not be possible without costly
post-processing or could fail outright. This best practice guidance provides better thought-out
alternatives and shows how to use many of the advanced XLIFF features for a lossless
Localization roundtrip of HTML- and XML-based content. Often, there are no definitive prescribed
solutions; rather, possible design goals are described and the best methods to achieve them are
proposed.</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="Spec"></a>Specification</h2></div></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="XLIFF_Structure"></a>XLIFF Structure</h3></div></div></div><p>Taking time to consider not only what to <em class="firstterm">Extract</em>, but how to
<em class="firstterm">Extract</em> it, and how to structure the <em class="firstterm">XLIFF
Document</em> can significantly reduce the number of issues during the roundtrip and
enable usage of additional features offered by the XLIFF Standards.</p><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="group"></a>File Structure</h4></div></div></div><p>Native formats can contain structural elements dividing their content into parts, such
as title, body, header and footer, or tables, lists and divs for markup languages; or
windows, dialogs, and menus for software resources. </p><p>Representing native structural elements in XLIFF using potentially nested <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#group" target="_top"><group></a></code> elements can be useful for providing, and correctly scoping: </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>additional context (<code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#name" target="_top">name</a></code>, <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#type" target="_top">type</a></code>, <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#subtype" target="_top">subType</a></code>, attributes from <a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#fs-mod" target="_top">Format Style Module</a>)</p></li><li class="listitem"><p>restrictions (<a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#canResegment" target="_top"><code class="code">canResegment</code></a>,
<a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#translate" target="_top"><code class="code">translate</code></a>, attributes from <a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#size_restriction_module" target="_top">Size and Length Restriction Module</a>)</p></li><li class="listitem"><p>whitespace handling (<code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#xml_space" target="_top">xml:space</a></code>)</p></li><li class="listitem"><p>information from modules such as:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: circle; "><li class="listitem"><p><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#metadata_module" target="_top">Metadata</a></p></li><li class="listitem"><p><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#validation_module" target="_top">Validation</a></p></li><li class="listitem"><p><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#ITS-module" target="_top">ITS</a></p></li></ul></div></li></ul></div><p>Most of the above can still be achieved without using the optional <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#group" target="_top"><group></a></code> elements. However, this comes at the cost of highly redundant
unit-level metadata and might lead to a potentially illegal overloading of some of the
<em class="firstterm">XLIFF Core</em> features. </p><p>Example: <a class="link" href="https://github.com/GALAglobal/TAPICC/tree/master/extraction_examples/group" target="_top">group</a>.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="unit"></a>Role of the <code class="code"><unit></code> Element</h4></div></div></div><p><em class="firstterm">Extractors</em> set the XLIFF structure, which cannot be modified
(an absolute prohibition expressed in the XLIFF Standards) during the roundtrip at the
unit level or higher. Whether or not appropriate relationships drive the
<em class="firstterm">Extraction</em> of native format structures into XLIFF
<code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#unit" target="_top"><unit></a></code> elements can make all the difference between hindering or
crippling the roundtrip and making the most of XLIFF features in a compliant way.</p><p>Severe problems can be caused by both extremes: too many or not enough <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#unit" target="_top"><unit></a></code> elements. Especially dangerous is Extracting
every segment as a separate <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#unit" target="_top"><unit></a></code> element as this will effectively prevent
downstream <em class="firstterm">Modifiers</em> from changing segmentation. Changing
segmentation within logically self-contained units is one of the key advantages of the
XLIFF 2 structural data model, which distinguishes between the immutable high-level
structure (<code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#unit" target="_top"><unit></a></code> and higher) and the transient segmentation structure (<a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#segment" target="_top"><code class="code"><segment></code></a> elements within each <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#unit" target="_top"><unit></a></code>) that interplays with the inline data model and the inline
annotations' logic.</p><p>Example: <a class="link" href="https://github.com/GALAglobal/TAPICC/tree/master/extraction_examples/mapping_to_unit" target="_top">mapping_to_unit</a>.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="canResegment"></a>Controlling Segmentation</h4></div></div></div><p>Depending on <em class="firstterm">Extraction</em> rules for mapping of original document
structures into <em class="firstterm">XLIFF Documents</em>, individual sentences within a
paragraph, verses within a stanza, items or entries of a list, rows, or cells of a table,
items of a dialog window and so on might be <em class="firstterm">Extracted</em> as segments
of a single unit. While it is generally not advisable to perform segmentation at the time
of <em class="firstterm">Extraction</em>, <em class="firstterm">Extractors</em> that
<em class="firstterm">Extracted</em> multiple sentences, verses, entries, rows and so on
into a single non-segmented unit (a single <a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#segment" target="_top"><code class="code"><segment></code></a> element within each <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#unit" target="_top"><unit></a></code>) and their corresponding <em class="firstterm">Mergers</em> need
to expect that the <em class="firstterm">Modifiers</em> will need to transform them into
individual segments within the same unit (multiple <a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#segment" target="_top"><code class="code"><segment></code></a> elements representing individual sentences, verses
and so on within each <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#unit" target="_top"><unit></a></code>) during the roundtrip.</p><p>In cases where subsequent <em class="firstterm">Modifiers</em> cannot be reasonably
expected to detect the segmentation logic, for instance due to the lack of knowledge of
the original format logic, the content owner is advised to perform the segmentation and
protection of that segmentation before sending their XLIFF Documents for the service
roundtrip.</p><p>While it's generally desirable to be able to <em class="firstterm">Modify</em>
segmentation within a unit during the roundtrip, doing so in some of the above cases might
prevent <em class="firstterm">Merging</em>, cause build issues, or have a negative impact on
target product user experience.</p><p>Attribute <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#canresegment" target="_top">canResegment</a></code> can be used with care to control segmentation
<em class="firstterm">Modification</em> behavior. Its values need to be controlled by rules
derived from the structural and inline logic of the native format. For instance, more often
than not it will make sense to set <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#canresegment" target="_top">canResegment</a></code> to <code class="code">no</code> for:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>lists</p></li><li class="listitem"><p>tables or table rows</p></li><li class="listitem"><p>UI elements</p></li></ul></div><p>In UI elements and tables, it is likely that the available segmentation needs
to be protected. On the other hand, it is advisable not to change the default <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#canresegment" target="_top">canResegment</a>="yes"</code> for normal paragraph text and similar content; see <a class="link" href="#unit" title="Role of the <unit> Element">Role of the <unit> Element</a>. </p><p>Importantly, preventing <em class="firstterm">Modification</em> of segmentation using the
attribute <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#canresegment" target="_top">canResegment</a></code> (set to <code class="code">no</code> when necessary) will
<span class="emphasis"><em>not</em></span> prevent reordering of segments within a unit using the <a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#order" target="_top"><code class="code">order</code></a> attribute on the <a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#target" target="_top"><code class="code"><target></code></a> elements within the same unit. So in case an ordered
list needs to be, for instance, alphabetically collated, translators can do so even if
the <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#canresegment" target="_top">canResegment</a></code> attribute is set to <code class="code">no</code>. The segmentation
logic of the native format remains protected without preventing collation. All of this
would be hampered if the <em class="firstterm">Extractor</em> decided to
<em class="firstterm">Extract</em> each segment as a separate unit, which is the most evil
practice that cannot be discouraged enough.</p><p>Example: <a class="link" href="https://github.com/GALAglobal/TAPICC/tree/master/extraction_examples/mapping_to_unit" target="_top">mapping_to_unit</a>. </p></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="Inlines"></a>Inline Codes</h3></div></div></div><p>Guidance for processing standalone and spanning inline functional and formatting
elements of localizable content.</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>Perform complete extraction</p></li><li class="listitem"><p>Represent spanning code using <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#sc" target="_top"><sc/></a></code> and <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#ec" target="_top"><ec/></a></code> (or <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#pc" target="_top"><pc></pc></a></code> where possible)</p></li><li class="listitem"><p>Represent standalone code using <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#ph" target="_top"><ph></a></code></p></li><li class="listitem"><p>Include all (even the outermost) <em class="firstterm">inline codes</em> in the
<em class="firstterm">Extracted</em> content</p></li><li class="listitem"><p><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html" target="_top">XLIFF2 prose</a>.</p></li></ul></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="Spanning"></a>Representing Spanning Codes</h4></div></div></div><p><em class="firstterm">Spanning codes</em> in the original format are created by an opening
code, the content, and the closing code. In HTML that can be
<code class="code"><b>text</b></code>, in RTF <code class="code">\b text \b0</code>.</p><p>In <em class="firstterm">XLIFF Documents</em>, such code can always be represented with an
<code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#sc" target="_top"><sc/></a></code>/<code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#ec" target="_top"><ec/></a></code> pair, or, for well-formed markup only, with a spanning <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#pc" target="_top"><pc></pc></a></code>.</p><p>Ideally, the original format is documented well enough to instruct
<em class="firstterm">Extractors</em> about the role of each <em class="firstterm">inline
code</em>. For example, an XML DTD allows elements to be declared using the keyword
EMPTY. This way, all elements that are not declared EMPTY can be represented as described
above. To further help the <em class="firstterm">Extraction</em> process, the following
recommendation could be implemented in the original XML format: <span class="quote">“<span class="quote">For
interoperability, the empty-element tag SHOULD be used, and SHOULD only be used, for
elements which are declared EMPTY.</span>”</span>[<span class="citation">[XML]</span>].</p><p>Following this recommendation of the XML specification, an empty
<code class="code"><span></code> ought to be encoded as <code class="code"><span></span></code> and
therefore represented as an <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#sc" target="_top"><sc/></a></code>/<code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#ec" target="_top"><ec/></a></code> pair in <em class="firstterm">XLIFF Documents</em>, unlike the
always empty <code class="code"><br/></code> that has to be represented as a standalone placeholder
code <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#ph" target="_top"><ph/></a></code>.</p><p>This concept is illustrated by the <span class="emphasis"><em>bad practice</em></span> example <a class="link" href="https://github.com/GALAglobal/TAPICC/tree/master/extraction_examples/spanning_as_ph" target="_top"><code class="code">spanning_as_ph</code></a>.</p><p>This kind of bad practice encoding doesn't inform the
<em class="firstterm">Translating</em>
<em class="firstterm">Agent</em> (human or machine <em class="firstterm">Modifier</em>) that the
original code formed a span and effectively the original spanning code is not protected
during the roundtrip. The standalone <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#ph" target="_top"><ph/></a></code> codes can be switched or one of them omitted; simply, the span
is likely to end up misplaced, malformed, or empty because the
<em class="firstterm">Translation</em> editor cannot convey to the translator that the codes
need to enclose a certain portion of the original content, nor the semantics of the
original code span.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="Outermost"></a>Outermost Tag Pairs</h4></div></div></div><p>In some cases, the <em class="firstterm">inline codes</em> can enclose the localizable
string in a way that could suggest omitting them in the <em class="firstterm">Extracted</em>
text. For example, a paragraph containing only link text could be
<em class="firstterm">Extracted</em> as the link text only, without the
<code class="code"><a></a></code> decoration being represented. This relates to the previous
<span class="emphasis"><em>bad practice</em></span>
<a class="link" href="https://github.com/GALAglobal/TAPICC/blob/master/extraction_examples/spanning_as_ph/bad_opening_excluded.xlf" target="_top">example</a> with the spanning tag represented as two empty <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#ph" target="_top"><ph/></a></code> elements.</p><p>In case the <code class="code"><a></a></code> decoration is not represented, the translator
loses valuable context (they cannot check the link); more importantly, they don't know that
the text is link text and are unable to add any text outside of the link span,
which might be advisable or even mandatory in certain locales.</p><p>Ideally, a consistent approach to all <em class="firstterm">inline codes</em> ought to be
used during <em class="firstterm">Extraction</em>.</p><p>See the relevant <span class="emphasis"><em>bad practice</em></span> example <a class="link" href="https://github.com/GALAglobal/TAPICC/tree/master/extraction_examples/outermost_inline_excluded" target="_top">outermost_inline_excluded</a>.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="CDATA"></a>Incomplete Extraction of Inline Codes</h4></div></div></div><p>Some implementers choose not to <em class="firstterm">Extract</em>
<em class="firstterm">inline codes</em> at all and instead use one of the following approaches:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>CDATA sections as content of <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#source" target="_top"><source></a></code> and <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#target" target="_top"><target></a></code> elements (<a class="link" href="https://github.com/GALAglobal/TAPICC/tree/master/extraction_examples/cdata" target="_top">cdata</a>)</p></li><li class="listitem"><p>Escaping of native codes using XML entities (<a class="link" href="https://github.com/GALAglobal/TAPICC/tree/master/extraction_examples/inline_codes_plain_text" target="_top">inline_codes_plain_text</a>)</p></li></ul></div><p>Either of the above can serve as a useful <span class="emphasis"><em>interim</em></span>
<em class="firstterm">Extraction</em> step when producing <em class="firstterm">XLIFF
Documents</em> that are fit for roundtrip. However, it is strongly discouraged to
send <em class="firstterm">XLIFF Documents</em> with the inline content not fully parsed for
Localization roundtrip.</p><p>Such incomplete <em class="firstterm">Extraction</em> leaves <em class="firstterm">inline
codes</em> unprotected and increases the risk of their corruption during the
roundtrip, simply pushing the problem of <em class="firstterm">inline code</em> handling
downstream.</p><p>According to the XLIFF [<span class="citation">[XLIFF-2.1]</span>] Standard,
<em class="firstterm">Modifiers</em> can perform secondary parsing:</p><p><span class="quote">“<span class="quote"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#d0e8112" target="_top">Writers <span class="emphasis"><em>may</em></span> preserve original CDATA sections</a></span>”</span>
(meaning that it is entirely optional to preserve CDATA sections and that <em class="firstterm">XLIFF
Writers</em> are not obliged to preserve CDATA sections) and</p><p>
<a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#d0e8993" target="_top">text can be converted into inline codes</a>.</p><p><em class="firstterm">Mergers</em>
<a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#ApplicationConformance" target="_top">have to accept XLIFF files with valid modifications</a>, even though <a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#addingcodes" target="_top">they may ignore the added codes</a>.</p><p>Finally, it is considered an <a class="link" href="https://www.w3.org/TR/xml-i18n-bp/#AuthCDATA" target="_top">XML internationalization best
practice</a> to avoid CDATA sections in localizable content. This best practice is of
course also valid for XLIFF <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#source" target="_top"><source></a></code> elements.</p><p>Strictly speaking, it is not illegal to create <em class="firstterm">XLIFF Documents</em>
that contain CDATA sections or unparsed entities instead of fully parsed XLIFF inline
content. However, considering all of the above, it is clear that unparsed inline content
makes XLIFF Documents unfit for a fully interoperable roundtrip. Again, strictly speaking,
<em class="firstterm">XLIFF Documents</em> with unparsed inline content are
<span class="emphasis"><em>capable</em></span> of roundtrip, but any effort that is saved on
<em class="firstterm">Extraction</em> will cause unpredictable issues and hence even more
effort when <em class="firstterm">Merging</em> back. </p><p>Implementers need to consider that <em class="firstterm">XLIFF Documents</em> with
unparsed inline content are very likely to return with critical inline syntax or
formatting corruptions that cannot be prevented in CDATA sections or entities, both of
which are opaque to <em class="firstterm">XLIFF Modifiers</em>. Such corruptions are likely to
prevent proper functionality of target content in the native environment. In case
<em class="firstterm">XLIFF Modifiers</em> do perform the secondary parsing of content
unparsed on <em class="firstterm">Extraction</em>, which is allowed by the standard,
corruption will be prevented; however, <em class="firstterm">Mergers</em> will need to perform
<span class="emphasis"><em>unparsing</em></span> to facilitate merging back into the native environment,
because XLIFF Modifiers are not and cannot be obliged to <span class="emphasis"><em>unparse</em></span> back
to CDATA sections or entities not knowing the <em class="firstterm">Extraction</em> and
<em class="firstterm">Merging</em> logic of the <em class="firstterm">XLIFF Document</em>
originator. </p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="Multiples"></a>Representing Multiple Subsequent Codes</h4></div></div></div><p>As original <em class="firstterm">inline codes</em> can occur in clusters, for instance as
nested formatting, implementers could be tempted to combine such markup on
<em class="firstterm">Extraction</em> and represent it as a single inline element.</p><p>This kind of <em class="firstterm">Extraction</em> is likely to prevent potentially
desirable <em class="firstterm">Modification</em> of <span class="emphasis"><em>inline codes</em></span>,
affecting <em class="firstterm">Translation</em> quality. It will also prevent the use of
fine-grained code metadata (for instance context, display, and editing hints) or automated
format validation during the roundtrip.</p><p>On the other hand, some potential benefit can be perceived in reducing markup inside
segment content, which is useful in CAT tools that cannot properly display the inline
codes (render information available through original data or context hints). In such
tools, less markup reduces the visual clutter and makes the translatable text more
readable. This can be solved by proper choice of CAT tools (short term) or by large buyers
requesting that the offending tool vendors support proper rendering of <em class="firstterm">inline
code</em> data and metadata (mid and long term).</p><p>Implementers need to consider the pros and cons of both approaches and use the one
that best matches their business need.</p><p>For examples see <a class="link" href="https://github.com/GALAglobal/TAPICC/tree/master/extraction_examples/multiple_codes_represented_as_single" target="_top">multiple_codes_represented_as_single</a>.</p></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="Target_content"></a>Target Content in Extracted XLIFF</h3></div></div></div><p>This section discusses whether to populate the <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#target" target="_top"><target></a></code> element during <em class="firstterm">Extraction</em> or
<em class="firstterm">Enriching</em> and, if so, when.</p><p>Generally, the <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#target" target="_top"><target></a></code> element should be omitted unless populating it adds value; it should also be omitted
where the specification offers another dedicated solution. Proper support of the <a class="link" href="#State_Machine" title="State Machine">state
machine</a> during the whole roundtrip helps
<em class="firstterm">Agents</em> to process and validate the <em class="firstterm">XLIFF
Documents</em> as intended.</p><p>When looking at the situation from the <span class="emphasis"><em>microservices</em></span> point of view,
the <em class="firstterm">Extractor</em>/<em class="firstterm">Merger</em> ought to be implemented
as just that — a single purpose
<em class="firstterm">Extraction</em>/<em class="firstterm">Merging</em> service that delegates
any other operations, such as segmentation or <em class="firstterm">Enriching</em> to other
specialized services.</p><p>The output of such an extractor would be a <span class="emphasis"><em>target language</em></span> agnostic
<em class="firstterm">XLIFF Document</em> with source content only, possibly with additional
modules/extensions which could not be generated after extraction, for example <a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#size_restriction_module" target="_top">Size and Length Restriction Module</a> or <a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#fs-mod" target="_top">Format Style Module</a>.</p><p>Unless the implementer has a specific need to create <span class="emphasis"><em>target
language</em></span> specific instances of the extracted <em class="firstterm">XLIFF
Document</em>, for instance by <em class="firstterm">Enriching</em> with
<a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/os/xliff-core-v2.1-os.html#candidates" target="_top">translation candidates</a>, the
<em class="firstterm">Extracted</em>
<em class="firstterm">XLIFF Document</em> could and ought to be sent downstream for the
Localization roundtrip as-is.</p><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="Source_copy_in_target"></a>Inserting Source Content into <code class="code"><target></code></h4></div></div></div><p>The copy of the source content in <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#target" target="_top"><target></a></code> elements generally does not provide any advantage during
the XLIFF roundtrip. On the contrary, it brings disadvantages such as needlessly
increasing the size of the <em class="firstterm">XLIFF Document</em> or enforcing existence of
the <a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#trgLang" target="_top"><code class="code">trgLang</code></a> attribute with a specific BCP 47 compliant value.
Populated <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#target" target="_top"><target></a></code> elements are also likely to prevent segmentation
modification, unless the target content is intentionally removed (which service providers
are understandably hesitant to do). Not the least issue is that the source content copied
to the target actually is <span class="emphasis"><em>not</em></span> in the target language indicated by the
BCP 47 tag on the XLIFF root element, which can cause a host of other processing issues.
The <span class="emphasis"><em>bad practice</em></span> of populating <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#target" target="_top"><target></a></code> elements with source content was once used to facilitate parsing and
editing of XLIFF in <span class="emphasis"><em>Translation</em></span> editors or generic XML editors that
didn't have XLIFF support and could only open for translation certain elements in generic
XML formats. As such, this practice is strongly discouraged.</p><p><span class="emphasis"><em>Bad practice</em></span> example: <a class="link" href="https://github.com/GALAglobal/TAPICC/tree/master/extraction_examples/source_in_target" target="_top">source_in_target</a>.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="Candidates_in_target"></a>Inserting Possible Translations into <code class="code"><target></code> elements</h4></div></div></div><p><em class="firstterm">Enriching Agents</em> can use translation memories, machine
translation engines, or other means to obtain suitable translation candidate strings in
the target language to be used later in the roundtrip, for example as suggestions for
translators, to achieve better leverage, or to get higher consistency with previous
translations.</p><p>Using the <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#target" target="_top"><target></a></code> element for storing such translation candidates limits the
number of the possible proposed translations to a single one per segment. Moreover, this
way it's not possible to pass critical metadata about the translation candidate, such as
its origin, similarity, or quality (all those are available in a dedicated module),
causing interoperability issues for <em class="firstterm">Agents</em> without prior knowledge
of the workflow.</p><p>Inserting translation candidates into <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#target" target="_top"><target></a></code> elements during <em class="firstterm">Extraction</em> or
<em class="firstterm">Enriching</em> constitutes an illegal overload of the core element
with a clearly set purpose. The <a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#candidates" target="_top">Translation Candidates Module</a> was designed exactly to provide translators with
multiple translation candidates along with metadata that facilitate decision making and
effective reuse.</p><p>Example: <a class="link" href="https://github.com/GALAglobal/TAPICC/tree/master/extraction_examples/pre-populated_target" target="_top">pre-populated_target</a>.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="State_Machine"></a> State Machine</h4></div></div></div><p>The XLIFF specification contains attributes for managing a segment state machine. The
attributes used are <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#state" target="_top">state</a></code> and <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#substate" target="_top">subState</a></code>. The <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#substate" target="_top">subState</a></code> attribute can only be used when the <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#state" target="_top">state</a></code> attribute is set explicitly. The <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#state" target="_top">state</a></code> attribute is for high-level interoperability. The <a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#substate" target="_top"><code class="code">subState</code></a> attribute allows implementers to define private
sub-state machines that can give more fine-grained sub-states based on their private
workflow needs. </p><p> The <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#state" target="_top">state</a></code> attribute defines a high-level state machine with just four
states. The values are <code class="code">initial</code>, <code class="code">translated</code>,
<code class="code">reviewed</code>, and <code class="code">final</code>. Although this attribute is
<em class="glossterm">optional</em> on the <a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#segment" target="_top"><code class="code"><segment></code></a> element, it is assumed to have the default value
<code class="code">initial</code> whenever it is not set explicitly. There are some important advantages
to using the state machine explicitly. Importantly, <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#target" target="_top"><target></a></code> elements are optional in the <code class="code">initial</code> state. So
if you want to enforce <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#target" target="_top"><target></a></code> existence in your deliverables, you should use at least
the four high-level states provided by the <em class="firstterm">Core</em>
attribute <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#state" target="_top">state</a></code>. Setting the <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#state" target="_top">state</a></code> attribute of a <a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#segment" target="_top"><code class="code"><segment></code></a> to <code class="code">translated</code> or later does enforce
<code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#target" target="_top"><target></a></code> existence within that segment. </p><p>Using the high level states <code class="code">reviewed</code> and <code class="code">final</code> gives you
even more control over the progressive validation of the <em class="firstterm">XLIFF
Documents</em> you're roundtripping. All of the states <code class="code">translated</code>,
<code class="code">reviewed</code>, and <code class="code">final</code> will trigger validation of the inline data
model within <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/os/xliff-core-v2.1-os.html#target" target="_top"><target></a></code> elements, which is not being validated in the
<code class="code">initial</code> state where even the existence of <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#target" target="_top"><target></a></code> elements is not assumed. Violations of the inline data
model, including editing hints, are tested in all states more advanced than
<code class="code">initial</code>. Those violations are considered "Warnings" in the
<code class="code">translated</code> and <code class="code">reviewed</code> states. Only in the <code class="code">final</code>
state do those violations become actual "Errors" that render the <em class="firstterm">XLIFF
Document</em> invalid.</p></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="Hints"></a>Editing and Context Hints</h3></div></div></div><p>The XLIFF specification provides a number of attributes for managing the
behavior and validation of structural and inline elements, such as controlling the
localizability of text, protecting non-deletable inline codes or preserving their order,
controlling segmentation modification, or providing additional context to other agents
downstream.</p><p>When creating <em class="firstterm">Extraction</em> rules, implementers need to consider
the default values of the editing hints and whether they need to be set otherwise. Doing so prevents issues that
can only be identified by automated validation when the editing hints are set as intended.</p><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="canDelete"></a>Non-deletable Inline Codes</h4></div></div></div><p>Original source text can contain functional inline codes apart from formatting ones,
such as software placeholders to be replaced during runtime. Removal of these placeholders
or other functional code, either intentional or accidental, during the
<em class="firstterm">Translation</em> roundtrip can produce valid <em class="firstterm">XLIFF
Documents</em> that will nevertheless fail to merge back, cause build failures
later on, or create other functional issues in the <em class="firstterm">Translated</em>
product.</p><p>The XLIFF specification provides the editing hint <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#candelete" target="_top">canDelete</a></code>, which can be set explicitly on any inline code and
defaults to <code class="code">yes</code> when omitted. For most formatting
codes, the default value <code class="code">yes</code> is fine, so there is usually no need to set the
attribute explicitly. The default value means that the codes can be
removed during localization as the translators see fit. A typical example is the need to
remove italics or bold formatting codes in Chinese or Japanese target content. These
languages don't use typographical methods of emphasis, and non-deletable formatting codes
tend to complicate the work of translators into such languages. On the other hand,
<em class="firstterm">Extractors</em> need to take care to set the <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#candelete" target="_top">canDelete</a></code> attribute to <code class="code">no</code> explicitly whenever an inline
code is critical for <em class="firstterm">Merging</em> back the <em class="firstterm">XLIFF
Document</em>, for the build process, or for product functionality.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="canReorder"></a> Preserving Order of Codes</h4></div></div></div><p>In case the order or nesting of inline codes in the original document is prescribed
(for instance by a schema), it has to be preserved in the target content during the
localization roundtrip to prevent <em class="firstterm">Merge</em> issues or validation failures
after <em class="firstterm">Merging</em>.</p><p>The attribute <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#canreorder" target="_top">canReorder</a></code> on the inline code determines whether each code can be
moved before, or after another code. Again, the default value of this attribute is
<code class="code">yes</code> meaning that the inline codes can be reordered as the translators see
fit.</p><p>This attribute is used to create and protect non-reorderable sequences of inline codes
if necessary for proper inline code functionality. </p><div class="example"><a name="d5e625"></a><p class="title"><b>Example 1. Example of a non-reorderable source sequence of inline codes</b></p><div class="example-contents"><pre class="programlisting">...
<source><pc id="1" canCopy="no" canDelete="no" canReorder="firstNo">
<pc id="2" canCopy="no" canDelete="no" canReorder="no">this is linktext</pc>
<ph id="3" canCopy="no" canDelete="no" canReorder="no"/>
</pc>
</source>
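<!-- Illustrative sketch only, not taken from the TAPICC examples: a target that keeps
     the protected codes in the same order and nesting as the source above. The
     bracketed text is a placeholder for the actual translation. -->
<target><pc id="1" canCopy="no" canDelete="no" canReorder="firstNo">
<pc id="2" canCopy="no" canDelete="no" canReorder="no">[translated link text]</pc>
<ph id="3" canCopy="no" canDelete="no" canReorder="no"/>
</pc>
</target>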
... </pre></div></div><br class="example-break"><p>Since this attribute is supported by native XLIFF validation artifacts
(<em class="firstterm">XLIFF Core</em> Schematron Schemas), potential reordering of the
<code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#pc" target="_top"><pc/></a></code> and <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#ph" target="_top"><ph/></a></code> tags in the corresponding <a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#target" target="_top"><code class="code"><target></code></a> element will be called out when validating <a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#segment" target="_top"><code class="code"><segment></code></a> elements with the state more advanced than
<code class="code">initial</code>. See also the <a class="link" href="#State_Machine" title="State Machine">State Machine</a>
section.</p><p>Example: <a class="link" href="https://github.com/GALAglobal/TAPICC/tree/master/extraction_examples/editing_hints_canReorder" target="_top">editing_hints_canReorder</a>.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="context"></a>Providing Context</h4></div></div></div><p>The <em class="firstterm">Agents</em> in the roundtrip, human or machine, need enough
information to make appropriate decisions about operations on inline codes and to understand how
the codes impact the adequacy and fluency of the target text; in short, they need context.</p><p>This additional metadata can be provided using the <em class="firstterm">context
hints</em> attributes: <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#disp" target="_top">disp</a></code> (<code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#dispstart" target="_top">dispStart</a></code>/<code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#dispend" target="_top">dispEnd</a></code>), <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#equiv" target="_top">equiv</a></code> (<code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#equivstart" target="_top">equivStart</a></code>/<code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#equivend" target="_top">equivEnd</a></code>), <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#type" target="_top">type</a></code>, and <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#subtype" target="_top">subType</a></code>.</p><p>Example: <a class="link" href="https://github.com/GALAglobal/TAPICC/tree/master/extraction_examples/context_hints" target="_top">context_hints</a>.</p><p>[need an explanation of the relationship between @disp, @equiv, and <original
data>]</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="canOverlap"></a>Considerations for Using Spanning Codes</h4></div></div></div><p>Compared to the <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#pc" target="_top"><pc></a></code> pair tag, which can only be used to represent well-formed
spanning codes within a single <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#segment" target="_top"><segment></a></code>, the more universal <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#sc" target="_top"><sc/></a></code>/<code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#ec" target="_top"><ec/></a></code> pair can handle segmentation changes, span across segments,
other codes, or annotations, and even represent orphaned native inline codes. Sounds
great, so why use <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#pc" target="_top"><pc></a></code> at all? </p><p>The fact that <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#sc" target="_top"><sc/></a></code>/<code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#ec" target="_top"><ec/></a></code> pairs support overlapping codes will, however, create
an issue in a situation where <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#sc" target="_top"><sc></a></code>/<code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#ec" target="_top"><ec></a></code> pairs are used to represent multiple well-formed spanning codes
without setting their <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#canoverlap" target="_top">canOverlap</a></code> attribute to <code class="code">no</code>. In such cases, the
well-formedness of the original codes is not protected and can be corrupted during the
roundtrip. It will be impossible to prevent this corruption with native XLIFF methods
unless the <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#canoverlap" target="_top">canOverlap</a></code> and possibly <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#canreorder" target="_top">canReorder</a></code> attributes are properly set. So in case of representing
well-formed native markup, using the <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#pc" target="_top"><pc></a></code> pair tag is likely to be easier for the
<em class="firstterm">Extractor</em>. On the other hand, it is important to consider that
transforming the <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#pc" target="_top"><pc></a></code> pair tag into an <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#sc" target="_top"><sc/></a></code>/<code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#ec" target="_top"><ec/></a></code> pair is always allowed. So the <em class="firstterm">Mergers</em>
need to be prepared to handle <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#sc" target="_top"><sc/></a></code>/<code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#ec" target="_top"><ec/></a></code> even if their corresponding
<em class="firstterm">Extractor</em> used only the <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#pc" target="_top"><pc></a></code> pair tag.</p><p>Example: <a class="link" href="https://github.com/GALAglobal/TAPICC/tree/master/extraction_examples/editing_hints_canOverlap" target="_top">editing_hints_canOverlap</a>.</p></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="miscellaneous"></a>Miscellaneous</h3></div></div></div><p>There are many other good or bad practice concepts that do not belong to any of the
above categories. Some of them are listed in this section:</p><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="id_attribute"></a>Value of attribute <code class="code">id</code></h4></div></div></div><p>Implementers could be tempted to store values of resource IDs in the <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#id" target="_top">id</a></code> attribute of XLIFF structural or inline elements. While the XLIFF
<code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#id" target="_top">id</a></code> attribute value is restricted to <em class="firstterm">NMTOKEN</em> by
the normative XML Schema, the native format might not have such restrictions. Invalid
characters in XLIFF <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#id" target="_top">id</a></code> attributes will render the whole <em class="firstterm">XLIFF
Document</em> invalid. Although this would typically be discovered as soon as the
first validation occurs, it can still be costly to fix in a large or long-running
project.</p><p>Instead, the XLIFF <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#name" target="_top">name</a></code> attribute is designed to store the original identifier of the
resource; it can be any string without restrictions. On the <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#file" target="_top"><file></a></code> element, the <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#original" target="_top">original</a></code> attribute can be used for the same purpose.</p><p>Example: <a class="link" href="https://github.com/GALAglobal/TAPICC/tree/master/extraction_examples/id_and_name" target="_top">id_and_name</a>.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="whitespaces"></a>Whitespace Handling</h4></div></div></div><p>Whitespace can be significant inside nodes such as <code class="code"><pre></code>, containing
for instance code samples, and <em class="firstterm">Modifying</em> them during the roundtrip
is not desirable.</p><p>Whitespaces (more than one of the whitespace characters in a sequence) are, however,
generally insignificant in the text nodes of markup formats, such as XML or HTML, and can
be changed at any time (even when not intended as an XLIFF transform) by, for example,
reformatting and indentation (so-called pretty-printing) without affecting the layout of
the rendered document.</p><p>Thus one cannot indiscriminately either preserve or normalize whitespace. Since most TMS and
CAT tools penalize whitespace discrepancies, leverage could be negatively affected if
whitespace is <em class="firstterm">Modified</em>, and the layout of nodes with significant whitespace
could be corrupted.</p><p>The general best practice, also taking into account the <a class="link" href="https://www.w3.org/TR/its20/#preservespace" target="_top">ITS Preserve Space</a> data
category is the following: </p><p>The <em class="firstterm">Extractor</em> itself ought to</p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>Normalize native content where possible, not relying on other
<em class="firstterm">Agents</em> in the roundtrip to do so and</p></li><li class="listitem"><p>protect XLIFF structural elements with <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#xml_space" target="_top">xml:space</a></code> set to <code class="code">preserve</code>. </p></li></ol></div><p>Please note that the XLIFF default for <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#xml_space" target="_top">xml:space</a></code> is <code class="code">default</code>. Therefore it is important to ensure
that content with mixed whitespace behavior is normalized and then protected with
<code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#xml_space" target="_top">xml:space</a></code> explicitly set to or inherited as <code class="code">preserve</code>. The
default <em class="firstterm">XLIFF Core</em> behavior is only useful if all whitespace is
globally insignificant.</p><p>Additional details are available in the <a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#preserve_space" target="_top">XLIFF spec</a>. </p><p>Example: <a class="link" href="https://github.com/GALAglobal/TAPICC/tree/master/extraction_examples/xml_space_preserve" target="_top">xml_space_preserve</a>.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="non-localizables"></a>Protecting Non-localizable Content</h4></div></div></div><p>There are cases when it is necessary to prevent localization of inline content parts
otherwise exposed to localization, be it brand names, or functional inline code, such as
software placeholders.</p><p>XLIFF offers two options for this:</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#translateAnnotation" target="_top">translate annotation</a></p></li><li class="listitem"><p><code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#ph" target="_top"><ph></a></code></p></li></ul></div><p>each of them having a different purpose and offering different features and
options.</p><p>Careful consideration is necessary to decide which way to protect a particular
non-translatable string, as the two methods are neither equivalent nor
interchangeable.</p><p>Usually, the <a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#translateAnnotation" target="_top">translate annotation</a> is suitable for protecting linguistically significant
content, e.g., non-localizable brand or product names, while <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#ph" target="_top"><ph></a></code> is better for representing standalone codes without a syntactic
role, typically standalone formatting artifacts such as <code class="code"><br/></code>.</p><div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><table border="0" summary="Note"><tr><td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="file:/C:/Program%20Files/Oxygen%20XML%20Editor%2019/frameworks/docbook/css/img/note.png"></td><th align="left">Note</th></tr><tr><td align="left" valign="top"><p><a class="link" href="#validations" title="XLIFF Validations">XLIFF Core validation artifacts</a> do not support
validating translate annotations by design. It is the ultimate decision of the
<em class="firstterm">Modifier</em> if a span annotated as non-translatable will indeed
stay unchanged based on linguistic and context considerations. If the validation of
the non-translatable annotations is necessary, it needs to be added for instance as
custom validation code or a custom Schematron rule, based on particular business needs
and validation infrastructure options.</p></td></tr></table></div><p>Placeholders and variables to be replaced with syntactically significant content on
runtime are a particularly difficult use case to address. The functional variable usually
doesn't provide much context for the translator, even when it is not replaced by the
<code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#ph" target="_top"><ph></a></code> element and just surrounded by the do-not-translate annotation.
The best way to represent such variables is to use <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#ph" target="_top"><ph></a></code> with the <a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/csprd01/xliff-core-v2.1-csprd01.html#type" target="_top"><code class="code">type</code></a> attribute set to <code class="code">ui</code> and the <a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/csprd01/xliff-core-v2.1-csprd01.html#subtype" target="_top"><code class="code">subType</code></a> attribute set to <code class="code">xlf:var</code>. See the XLIFF
Constraints for the <a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/csprd01/xliff-core-v2.1-csprd01.html#subtype" target="_top"><code class="code">subType</code></a> attribute. In general, the <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#ph" target="_top"><ph></a></code> elements need to be accompanied by appropriate <a class="link" href="#Hints" title="Editing and Context Hints">context and editing hints</a>. The <a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/csprd01/xliff-core-v2.1-csprd01.html#disp" target="_top"><code class="code">disp</code></a> attribute is suitable to display (in a CAT tool GUI) an example value
that is likely to be used at runtime. The same or a related value can also be used for the
plain text equivalent (rendering) hint <a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/csprd01/xliff-core-v2.1-csprd01.html#equiv" target="_top"><code class="code">equiv</code></a>. </p><p>Example: <a class="link" href="https://github.com/GALAglobal/TAPICC/tree/master/extraction_examples/ph_and_mrk" target="_top">ph_and_mrk</a>.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="merging"></a>Merging Translated Content</h4></div></div></div><p><em class="firstterm">Modifiers</em> can perform various <span class="emphasis"><em>valid</em></span>
transformations during an XLIFF roundtrip. XLIFF-compliant <em class="firstterm">Mergers</em>
need to be able to handle all of them correctly, as those changes are canonical validity
preserving operations. See in particular the clause 2.e. of the <a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/csprd01/xliff-core-v2.1-csprd01.html#Conformance" target="_top">XLIFF Conformance section</a>.</p><p>These operations are (in order of importance or severity of issues caused if ignored):</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>Converting CDATA sections and parseable text (such as XML entities) into XLIFF
<em class="firstterm">inline codes</em>.</p></li><li class="listitem"><p><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/csprd01/xliff-core-v2.1-csprd01.html#segmentationModification" target="_top">Segmentation <em class="firstterm">Modification</em></a>,</p></li><li class="listitem"><p><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/csprd01/xliff-core-v2.1-csprd01.html#spanningcodeusage" target="_top">Equivalent conversion</a> of <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#pc" target="_top"><pc></a></code> elements into <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#sc" target="_top"><sc></a></code>/<code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#ec" target="_top"><ec></a></code> pairs (always allowed and possible) and <span class="emphasis"><em>vice
versa</em></span> (only possible with well-formed spanning codes),</p></li><li class="listitem"><p>Adding and removing <em class="firstterm">inline codes</em> (taking into account
the <a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/csprd01/xliff-core-v2.1-csprd01.html#editinghints" target="_top">editing hints</a> and their Processing Requirements),</p></li><li class="listitem"><p>Content <em class="firstterm">Enrichment</em> with <a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/csprd01/xliff-core-v2.1-csprd01.html#annotations" target="_top">Annotations</a>,</p></li><li class="listitem"><p>Performing any other changes allowed by Processing Requirements.</p></li></ul></div><p><em class="firstterm">Extraction</em> that does not follow best practices usually just shifts the
problems further downstream, forcing other <em class="firstterm">Agents</em> to mitigate the
inherited issues, more often than not leading to unexpected, undesirable, or unpredictable
results that will trip up <em class="firstterm">Merging</em> after the roundtrip.</p><p>Additional guidance is also available in the <a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/os/xliff-core-v2.1-os.html#d0e11123" target="_top">XLIFF Best Practice for <em class="firstterm">Mergers</em></a>.</p><p>Example: <a class="link" href="https://github.com/GALAglobal/TAPICC/tree/master/extraction_examples/merging" target="_top">merging</a>.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="langtags"></a>Selecting Language Tags</h4></div></div></div><p><em class="firstterm">Agents</em> in the roundtrip, machine and human, need to be able to
sufficiently identify the languages used in <em class="firstterm">XLIFF Documents</em>. The
two main languages (the source and the target language) of the XLIFF bitext are primarily
specified by the attributes <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#srcLang" target="_top">srcLang</a></code> and <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#trgLang" target="_top">trgLang</a></code>. The optional <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#xml_lang" target="_top">xml:lang</a></code> attributes on the <a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/os/xliff-core-v2.1-os.html#source" target="_top"><code class="code"><source></code></a> and <a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/os/xliff-core-v2.1-os.html#target" target="_top"><code class="code"><target></code></a> elements are directly inherited from <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#srcLang" target="_top">srcLang</a></code> and <code class="code"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html#trgLang" target="_top">trgLang</a></code>, respectively, and are in fact provided only for generic XML
Processors interoperability. The attribute <code class="code">itsm:lang</code> serves just
to define inline foreign language spans via annotations if necessary.</p><p>What language tag to use is usually not an issue for languages like Slovak
(<code class="code">sk</code>) or Czech (<code class="code">cs</code>) that are spoken predominantly in one
country and encoded exclusively using one script. This becomes a more prominent question
for languages used in different regions, such as English (<code class="code">en-GB</code>,
<code class="code">en-US</code>); using various scripts, for example Uyghur (<code class="code">ug-Arab</code>,
<code class="code">ug-Cyrl</code>, <code class="code">ug-Latn</code>); or having multiple variants like Basic
English (<span class="emphasis"><em>en-basiceng</em></span>).</p><p>The use cases for correctly set fine-grained language tags vary from simple, such
as spell-check, which will behave quite differently for <code class="code">en-GB</code>, compared to
<code class="code">en-basiceng</code>; to more complex, like using <code class="code">fr-FR</code> as a reference
language for a <em class="firstterm">Translation</em> into <code class="code">fr-CA</code>.</p><p>MT engines will return Serbian output encoded with the Cyrillic script (<code class="code">sr-Cyrl</code>)
when the request contains the language tag <span class="emphasis"><em>sr</em></span>, although the user
might have meant and expected Serbian written with the Latin script (<code class="code">sr-Latn</code>).
Human translators would hopefully ask which of the two was the desired one or just provide
<code class="code">sr-Cyrl</code>, as the machines do. More dangerous than translators would be
undocumented internal mapping tables and custom business rules facilitating communication
between roundtrip actors that could assume different defaults when not given an
unequivocal language tag, or worse, ignore a valid fine-grained language tag they don't
cover.</p><p>The XLIFF Standards prescribe that the <a class="link" href="#bcp47">BCP 47</a> language
tags are to be used as values for attributes specifying human natural languages used in
the <em class="firstterm">XLIFF Documents</em>. The Unicode Consortium offers a <a class="link" href="http://unicode.org/cldr/utility/languageid.jsp" target="_top">tool available
online</a>, which can help to perform basic validation of selected language
tags.</p><p>Generally, language tags need to be carefully chosen for source, target, and reference
languages in <em class="firstterm">XLIFF Documents</em>, and it is worth your time to consult
external resources, or language experts; even more so, if you are not familiar with the
defaults for the language in question.</p><p>Example: <a class="link" href="https://github.com/GALAglobal/TAPICC/tree/master/extraction_examples/language_tags" target="_top">language_tags</a>.</p></div><div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="sanity_checks"></a>Validation of Extracted Content</h4></div></div></div><p>The native format can contain various reserved characters, or their sequences, for
structural and inline markup, as well as for programmatic purposes. While not explicitly
violating XLIFF Constraints and Processing Requirements, their incidence in the
<em class="firstterm">Extracted</em> content could point out issues with the
<em class="firstterm">Extraction</em> process.</p><p>One could implement a <span class="emphasis"><em>sanity check</em></span> for the
<em class="firstterm">Extractor</em> output that would identify potential problems by
looking for such characters, or sequences of them. Failing such a sanity check would
ideally interrupt the roundtrip as early as possible, allow for an update of the
<em class="firstterm">Extraction</em> rules, and for redoing the
<em class="firstterm">Extraction</em> in order to prevent problems further downstream. Not
least, <em class="firstterm">Extraction</em> would otherwise expose such control characters or their
sequences to localization transformations that would most probably be harmful with regard
to the <em class="firstterm">Merge</em> and build processes.</p><p>Example: <a class="link" href="https://github.com/GALAglobal/TAPICC/tree/master/extraction_examples/sanity_check" target="_top">sanity_check</a>.</p></div></div><div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="validations"></a>XLIFF Validations</h3></div></div></div><p>Since XLIFF is a roundtrip oriented format that is supposed to facilitate complex
workflows bringing together best-of-breed specialized <em class="firstterm">Agents</em>, it
makes huge business sense to validate, to validate a lot, everywhere, and all the time.
Successful validation on every input and output step is the critical factor for successful
"blind", plug-and-play, or unsupervised interoperability. If you are designing a service
architecture facilitated by XLIFF as the canonical data model, or if you are otherwise
integrating many services that are consuming and outputting XLIFF, it makes sense to expose
XLIFF validation as a reusable microservice that is freely available and even mandated at
every input and output step in your ecosystem, service layer, or service bus. When your tool
is supposed to receive <em class="firstterm">XLIFF Documents</em>, check first if the XLIFF you
are trying to consume is indeed valid. Also if you are outputting <em class="firstterm">XLIFF
Documents</em> that other <em class="firstterm">Agents</em> or service providers are
supposed to consume, do check that you are outputting valid XLIFF. Don't force your service
providers to accept invalid XLIFF; rather, take their pushback as a signal that something
might be wrong with your process. Do remember that problems pushed downstream will force
various mitigation steps among a potentially large number of Agents and those problems will
resurface, unpredictably shape-shifted, no later than at the time when you try to
<em class="firstterm">Merge</em> the XLIFF back, or worse, as issues in the localized product,
if <em class="firstterm">Merging</em> and building miraculously succeed.</p><p>XLIFF is a format that has been blessed with multiple low level implementations,
therefore you also have multiple options for XLIFF validation. Since XLIFF Version 2.1, most
of the advanced validation checks that required custom code in XLIFF 2.0 can be performed
using the Schematron schemas provided by the XLIFF TC, which are in fact an integral normative part
of the OASIS Standard. The artifacts provided by the XLIFF TC (XSD, Schematron, and NVDL) can be used for
validation in any generic XML editor. However, if you are designing an automated
workflow, you would not typically rely on manual or semi-automated validation in XML editors. You
ought rather to build a service that uses, for instance, the XSLT rendering of the
Schematron schemas, or create a service out of the command line Lynx tool that is part of the
open source Okapi XLIFF Toolkit (Java). The open source Microsoft XLIFF Object Model (.NET)
contains built-in validation (which can be disabled). The fourth option to validate XLIFF is Bryan
Schnabel's XMarker. Advanced microservice architectures would ideally adopt more than one
low level method of validation. This can be used either for redundancy or for
double-checking of validation results, comparing of error messages for advanced
trouble-shooting and so on.</p></div></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="Summary"></a>Summary</h2></div></div></div><p>This specification attempts (among other goals) to make the XLIFF Standards more
accessible to content owners that are not necessarily looking into the full nitty-gritty of
the XLIFF specs. It gives general guidance on how to handle constructs common in HTML and
generic XML, and it also provides some basic information on <em class="firstterm">Extraction</em> from
Content Management Systems and software resources. The TAPICC WG3 and the Editors look
forward to receiving feedback on how to make this Informational Best Practice even more useful, and potentially on how
and in what directions to expand its scope.</p><p>We haven't utterly failed if a multilingual publishing data flow designer took home
the basic design principles behind XLIFF as the exchange bitext format, both at the
structural and the inline levels. We hopefully managed to introduce and explain good business
reasons for thoughtful, properly structured, and metadata rich XLIFF
<em class="firstterm">Extraction</em> that will in turn facilitate fully automated, gold
standard <em class="firstterm">Merge</em> and target build processes.</p><p>XLIFF <em class="firstterm">Extraction</em> can never make sense when perceived on its own, as
an isolated process. Since XLIFF doesn't purport to standardize <em class="firstterm">Merging</em>
without the full knowledge of the <em class="firstterm">Extraction</em> mechanism, all
implementers that build <em class="firstterm">Extractors</em> will need to build
<em class="firstterm">Mergers</em> in order to benefit from the exercise. In fact, for a
corporate owner the <em class="firstterm">Extractor</em>/<em class="firstterm">Merger</em> will be
considered a single application with two end points. Designers of such tools need to be
acutely aware that every design compromise made at the <em class="firstterm">Extraction</em>
endpoint will compromise the ability of downstream <em class="firstterm">Agents</em> to perform
lossless bitext transformations and thus will in the end undesirably and sometimes
unpredictably affect their own <em class="firstterm">Merging</em> endpoint that will have to
receive the <em class="firstterm">XLIFF Documents</em> after a localization roundtrip. </p><p>Sometimes, some service providers do accept horrible and ugly "XLIFF" pretending that
there is nothing to worry about. In such cases, rest assured that the service provider had to
complement the poor <em class="firstterm">Extraction</em>/<em class="firstterm">Merge</em> job with
their own costly pre- and post-processing routines, or worse pushed them even further
downstream onto the translators who may or may not be tech-savvy enough to preserve
unprotected features that are critical for your build or runtime functionality. Whether or not
issues resurface in your localized product, you can rest assured that you are paying extra for
solving issues that you could have solved more easily on your own, or that you are not providing the
translators with sufficient metadata to produce the best possible technical and linguistic
in-context quality.</p></div><div class="bibliography"><div class="titlepage"><div><div><h2 class="title"><a name="references"></a>References</h2></div></div></div><div class="bibliodiv"><h3 class="title"><a name="d5e959"></a>Normative references</h3><div class="bibliomixed"><a name="d5e961"></a><p class="bibliomixed">[<abbr class="abbrev">XML</abbr>] W3C: <span class="title">Extensible Markup Language (XML)
1.0</span>, <span class="date">26 November 2008</span>
<span class="bibliomisc"><a class="link" href="https://www.w3.org/TR/xml/" target="_top">https://www.w3.org/TR/xml/</a></span></p></div><div class="bibliomixed"><a name="d5e967"></a><p class="bibliomixed">[<abbr class="abbrev">XLIFF-2.1</abbr>]
Edited by David Filip, Tom Comerford, Soroush Saadatfar, Felix
Sasaki, and Yves Savourel: <span class="title">XLIFF Version 2.1</span>, <span class="date">13 February 2018</span>
<span class="bibliomisc"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/os/xliff-core-v2.1-os.html" target="_top">http://docs.oasis-open.org/xliff/xliff-core/v2.1/os/xliff-core-v2.1-os.html</a> <a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html" target="_top">http://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html</a></span>
</p></div><div class="bibliomixed"><a name="d5e974"></a><p class="bibliomixed">[<abbr class="abbrev">XLIFF-2.0</abbr>]
Edited by Tom Comerford, David Filip, Rodolfo M. Raya, and Yves
Savourel: <span class="title">XLIFF Version 2.0</span>, <span class="date">04 August 2014</span>
<span class="bibliomisc"><a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.0/os/xliff-core-v2.0-os.html" target="_top">http://docs.oasis-open.org/xliff/xliff-core/v2.0/os/xliff-core-v2.0-os.html</a> <a class="link" href="http://docs.oasis-open.org/xliff/xliff-core/v2.0/xliff-core-v2.0.html" target="_top">http://docs.oasis-open.org/xliff/xliff-core/v2.0/xliff-core-v2.0.html</a></span>
</p></div><div class="bibliomixed"><a name="d5e981"></a><p class="bibliomixed">[<abbr class="abbrev">ISO XLIFF</abbr>]
Edited by Tom Comerford, David Filip, Rodolfo M. Raya, and Yves
Savourel: <span class="title">ISO 21720:2017 - XLIFF (XML Localisation interchange file
format)</span>, <span class="date">November 2017</span>
<span class="bibliomisc"><a class="link" href="https://www.iso.org/standard/71490.html" target="_top">https://www.iso.org/standard/71490.html</a></span>
</p></div><div class="bibliomixed"><a name="bcp47"></a><p class="bibliomixed">[bcp47] <span class="bold"><strong>BCP 47</strong></span> M. Davis, <span class="italic">Tags
for Identifying Languages</span>, <span class="italic"><a class="link" href="http://tools.ietf.org/html/bcp47" target="_top">http://tools.ietf.org/html/bcp47</a></span> IETF (Internet Engineering Task
Force).</p></div></div></div></div></body></html>