
Better <text> elements: revisited #284

Closed
deining opened this issue Feb 9, 2025 · 6 comments

@deining
Contributor

deining commented Feb 9, 2025

I'm struggling with unwanted <tspan> elements inside the <text> elements of SVG files produced by dvisvgm 3.4.3. Reading the discussion in #56, I became aware that a) others are affected by this deficiency, too, and b) this issue might be difficult to address. However, there may still be some room for improvement, so let me share my findings:

Minimal working example

\documentclass[tikz]{standalone}
\begin{document}
\begin{tikzpicture}
    \draw (0,0) node {Dosiergriff};
\end{tikzpicture}
\end{document}

After running dvilualatex mwe.tex && dvisvgm mwe.dvi, this SVG is produced:

<?xml version='1.0' encoding='UTF-8'?>
<!-- This file was generated by dvisvgm 3.4.3 -->
<svg version='1.1' xmlns='http://www.w3.org/2000/svg' xmlns:xlink='http://www.w3.org/1999/xlink' width='45.080946pt' height='9.075965pt' viewBox='-68.679806 -68.679465 45.080946 9.075965'>
<defs>
<font id='nf0' horiz-adv-x='549'>
<font-face font-family='nf0' units-per-em='1000' ascent='1127' descent='290'/>
<glyph unicode='D' horiz-adv-x='764' vert-adv-y='764' glyph-name='D' d='M707 336C707 526 572 683 401 683H35V652H59C136 652 138 641 138 605V78C138 42 136 31 59 31H35V0H401C569 0 707 148 707 336ZM607 336C607 225 588 165 552 116C532 89 475 31 374 31H273C226 31 224 38 224 71V612C224 645 226 652 273 652H373C435 652 504 630 555 559C598 500 607 414 607 336Z'/>
<glyph unicode='e' horiz-adv-x='444' vert-adv-y='444' glyph-name='e' d='M415 119C415 129 407 131 402 131C393 131 391 125 389 117C354 14 264 14 254 14C204 14 164 44 141 81C111 129 111 195 111 231H390C412 231 415 231 415 252C415 351 361 448 236 448C120 448 28 345 28 220C28 86 133-11 248-11C370-11 415 100 415 119ZM349 252H112C118 401 202 426 236 426C339 426 349 291 349 252Z'/>
<glyph unicode='g' horiz-adv-x='500' vert-adv-y='500' glyph-name='g' d='M485 404C485 421 473 453 434 453C414 453 370 447 328 406C286 439 244 442 222 442C129 442 60 373 60 296C60 252 82 214 107 193C94 178 76 145 76 110C76 79 89 41 120 21C60 4 28-39 28-79C28-151 127-206 249-206C367-206 471-155 471-77C471-42 457 9 406 37C353 65 295 65 234 65C209 65 166 65 159 66C127 70 106 101 106 133C106 137 106 160 123 180C162 152 203 149 222 149C315 149 384 218 384 295C384 332 368 369 343 392C379 426 415 431 433 431C433 431 440 431 443 430C432 426 427 415 427 403C427 386 440 374 456 374C466 374 485 381 485 404ZM309 296C309 269 308 237 293 212C285 200 262 172 222 172C135 172 135 272 135 295C135 322 136 354 151 379C159 391 182 419 222 419C309 419 309 319 309 296ZM419-79C419-133 348-183 250-183C149-183 80-132 80-79C80-33 118 4 162 7H221C307 7 419 7 419-79Z'/>
<glyph unicode='i' horiz-adv-x='278' vert-adv-y='278' glyph-name='i' d='M247 0V31C181 31 177 36 177 75V442L37 431V400C102 400 111 394 111 345V76C111 31 100 31 33 31V0L143 3C178 3 213 1 247 0ZM192 604C192 631 169 657 139 657C105 657 85 629 85 604C85 577 108 551 138 551C172 551 192 579 192 604Z'/>
<glyph unicode='o' horiz-adv-x='500' vert-adv-y='500' glyph-name='o' d='M471 214C471 342 371 448 250 448C125 448 28 339 28 214C28 85 132-11 249-11C370-11 471 87 471 214ZM388 222C388 186 388 132 366 88C344 43 300 14 250 14C207 14 163 35 136 81C111 125 111 186 111 222C111 261 111 315 135 359C162 405 209 426 249 426C293 426 336 404 362 361S388 260 388 222Z'/>
<glyph unicode='r' horiz-adv-x='392' vert-adv-y='392' glyph-name='r' d='M364 381C364 413 333 442 290 442C217 442 181 375 167 332V442L28 431V400C98 400 106 393 106 344V76C106 31 95 31 28 31V0L142 3C182 3 229 3 269 0V31H248C174 31 172 42 172 78V232C172 331 214 420 290 420C297 420 299 420 301 419C298 418 278 406 278 380C278 352 299 337 321 337C339 337 364 349 364 381Z'/>
<glyph unicode='s' horiz-adv-x='394' vert-adv-y='394' glyph-name='s' d='M360 128C360 181 330 211 318 223C285 255 246 263 204 271C148 282 81 295 81 353C81 388 107 429 193 429C303 429 308 339 310 308C311 299 322 299 322 299C335 299 335 304 335 323V424C335 441 335 448 324 448C319 448 317 448 304 436C301 432 291 423 287 420C249 448 208 448 193 448C71 448 33 381 33 325C33 290 49 262 76 240C108 214 136 208 208 194C230 190 312 174 312 102C312 51 277 11 199 11C115 11 79 68 60 153C57 166 56 170 46 170C33 170 33 163 33 145V13C33-4 33-11 44-11C49-11 50-10 69 9C71 11 71 13 89 32C133-10 178-11 199-11C314-11 360 56 360 128Z'/>
<glyph unicode='ff' horiz-adv-x='583' vert-adv-y='583' glyph-name='f_f' d='M628 635C628 671 593 705 537 705C478 705 434 667 429 663C400 700 344 705 317 705C222 705 106 653 106 545V431H27V400H106V76C106 31 95 31 28 31V0L139 3L250 0V31C183 31 172 31 172 76V400H382V76C382 31 371 31 304 31V0L418 3C458 3 505 3 545 0V31H524C450 31 448 42 448 78V400H563V431H445V547C445 636 493 683 538 683C541 683 556 683 571 676C559 672 541 659 541 634C541 611 557 591 584 591C613 591 628 611 628 635ZM393 664C377 661 357 648 357 620C357 614 358 589 385 580C382 567 382 558 382 546V431H169V544C169 641 251 683 316 683C365 683 393 664 393 664Z'/>
</font>
</defs>
<style type='text/css'>
<![CDATA[text.f0 {font-family:nf0;font-size:9.96264px}
]]>
</style>
<g id='page1'>
<text class='f0' x='-46.139006' y='-64.141504' transform='matrix(1 0 0 1 -22.5408 2.4857)'>Dosiergriff</text>
</g>
</svg>

There are no <tspan> elements inside the <text> element; this is perfect.

Minimal working example 2

Here I added two lines (lines 2 and 3) that select the NotoSans font; the rest of the example is unchanged:

\documentclass[tikz]{standalone}
\usepackage{fontspec}
\setmainfont{NotoSans}
\begin{document}
\begin{tikzpicture}
    \draw (0,0) node {Dosiergriff};
\end{tikzpicture}
\end{document}

Again, I ran dvilualatex mwe.tex && dvisvgm mwe.dvi to produce an SVG file. Its structure is very similar to the SVG file shown above, but this time there is an unwanted <tspan> element inside the <text> element:

<g id='page1'>
<text class='f0' x='-43.708119' y='-63.673258' transform='matrix(1 0 0 1 -24.9717 2.6152)'>Dosier<tspan x='-13.431662'>griff</tspan></text>
</g>

My question now is: why do we have this <tspan> element in the second example and is there any way to avoid it?

Note: I'm aware that with both produced SVG files, the text is displayed correctly. This is different from my experiences with real life situations, where the blocks formed by <tspan> elements do not align properly, so that the readability of the text is impaired. Manually removing the <tspan> elements cures the problem, but having to do this with many files isn't appealing at all, that's why I'm asking here. Thanks for your investigations/clarification!

@mgieseki
Owner

mgieseki commented Feb 9, 2025

The tspan elements are usually added when dvisvgm processes a DVI command that changes the output position, e.g. because of kerning or other typographic adjustments. Without the modified positions, the result might still look fine, but the typographic details added by TeX are lost. Here's your second example without and with the tspan element. As you can see, TeX slightly reduces the distance between "r" and "g".

[Image: the second example rendered without and with the <tspan> element]

As these modifications depend on the fonts used, you get different results when changing the font.
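The mechanism described here can be modelled roughly as follows: whenever the position coming from the DVI file differs from where the previous glyph's advance width would have left the pen, a new `<tspan>` with an explicit `x` starts. This is a simplified Python sketch, not dvisvgm's actual implementation; `Glyph`, `build_text_element`, the tolerance and the toy coordinates are all hypothetical.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass
class Glyph:
    char: str       # character the glyph represents
    x: float        # horizontal position taken from the DVI file
    advance: float  # advance width of the glyph at the current font size


def build_text_element(glyphs, tolerance=1e-3):
    """Emit one <text> element; open a new sibling <tspan x=...> whenever the
    DVI position deviates from the pen position implied by the advances."""
    if not glyphs:
        return "<text></text>"
    parts = [f"<text x='{glyphs[0].x:g}'>"]
    pen = glyphs[0].x
    in_tspan = False
    for i, g in enumerate(glyphs):
        if i > 0 and abs(g.x - pen) > tolerance:
            # Position jump (kerning, interword space, ...): start a new run.
            if in_tspan:
                parts.append("</tspan>")
            parts.append(f"<tspan x='{g.x:g}'>")
            in_tspan = True
        parts.append(escape(g.char))
        pen = g.x + g.advance  # where the next glyph would land without a jump
    if in_tspan:
        parts.append("</tspan>")
    parts.append("</text>")
    return "".join(parts)
```

With a kern that pulls "g" 0.2 units towards "r", `build_text_element([Glyph("r", 0.0, 4.0), Glyph("g", 3.8, 5.0)])` yields `<text x='0'>r<tspan x='3.8'>g</tspan></text>`; without the kern (g at 4.0) both glyphs stay in one plain run.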

This is different from my experiences with real life situations, where the blocks formed by <tspan> elements do not align properly, so that the readability of the text is impaired. Manually removing the <tspan> elements cures the problem, but having to do this with many files isn't appealing at all, that's why I'm asking here.

Could you elaborate on this? The tspan elements are required to create the correct character positions as encoded in the DVI file. Without them, the results are incorrect. If your SVG files look wrong, there are probably other reasons for it. I'd like to see such an example if possible.

mgieseki self-assigned this Feb 9, 2025
@deining
Contributor Author

deining commented Feb 10, 2025

Thanks for your quick response.

Could you elaborate on this?

Yes, of course.

Minimal working example:

\documentclass[tikz]{standalone}
\usepackage{fontspec}
\setmainfont{Exo2}
\begin{document}
\begin{tikzpicture}
  \draw (0,0) node {Éléments de chauffage};
\end{tikzpicture}
\end{document}

After running dvilualatex mwe.tex && dvisvgm mwe.dvi, this SVG is produced:

<?xml version='1.0' encoding='UTF-8'?>
<!-- This file was generated by dvisvgm 3.4.3 -->
<svg version='1.1' xmlns='http://www.w3.org/2000/svg' xmlns:xlink='http://www.w3.org/1999/xlink' width='103.342464pt' height='11.317559pt' viewBox='-68.680156 -68.679413 103.342464 11.317559'>
<defs>
<font id='nf0' horiz-adv-x='559'>
<font-face font-family='nf0' units-per-em='1000' ascent='999' descent='201'/>
<glyph unicode='É' horiz-adv-x='560' vert-adv-y='560' glyph-name='Eacute' d='M214 691Q264 691 314 691Q365 691 414 689Q464 688 510 685L506 616H229Q198 616 183 600Q169 585 169 550V140Q169 105 183 89Q198 73 229 73H506L510 5Q464 2 414 1Q365 0 314 0Q264-1 214-1Q153-1 117 31Q82 64 81 119V571Q82 627 117 659Q153 691 214 691ZM99 400H467V329H99V400ZM411 934L460 868L268 748L239 787L411 934Z'/>
<glyph unicode='a' horiz-adv-x='549' vert-adv-y='549' glyph-name='a' d='M308 499Q362 499 399 484Q437 470 456 436T475 343V0H408L395 106L389 117V343Q389 388 368 407Q348 427 290 427Q252 427 194 423Q137 419 81 414L73 477Q107 483 147 488Q188 493 230 496T308 499ZM435 300L434 236L201 235Q166 234 152 216Q138 199 138 168V137Q138 99 156 81Q174 64 214 64Q242 64 277 74Q313 85 349 107T415 163V100Q404 86 383 67Q363 49 334 31Q306 14 271 2Q237-9 197-9Q154-9 121 6Q88 22 69 52Q51 83 51 127V180Q51 237 86 268Q122 300 186 300H435Z'/>
<glyph unicode='c' horiz-adv-x='489' vert-adv-y='489' glyph-name='c' d='M286 499Q308 499 335 497T390 490Q419 485 443 476L432 421Q399 424 362 425Q326 427 300 427Q242 427 208 411T159 355T144 244T159 132Q174 92 208 76T300 60Q315 60 339 61T390 64Q418 66 443 70L452 13Q415 0 371-6Q328-12 285-12Q201-12 150 13T76 94Q54 150 54 244T77 394T151 474Q202 499 286 499Z'/>
<glyph unicode='d' horiz-adv-x='573' vert-adv-y='573' glyph-name='d' d='M240 499Q288 499 335 486Q382 474 425 448L420 395Q373 409 337 417T260 425Q218 425 192 410Q167 396 155 357T143 246T154 134Q166 94 191 78T255 62Q284 62 307 69Q331 77 357 94Q384 111 420 137L428 77Q388 40 337 14T227-12Q135-12 95 53Q56 118 56 245Q56 340 76 395Q96 451 137 475T240 499ZM494 700V0H428L418 85L409 92V426L415 441Q411 469 410 495T409 550V700H494Z'/>
<glyph unicode='e' horiz-adv-x='537' vert-adv-y='537' glyph-name='e' d='M278 499Q388 499 437 459Q487 419 487 340Q488 275 460 238Q432 202 371 202H87V268H349Q382 268 392 290Q402 313 402 340Q401 387 374 407T282 427Q229 427 198 411Q168 396 155 357Q143 319 143 250Q143 172 158 131Q174 90 208 75T298 60Q337 60 383 63Q430 67 467 72L476 15Q453 6 419 0Q386-6 350-9Q315-12 287-12Q203-12 152 13Q101 39 77 95Q54 152 54 245Q54 341 77 396Q101 452 150 475Q200 499 278 499Z'/>
<glyph unicode='é' horiz-adv-x='537' vert-adv-y='537' glyph-name='eacute' d='M278 499Q388 499 437 459Q487 419 487 340Q488 275 460 238Q432 202 371 202H87V268H349Q382 268 392 290Q402 313 402 340Q401 387 374 407T282 427Q229 427 198 411Q168 396 155 357Q143 319 143 250Q143 172 158 131Q174 90 208 75T298 60Q337 60 383 63Q430 67 467 72L476 15Q453 6 419 0Q386-6 350-9Q315-12 287-12Q203-12 152 13Q101 39 77 95Q54 152 54 245Q54 341 77 396Q101 452 150 475Q200 499 278 499ZM382 741L431 675L239 555L210 594L382 741Z'/>
<glyph unicode='g' horiz-adv-x='558' vert-adv-y='558' glyph-name='g' d='M271 499Q349 499 394 483Q440 467 459 433Q479 399 479 344Q479 291 459 256Q440 222 394 205Q348 189 270 189T146 205Q101 222 81 256Q62 290 62 343Q62 398 81 432Q101 467 147 483T271 499ZM270 434Q197 434 168 413Q140 393 140 344Q140 296 168 275Q197 254 270 254Q344 254 372 275T400 344Q400 393 372 413Q344 434 270 434ZM536 487L531 440L407 430L372 487H536ZM133 221L182 214Q165 199 159 177Q153 156 162 138Q172 120 203 115L385 86Q454 76 481 43T508-51Q508-108 486-141Q465-174 414-188T276-202Q210-202 165-194Q120-187 92-169Q65-152 53-124T41-54Q41-22 50 0Q60 21 80 38Q101 55 134 71L194 104L233 90L184 53Q164 38 150 23Q137 9 130-7Q124-24 124-47Q124-81 137-99T183-123Q216-130 275-130T366-123Q399-116 412-98T425-47Q425-22 418-9Q412 4 394 11T341 23L166 47Q135 51 116 67Q98 83 91 105T87 149Q90 172 102 191Q114 211 133 221Z'/>
<glyph unicode='h' horiz-adv-x='581' vert-adv-y='581' glyph-name='h' d='M374 499Q506 499 506 365V0H420V341Q420 390 404 408Q389 427 351 427Q306 427 261 406T154 345L150 406Q207 449 263 474T374 499ZM163 700L164 507Q164 474 161 445Q159 416 154 394L163 379V0H78V700H163Z'/>
<glyph unicode='l' horiz-adv-x='297' vert-adv-y='297' glyph-name='l' d='M163 700V134Q163 102 180 85Q198 69 230 69H277L287 4Q278 0 261-3Q245-6 227-7Q210-9 198-9Q144-9 111 23Q79 55 79 115V700H163Z'/>
<glyph unicode='m' horiz-adv-x='876' vert-adv-y='876' glyph-name='m' d='M671 499Q734 499 767 465Q801 431 801 365V0H717V341Q716 386 699 406Q682 427 641 427Q615 427 591 418Q568 410 540 392T472 345L467 405Q520 452 570 475Q620 499 671 499ZM148 487L154 394L163 379V0H78V487H148ZM352 499Q414 499 447 465Q480 432 481 365V0H399V341Q399 388 380 407Q361 427 323 427Q298 427 274 419Q251 411 222 393Q194 375 153 345L147 405Q200 452 250 475Q300 499 352 499Z'/>
<glyph unicode='n' horiz-adv-x='581' vert-adv-y='581' glyph-name='n' d='M374 499Q506 499 506 365V0H421V341Q421 390 405 408Q389 427 351 427Q306 427 261 405T154 345L150 406Q207 450 263 474Q319 499 374 499ZM150 487L157 394L163 379V0H78V487H150Z'/>
<glyph unicode='s' horiz-adv-x='508' vert-adv-y='508' glyph-name='s' d='M240 499Q272 499 308 497T379 492Q415 489 445 484L438 422Q392 424 345 425Q299 427 253 427Q208 428 182 425T144 409Q133 396 133 365Q133 327 151 315T204 295L346 263Q404 249 431 220Q459 191 459 129Q459 69 436 39Q414 9 368-1T251-11Q225-11 173-9Q122-7 60 3L66 65Q90 64 117 62Q144 61 173 61Q203 61 234 61Q289 61 320 66T364 85Q377 100 377 129Q377 165 356 177T300 197L160 229Q120 239 96 255Q72 272 61 298Q50 325 50 365Q50 422 70 451T132 490T240 499Z'/>
<glyph unicode='t' horiz-adv-x='381' vert-adv-y='381' glyph-name='t' d='M200 631V134Q200 99 215 84T266 69H340L350 4Q334 0 313-3T272-7Q252-9 240-9Q179-9 147 25T115 123V631H200ZM355 487V420H27V482L123 487H355Z'/>
<glyph unicode='u' horiz-adv-x='570' vert-adv-y='570' glyph-name='u' d='M160 487V145Q159 99 177 80T237 61Q278 61 318 80Q359 100 416 137L427 78Q370 33 316 10Q263-13 208-13Q75-13 75 121V487H160ZM492 487V0H424L416 92L407 107V487H492Z'/>
<glyph unicode='ẘ' horiz-adv-x='711' vert-adv-y='711' glyph-name='f_f' d='M258 704Q273 704 295 703Q317 703 340 701Q363 700 380 697L374 634H288Q240 634 220 617Q201 601 201 552V0H116V562Q116 607 131 639Q147 671 178 687Q210 704 258 704ZM588 717Q604 717 629 716Q654 716 680 714Q706 713 725 710L719 647H618Q570 647 551 628T532 565V0H446V574Q446 620 460 651Q474 683 505 700T588 717ZM687 487V420H28V482L121 487H687Z'/>
</font>
</defs>
<style type='text/css'>
<![CDATA[text.f0 {font-family:nf0;font-size:9.96264px}
]]>
</style>
<g id='page1'>
<text class='f0' x='-17.008256' y='-63.020707' transform='matrix(1 0 0 1 -51.6719 3.6464)'>Éléments<tspan x='27.82362'>de</tspan><tspan x='41.103815'>chauẘag</tspan><tspan x='80.98427'>e</tspan></text>
</g>
</svg>

Previewer:

[Image: preview of the generated SVG]

The ff ligature is causing problems here. If I remove the line \setmainfont{Exo2}, the produced SVG file looks fine, so the issue is somehow related to the Exo2 TTF font I'm using. Your help is appreciated; thanks a lot for investigating!

Note: To get a usable SVG file, I'm currently writing {Éléments de chauf\mbox{}fage}, but that's more of a hack than a good solution.

@mgieseki
Owner

Thank you for the additional info. What SVG viewer do you use? It probably doesn't support SVG fonts, which dvisvgm embeds by default. At least the font shown in your previewer image doesn't seem to be Exo2. Maybe you need to call dvisvgm with the option --font-format=woff or --no-fonts. I get this result:

[Image: mgieseki's rendering of the same SVG]
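Why a viewer that ignores the embedded SVG font can make the runs look misaligned: every `<tspan>` starts at an absolute x position computed from Exo2's advance widths, so if the viewer substitutes another font, its different widths no longer fill the space up to the next run, which shows up as gaps or overlaps at run boundaries. A small Python illustration; the Exo2 widths of `d` and `e` are the `horiz-adv-x` values from the SVG above, while the `FALLBACK` widths are invented for the example.

```python
FONT_SIZE = 9.96264   # user units, from the <style> element of the SVG
UNITS_PER_EM = 1000   # from the <font-face> element

# Advance widths in font units. EXO2 is copied from the SVG font above;
# FALLBACK is a made-up stand-in for whatever font the viewer substitutes.
EXO2 = {"d": 573, "e": 537}
FALLBACK = {"d": 620, "e": 600}


def run_end(text, start_x, widths):
    """x coordinate where a run of text ends when set with the given widths."""
    return start_x + sum(widths[c] for c in text) * FONT_SIZE / UNITS_PER_EM


# <tspan x='27.82362'>de</tspan> is followed by <tspan x='41.103815'>...
next_x = 41.103815
for name, widths in (("Exo2", EXO2), ("fallback", FALLBACK)):
    space = next_x - run_end("de", 27.82362, widths)
    print(f"{name}: space before the next run = {space:.3f} user units")
```

With the real Exo2 widths the remaining space is the interword space TeX set (about 2.22 units); with the wider fallback widths it shrinks, and a narrower fallback would widen it instead.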

@deining
Contributor Author

deining commented Feb 11, 2025

What SVG viewer do you use?

I used Inkscape for the preview.

It probably doesn't support SVG fonts

Bingo.

Maybe you need to call dvisvgm with option --font-format=woff or --no-fonts.

Once I make use of the option --no-fonts, the text is displayed properly both in browsers and in Inkscape, so I'm fine now. Thanks for your help!
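Since the original concern was having to fix many files by hand, the conversion with one of the two options mentioned above can be scripted. A minimal Python sketch: the helper names and the mode names `"no-fonts"`/`"woff"` are illustrative, and only the dvisvgm options quoted in this thread are used.

```python
import subprocess
from pathlib import Path


def dvisvgm_command(dvi, font_mode="no-fonts"):
    """Build a dvisvgm command line for one DVI file.

    "no-fonts" passes --no-fonts (glyphs become paths, no SVG fonts);
    "woff" passes --font-format=woff (WOFF fonts instead of SVG fonts).
    """
    options = {"no-fonts": ["--no-fonts"], "woff": ["--font-format=woff"]}
    if font_mode not in options:
        raise ValueError(f"unknown font mode: {font_mode!r}")
    return ["dvisvgm", *options[font_mode], str(dvi)]


def convert_all(directory, font_mode="no-fonts"):
    """Run dvisvgm on every .dvi file in `directory`, from that directory."""
    for dvi in sorted(Path(directory).glob("*.dvi")):
        # check=True stops at the first file dvisvgm fails to convert.
        subprocess.run(dvisvgm_command(dvi.name, font_mode),
                       check=True, cwd=dvi.parent)
```

Calling `convert_all("build")` converts every DVI file in `build`; `convert_all("build", "woff")` keeps real (WOFF) fonts instead of paths, which keeps the text selectable in viewers that support them.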

One issue remains. Have a look at this part of the SVG produced with my MWE (Exo2 activated):

<defs>
<font id='nf0' horiz-adv-x='559'>
<font-face font-family='nf0' units-per-em='1000' ascent='999' descent='201'/>
<glyph ... />
<!-- many more glyph elements -->
<glyph unicode='ẘ' horiz-adv-x='711' vert-adv-y='711' glyph-name='f_f' d='M258 704Q273 704 295 703Q317 703 340 701Q363 700 380 697L374 634H288Q240 634 220 617Q201 601 201 552V0H116V562Q116 607 131 639Q147 671 178 687Q210 704 258 704ZM588 717Q604 717 629 716Q654 716 680 714Q706 713 725 710L719 647H618Q570 647 551 628T532 565V0H446V574Q446 620 460 651Q474 683 505 700T588 717ZM687 487V420H28V482L121 487H687Z'/>
</font>
</defs>

The line containing <glyph unicode= .../> is surprising to me:

<glyph unicode='ẘ' horiz-adv-x='...' vert-adv-y='...' glyph-name='f_f'  ... />

Shouldn't that be:

<glyph unicode='ff' horiz-adv-x='...' vert-adv-y='...' glyph-name='f_f' ... />

I think this improper!? use of the unicode attribute is the root cause for the display of the unwanted glyph inside the text string displayed from Inkscape:

<text ...>Éléments<tspan ... >de</tspan><tspan ...>chauẘag</tspan><tspan ...>e</tspan></text>

Any idea why the glyph shows up here? Thanks for investigating!
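To spot this kind of mapping across many generated files, a small checker can compare each glyph's `glyph-name` with its `unicode` attribute. A Python sketch using the standard library's ElementTree; the function name is hypothetical, and the heuristic (an underscore-separated ligature name such as `f_f` should map to as many characters as it has components) is an assumption that can miss or misflag some fonts.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"


def suspicious_glyphs(svg_data):
    """Return (glyph-name, unicode) pairs whose ligature-style name (parts
    joined by '_') disagrees with the number of mapped characters."""
    root = ET.fromstring(svg_data)
    found = []
    for glyph in root.iter(f"{SVG_NS}glyph"):
        name = glyph.get("glyph-name", "")
        uni = glyph.get("unicode", "")
        if "_" in name and len(uni) != len(name.split("_")):
            found.append((name, uni))
    return found
```

For the Exo2 output above, `suspicious_glyphs(Path("mwe.svg").read_bytes())` (with `from pathlib import Path`) would report `("f_f", "ẘ")`, while the first example's `<glyph unicode='ff' ... glyph-name='f_f'>` passes.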

@mgieseki
Owner

Once I make use of the option --no-fonts, the text is displayed properly both in browsers and in Inkscape, so I'm fine now. Thanks for your help!

Great, I'm glad to hear that you could fix the issue.

The line containing <glyph unicode= .../> is surprising to me:

<glyph unicode='ẘ' horiz-adv-x='...' vert-adv-y='...' glyph-name='f_f'  ... />

Shouldn't that be:

<glyph unicode='ff' horiz-adv-x='...' vert-adv-y='...' glyph-name='f_f' ... />

I think this improper!? use of the unicode attribute is the root cause for the display of the unwanted glyph inside the text string displayed from Inkscape:

<text ...>Éléments<tspan ... >de</tspan><tspan ...>chauẘag</tspan><tspan ...>e</tspan></text>

Any idea why the glyph shows up here? Thanks for investigating!

Yes, that's an issue with the font file here because its Unicode character map doesn't cover all glyphs present in the font. For example, the ff ligature (glyph ID 474 in Exo 2) has a name assigned (f_f) but no code point. Therefore, dvisvgm doesn't know what Unicode character the glyph should get and just picks a random one not used by the font. The shape of the glyph is unambiguous though and should always look correct when rendering the SVG file.

I'll have a look if I can add a lookup table for common glyph names, like f_f, that can be used as a fallback method for glyphs without an assigned code point.
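A rough sketch of such a fallback: first derive the characters from the glyph name (splitting ligature names like `f_f` at underscores and decoding `uniXXXX`/`uXXXX` names, in the spirit of the Adobe Glyph List Specification), and only if that fails pick a code point the font doesn't use yet. Everything here is hypothetical: the tiny table, the function names and the Private Use Area starting point are illustrative and not taken from dvisvgm, which in this thread ended up with U+1E98.

```python
import re

# Hypothetical excerpt of a name-to-character table; a real one
# (e.g. the Adobe Glyph List) has thousands of entries.
GLYPH_TABLE = {
    "f": "f", "i": "i", "l": "l", "t": "t",
    "Eacute": "\u00c9", "eacute": "\u00e9",
}


def _valid(cp):
    """True for Unicode scalar values (no surrogates, at most U+10FFFF)."""
    return cp <= 0x10FFFF and not 0xD800 <= cp <= 0xDFFF


def component_to_text(comp):
    """Map one name component such as 'f', 'uni00E9' or 'u1F600'."""
    if comp in GLYPH_TABLE:
        return GLYPH_TABLE[comp]
    m = re.fullmatch(r"uni((?:[0-9A-F]{4})+)", comp)
    if m:
        digits = m.group(1)
        cps = [int(digits[i:i + 4], 16) for i in range(0, len(digits), 4)]
        return "".join(map(chr, cps)) if all(map(_valid, cps)) else None
    m = re.fullmatch(r"u([0-9A-F]{4,6})", comp)
    if m and _valid(int(m.group(1), 16)):
        return chr(int(m.group(1), 16))
    return None


def glyph_name_to_text(name):
    """Derive the character sequence from names like 'f_f' or 'f_f.alt'."""
    base = name.split(".", 1)[0]  # drop suffixes such as '.alt'
    parts = [component_to_text(c) for c in base.split("_")]
    return None if None in parts else "".join(parts)


def unicode_for_glyph(name, used, start=0xE000):
    """Prefer the name-derived text; otherwise fall back to the first
    code point from `start` onward that the font does not use yet."""
    text = glyph_name_to_text(name)
    if text is not None:
        return text
    cp = start
    while chr(cp) in used:
        cp += 1
    return chr(cp)
```

With this, the Exo2 glyph `f_f` would get `unicode='ff'`, and only glyphs with unrecognized names (e.g. `.notdef` or font-specific names) would still need an arbitrary unused code point.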

@deining
Contributor Author

deining commented Feb 11, 2025

Yes, that's an issue with the font file here because its Unicode character map doesn't cover all glyphs present in the font. For example, the ff ligature (glyph ID 474 in Exo 2) has a name assigned (f_f) but no code point. Therefore, dvisvgm doesn't know what Unicode character the glyph should get and just picks a random one not used by the font.

Thanks for your explanation!

The shape of the glyph is unambiguous though and should always look correct when rendering the SVG file.

I can confirm that it looks correct 😄.

I'll have a look if I can add a lookup table for common glyph names, like f_f, that can be used as a fallback method for glyphs without an assigned code point.

Great idea, that would make the dvisvgm converter even better!

Closing this issue. Thanks for your help!

deining closed this as completed Feb 11, 2025