
pdf variant does not successfully include ⛄ (U+26C4 SNOWMAN WITHOUT SNOW) #6050

Open
1 task done
dkg opened this issue Jul 26, 2023 · 7 comments
Labels
bug Something isn't working under_review

Comments

@dkg

dkg commented Jul 26, 2023

Describe the issue

draft-dkg-rfcediting-non-ascii-ietf-tooling is a test draft that contains multiple non-ascii characters. they all render just fine in the text and html variants, but the pdf variant fails to include ⛄ (U+26C4 SNOWMAN WITHOUT SNOW). it renders ☃ (U+2603 SNOWMAN) with no problem, though. Maybe this has something to do with codepoint coverage of the default fonts.

Code of Conduct

@dkg dkg added the bug Something isn't working label Jul 26, 2023
@cabo
Collaborator

cabo commented Jul 26, 2023

Font issue, I'd say

$ pbpaste | echars
*** Miscellaneous Symbols (Common)
☃: U+2603 1 SNOWMAN
⛄: U+26C4 1 SNOWMAN WITHOUT SNOW

They are in the same group in Unicode, but of course fonts don't pick up whole groups.
(And my browser is broken and shows both the same.)
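The `echars` output above can be approximated with Python's standard `unicodedata` module. This is a minimal sketch, assuming Python 3: `unicodedata` exposes character names but not block names, so the Miscellaneous Symbols range (U+2600 to U+26FF) is hard-coded. It shows that both snowmen share one block, which says nothing about whether a particular font has glyphs for both.

```python
import unicodedata

# Miscellaneous Symbols block, hard-coded because unicodedata has no block lookup.
MISC_SYMBOLS = range(0x2600, 0x2700)

def describe(ch: str) -> str:
    """Return an echars-style line: glyph, code point, and Unicode character name."""
    return f"{ch}: U+{ord(ch):04X} {unicodedata.name(ch)}"

for ch in "\u2603\u26c4":
    # Both code points fall in the same block; font coverage is a separate question.
    suffix = " (Miscellaneous Symbols)" if ord(ch) in MISC_SYMBOLS else ""
    print(describe(ch) + suffix)
```

Whether a given font can draw these characters is decided by that font's cmap table, not by the Unicode block.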

@rjsparks
Member

Is this an issue with the PDF that comes out of xml2rfc, or the pdfized rendering of the htmlized text that comes out of the datatracker? If the former, let's move this to the xml2rfc repo.

@kesara
Member

kesara commented Aug 11, 2023

The PDF generated by xml2rfc shows both snowmen, but from two different font groups.
We probably need to include an extra font in xml2rfc, but I think we can tackle that if this gets to the RFC-to-be stage.

draft-dkg-rfcediting-non-ascii-ietf-tooling-01.pdf

@dkg
Author

dkg commented Aug 14, 2023

in case it wasn't clear, i don't intend draft-dkg-rfcediting-non-ascii-ietf-tooling to ever become an RFC! that's just a test harness so i can push back on some of the FUD i was hearing about how non-ASCII text might be broken.

I'm unaware of any RFC use case that would need either SNOWMAN character, but the demonstration is intended to highlight problems and identify structural issues in unicode coverage and transmission before some RFC really does try to use a symbol that isn't well-supported in one of the output formats.

The problem pdf i found was generated by the datatracker -- i don't know what toolchain was used. When generating the file locally with xml2rfc i do actually see both glyphs. It's possible that this is due to my having certain fonts available locally that are not available on the VM hosting the datatracker, but i don't know.

thanks for looking into it, i really appreciate all the work that has been done on making the RFC series capable of including robust, modern documents with a stable and expansive character set.

@rjsparks
Member

Thanks @dkg - I understand what you're doing - and what you provide above is enough for me to know which invocation of WeasyPrint to study. It's the one in the xml2rfc environment used by the datatracker when it generates formats from XML submissions, which may well not have the right font set installed - we'll go look.

@dkg
Author

dkg commented Aug 14, 2023

(for the record, this I-D has been much more useful than just identifying the SNOWMAN weirdness -- it demonstrated that use cases i heard active concerns about during IETF 117 (cyrillic text, mathematical symbols) do work fine. what you see in my reports are the corner cases where things remain broken -- but the real takeaway from this for me is that the use cases people actually care about are not broken. thanks for all the work that has gone into this!)

@larseggert
Copy link
Collaborator

larseggert commented Aug 17, 2023

In the web view, on my machine, the "snowman without snow" comes from the "Apple Color Emoji" font, and the "snowman with snow" comes from the "Menlo" font.

I guess that's because the CSS says font-family: "Noto Sans Mono", SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace. (But I don't understand where "Apple Color Emoji" comes from...)
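One possible explanation for "Apple Color Emoji" showing up is per-character font fallback: a browser tries each family in the `font-family` list for every character and, if none of them has a glyph, may fall back to some other installed system font. The sketch below models that lookup. The family names `PrimaryMono` and `SystemEmoji` and their coverage sets are hypothetical placeholders for illustration, not measurements of Noto Sans Mono, Menlo, or any other real font.

```python
from typing import Dict, List, Optional, Set

def pick_font(ch: str, stack: List[str], coverage: Dict[str, Set[int]]) -> Optional[str]:
    """Return the first family in `stack` whose coverage contains `ch`, or None."""
    cp = ord(ch)
    for family in stack:
        # Families with no coverage data (e.g. the generic "monospace") are skipped.
        if cp in coverage.get(family, set()):
            return family
    return None

# Hypothetical coverage data for illustration only; real values would come
# from each font's cmap table.
coverage: Dict[str, Set[int]] = {
    "PrimaryMono": {0x2603},          # has SNOWMAN only
    "SystemEmoji": {0x2603, 0x26C4},  # has both snowmen
}
css_stack = ["PrimaryMono", "monospace"]

for ch in "\u2603\u26c4":
    # With only the CSS stack, U+26C4 resolves to no font at all.
    print(f"U+{ord(ch):04X}:", pick_font(ch, css_stack, coverage))

# Model a system-level fallback as an extra family appended after the CSS stack.
print("U+26C4 with system fallback:", pick_font("\u26c4", css_stack + ["SystemEmoji"], coverage))
```

A PDF renderer can only fall back to fonts installed on the machine that runs it, so if none of them covers U+26C4 the glyph cannot be drawn. That is one reading of why a local build and the datatracker build differ, not a confirmed diagnosis.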

That same CSS is passed into Weasyprint when making the PDF, and these are the fonts that end up in the PDF:

name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
IVBPZU+Noto-Sans-Mono                CID TrueType      Identity-H       yes yes yes     64  0
JUCHNC+Noto-Sans-Mono-Bold           CID TrueType      Identity-H       yes yes yes     68  0
TNCRMY+DejaVu-Sans-Mono              CID TrueType      Identity-H       yes yes yes     72  0
IZYCUH+DejaVu-Sans                   CID TrueType      Identity-H       yes yes yes     76  0

Not sure where/why "DejaVu" is picked up from, but I guess it doesn't have the character.
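To test the guess that DejaVu lacks the character, you can read a font's cmap (its code point to glyph mapping) directly. This is a minimal sketch, assuming the third-party `fontTools` package (`pip install fonttools`) and a font file path you supply. `TTFont.getBestCmap()` returns the font's preferred Unicode cmap. The helper names are illustrative and not part of xml2rfc or WeasyPrint.

```python
import sys
from typing import Dict, Iterable, List

def missing_codepoints(cmap: Dict[int, str], text: Iterable[str]) -> List[str]:
    """Return sorted 'U+XXXX' labels for each distinct character of text absent from cmap."""
    missing = sorted({ord(ch) for ch in text if ord(ch) not in cmap})
    return [f"U+{cp:04X}" for cp in missing]

def load_cmap(path: str) -> Dict[int, str]:
    """Read a font's best Unicode cmap (code point -> glyph name) using fontTools."""
    from fontTools.ttLib import TTFont  # third-party; imported lazily
    return TTFont(path).getBestCmap() or {}

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python check_font.py /path/to/font.ttf ...
    for path in sys.argv[1:]:
        gaps = missing_codepoints(load_cmap(path), "\u2603\u26c4")
        print(path, gaps or "all covered")
```

The fonts listed by `pdffonts` are subsets (the six-letter `XXXXXX+` prefix marks a subset), so the check is most meaningful against the full font files installed on the machine that ran WeasyPrint.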

Since we want to use Noto, should we add https://fonts.google.com/noto/specimen/Noto+Emoji?
