Skip to content

Subset fonts to used character data#10655

Merged
aduth merged 17 commits intomainfrom
aduth-subset-content
May 22, 2024
Merged

Subset fonts to used character data#10655
aduth merged 17 commits intomainfrom
aduth-subset-content

Conversation

@aduth
Copy link
Contributor

@aduth aduth commented May 20, 2024

🛠 Summary of changes

Optimizes fonts to remove unused character data.

This is similar to #6094, but more-aggressively optimizes font files. With #6094, we used predefined Latin character set to reduce the size of fonts. With this approach, we specifically target only the character glyphs that exist in localized string data.

This also removes unused .woff files. As of USWDS v3.4.0, only .woff2 files are used.

For future consideration: We could optimize this even more aggressively if we created per-locale font subsets, to avoid loading French and Spanish character data for English users (and vice-versa, as applicable).

Why?

After application.css, these fonts are the second-largest first-party asset on https://secure.login.gov , each coming in around 21.5kb (vs. 22.5kb for application.css), for a combined total of 42.9kb font data on the homepage.

Performance Impact

The numbers below reflect the combined total of PublicSans-Regular.woff2 and PublicSans-Bold.woff2, the most common font combination loaded on each page (and preloaded by default).

Before: 42.9kb (21.5kb + 21.4kb)
After: 31kb (15.5kb + 15.5kb)
Diff: -11.9kb (-27.7%)

📜 Testing Plan

Verify that text is still rendered as Public Sans:

  1. Go to http://localhost:3000 or any page of the application
  2. Inspect some text using your browser's developer tools
  3. Observe that the rendered font is using the Public Sans network resource

This was added when considering support for legacy formats with nested hashes. Since we now target only flattened keys (where values are either strings or arrays of strings), use `flatten` instead to simplify
@@ -0,0 +1 @@
!#%&(),-./0123456789:;?@ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz «»¿ÀÁÈÉÊÎÓÚàáâãçèéêëíîïñóôùúû ‑—‘’“”…‹中体文简
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way we could do this dynamically at deploy time? This seems like an annoying thing to fail the build on

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think it's burdensome, yes. But realistically I also think this is something that will almost never happen.

@aduth
Copy link
Contributor Author

aduth commented May 20, 2024

Chatting with @mitchellhenke offline about this, one consideration is if we could account for locale data from Rails and other third-party dependencies.

Some options to consider:

  • Since additional characters are more likely in non-English locales, subset specifically for English only, and continue to use predefined set for Spanish & French
  • (If possible) Initialize i18n backend and load all data, then dump string values from in-memory backend
  • Scrape content from all files loaded via I18n.load_path
  • Manually verify but dismiss as non-concern if these other sources are unlikely to include additional characters

@aduth
Copy link
Contributor Author

aduth commented May 20, 2024

Prompted by #10655 (comment), I was curious what characters aren't included anymore, to get a sense of how likely it'd be for one of them to be reintroduced in the future.

These are the characters:

"$\'*+<=>[\\]^`{|}~¡¢£¤¥¦§¨©ª¬­®¯°±²³´µ¶·¸¹º¼½¾ÂÃÄÅÆÇËÌÍÏÐÑÒÔÕÖרÙÛÜÝÞßäåæìðòõö÷øüýþÿ

Edit: After 1f5f81c, this is reduced further to just these characters:

*+<=[\\]^`{|}¡¢£¤¥¦§¨©ª¬­®¯°±²³´µ¶·¸¹º¼½¾ÂÃÄÅÆÇËÌÍÏÐÑÒÔÕÖרÙÛÜÝÞßäåæìðòõö÷øüýþÿ

One interesting observation I made based off this is that our current fonts likely don't include the smart-quotes that we enforce in our content, since they're not part of the Latin character set. So these changes may improve the appearance of content which includes those.

aduth added 2 commits May 20, 2024 13:12
changelog: Internal, Performance, Optimize size of fonts to include only content character data
@aduth aduth marked this pull request as draft May 20, 2024 17:14
docs/frontend.md Outdated
1. [Download Public Sans](https://public-sans.digital.gov/) and extract it to your project's `tmp/` directory
2. Install [glyphhanger](https://github.com/zachleat/glyphhanger) and its dependencies:
1. `npm install -g glyphhanger`
2. `pip install fonttools brotli zopfli`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add a script that wraps this with something like a virtualenv setup?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to my last comment, at this stage I'm not sure it's worth investing too much into the apparatus around updating the fonts, assuming that this should be an exceedingly rare event. If it happens more often than I'm expecting (like more often than once every year or two), then I could see that as a future enhancement worth considering.

@aduth aduth marked this pull request as ready for review May 20, 2024 18:08
Comment on lines +37 to +39
I18n.backend.eager_load!

data = I18n.backend.translations.slice(*I18n.available_locales - excluded_locales)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should do something similar to this in spec/i18n_spec.rb. Interestingly i18n-tasks doesn't load all string data eagerly like this, so we miss some strings in those specs. Having a more complete set of strings to test against could help prevent issues like one we encountered a few weeks back.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense to me to add that there

Copy link
Contributor

@zachmargolis zachmargolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good and I can't think of anything else we've missed, so it LGTM

Overall, I am not convinced if we need this PR?

  • Pro: I totally understand the reduction in filesize is great for first-time loads, but I'd expect that fonts would get cached and so subsequent page loads would not be fresh.
  • Con: My gut is that this process is on the border of "is juice worth the squeeze" because if we ever do introduce a new character, it appears a fairly annoying manual process to fix.
  • Con: If a character slips through, I'm not sure that we have any automated feedback mechanisms to help us know (since browsers would just fall back to a different font that has the glyphs, right?)

Comment on lines +37 to +39
I18n.backend.eager_load!

data = I18n.backend.translations.slice(*I18n.available_locales - excluded_locales)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense to me to add that there

@aduth
Copy link
Contributor Author

aduth commented May 20, 2024

@zachmargolis Those are all totally valid considerations, and I do think we're reaching the point of diminishing returns on some of the low-hanging fruit with frontend performance. But that's also part of the reason for some of my pushback in comments, since the juice-vs-squeeze here should require this to be minimal investment for it to be worthwhile.

  • Pro: I totally understand the reduction in filesize is great for first-time loads, but I'd expect that fonts would get cached and so subsequent page loads would not be fresh.

True, but the same argument can (and I assume is) used to justify most apps' resource bloat, and when loading does need to occur, font loading is one of the most visually-noticeable changes and contributes to largest contentful paint metrics.

This is also seeking to optimize assuming assets are loaded in parallel, where the page load is as long as its largest assets, and fonts are currently neck-and-neck with the main application stylesheet in competing for largest page asset.

  • Con: My gut is that this process is on the border of "is juice worth the squeeze" because if we ever do introduce a new character, it appears a fairly annoying manual process to fix.

True, though besides what I'd already commented about not expecting this to be common, I don't think it's that tedious a process? I also think it's different from something like vulnerable dependency updates failing builds on unrelated pull requests, since at least in this case the build should only fail as a direct result of the changes of the pull request, assuming someone is adding content including a character we don't already include in our fonts.

  • Con: If a character slips through, I'm not sure that we have any automated feedback mechanisms to help us know (since browsers would just fall back to a different font that has the glyphs, right?)

I think this is actually improved by this pull request. We were already subsetting fonts as of #6094, and as mentioned in #10655 (comment) we already had characters that had slipped through and weren't being included in the subset fonts. At least now we have some test coverage to compare our content with what's included in the fonts.

And I think the other point about fallbacks is a reassurance that even in the worst-case scenario of the character being unavailable, the system will still render a fallback from the system fonts. So the overall risk is relatively low.

aduth and others added 2 commits May 21, 2024 09:41
See: 675e0de#r1608353189

Co-authored-by: Zach Margolis <zachmargolis@users.noreply.github.com>
@aduth aduth merged commit d753d55 into main May 22, 2024
@aduth aduth deleted the aduth-subset-content branch May 22, 2024 12:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants