This repository has been archived by the owner on Mar 7, 2023. It is now read-only.

Font metrics to report blanks in CJK punctuation #45

Open
frivoal opened this issue Feb 2, 2021 · 4 comments
Labels
i18n-clreq Chinese language enablement i18n-jlreq Japanese language enablement i18n-klreq Korean language enablement i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response.

Comments

@frivoal
Collaborator

frivoal commented Feb 2, 2021

Is there in OpenType (and related technologies) some metric that tells you, for CJK characters like , or others that contain a large blank, how much of the character is blank (and on which side(s) the blank is)?

As a first approximation, we could write a fixed list like the following (to be expanded; this is not exhaustive):

  • the right 50% of , is blank
  • all of the ideographic space is blank
  • the left 25% and the right 25% of are blank

However:

  • this varies per language (since things like and are centered in traditional Chinese)
  • this varies between horizontal and vertical writing modes
  • this varies per font (some center , some left-align it leaving a blank right)
  • etc

So this doesn't seem like something we can hard-code in Unicode (or in a spec that wants to use these things); it seems it has to be readable from font metrics.
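To make the problem with a fixed list concrete, here is a minimal sketch of what such a table might look like. Everything in it is an assumption for illustration: the name `lookup_blank`, the key shape, and the choice of U+FF0C as the example code point. The fractions only mirror the kind of entries listed above.

```python
from typing import NamedTuple, Optional


class Blank(NamedTuple):
    left: float   # fraction of the advance that is blank at the start side
    right: float  # fraction of the advance that is blank at the end side


# Hypothetical hard-coded table keyed by (code point, language, writing mode).
# "*" means "any language". Note there is no way to key on the font itself.
BLANKS = {
    # Ideographic space: entirely blank (the even split is arbitrary).
    (0x3000, "*", "horizontal"): Blank(0.5, 0.5),
    # Fullwidth comma, left-aligned design.
    (0xFF0C, "ja", "horizontal"): Blank(0.0, 0.5),
    # Fullwidth comma, centered design.
    (0xFF0C, "zh-Hant", "horizontal"): Blank(0.25, 0.25),
}


def lookup_blank(cp: int, lang: str, mode: str) -> Optional[Blank]:
    """Return the tabulated blank, or None when the table has no entry."""
    for key in ((cp, lang, mode), (cp, "*", mode)):
        if key in BLANKS:
            return BLANKS[key]
    return None
```

Every new language, writing mode, or character multiplies the entries, and a font that deviates from the expected design still gets the wrong answer.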

So, does this information exist already, and if not, can it be created?

I suspect that the halt and vhal features are not what I am looking for, as they fit a fullwidth character into a halfwidth space, which does get rid of the blank part, but doesn't tell you where that blank part was, nor guarantee that the non-blank part is undistorted by this operation.

Use cases for this would be supporting ruby overhang in the manner described in Simple ruby 3.2 bullet 4, or https://drafts.csswg.org/css-text-4/#fullwidth-collapsing

@lianghai
Contributor

lianghai commented Feb 3, 2021

  • this varies per font (some center ?, some left-align it leaving a blank right)

Yep, e.g., fonts from Founder Type, the largest type foundry in mainland China, notoriously have their and etc centered.

No, there isn’t a specialized field available there today in the fonts, but dynamically checking side bearings may be easier for everyone anyway. You know how we can’t even trust those metrics in Latin fonts for underlines.
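A rough sketch of what "dynamically checking side bearings" could mean in practice, assuming the engine already has the glyph's horizontal advance and the horizontal extent of its outline in font units. `GlyphMetrics` and `blank_fractions` are hypothetical names, not an existing API, and the example numbers are made up.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class GlyphMetrics:
    advance: int            # horizontal advance width, in font units
    x_min: Optional[int]    # left edge of the outline, None if no outline
    x_max: Optional[int]    # right edge of the outline, None if no outline


def _clamp01(value: float) -> float:
    """Clamp to [0, 1]; side bearings can be negative when ink overhangs."""
    return max(0.0, min(1.0, value))


def blank_fractions(m: GlyphMetrics) -> Tuple[float, float]:
    """Return (left, right) blank as fractions of the advance width."""
    if m.advance <= 0:
        raise ValueError("advance must be positive")
    if m.x_min is None or m.x_max is None:
        # No outline at all (e.g. a space): all blank, split evenly by convention.
        return (0.5, 0.5)
    left = m.x_min / m.advance                 # left side bearing
    right = (m.advance - m.x_max) / m.advance  # right side bearing
    return (_clamp01(left), _clamp01(right))
```

For example, `blank_fractions(GlyphMetrics(1000, 60, 310))` describes a left-aligned mark with most of its blank on the right, while `GlyphMetrics(1000, 380, 620)` describes a centered one. This only measures ink, so it cannot tell intended spacing from an optically adjusted design.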

@macnmm

macnmm commented Mar 5, 2021

I have something in mind for this that I have floated with Koji-san in a thread and could write up for a more formal proposal...

Introduce two new OTF features 'jtsu/vjts' and 'jaki/vjak'; deprecate the proposed features 'chws/vchw':

The issue my idea is trying to solve is that currently the JIS X 4051 standard dictates certain full-width punctuation be treated as half-width, but fonts today include many more punctuation characters than are explicitly specified, and also do not agree on the width and design for codepoints in the ambiguous range (e.g. U+2xxx). So, what started as a spec for width adjustments based on codepoint actually requires font and glyph info to do correctly. The 'jtsu/vjts' feature would be a way for fonts to specify this adjustment to the JIS X 4051 zero-point, eliminating differences across text engines that hard-code their own behavior or have unique heuristics. I have chosen 'j' as the prefix (and based the 3-char name on the Japanese terminology "aki" and "tsume") because this issue is specific to the Japanese JIS standard and its unique scheme.

Having established a way for the JIS X 4051 zero-point to be informed by the font data, there is the issue of varying levels of support for mojikumi aki spacing adjustments away from the JIS X 4051 zero-point. Some engines may only require a basic adjustment of +1/2 em all the time; InDesign requires much more variation and control of when and how much is added. For the basic case, performing the +1/2 em logic in a contextual way could be done using an OTF feature, hence the introduction of the 'jaki/vjak' feature. Engines will always be different in how they adjust spacing; they could use OTF features or they could have their own more complex logic. I don't think the use of either feature would be inconsistent, only that some apps would opt not to use the second one.

@xfq xfq added i18n-clreq Chinese language enablement i18n-jlreq Japanese language enablement i18n-klreq Korean language enablement i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. labels Mar 8, 2021
@frivoal
Collaborator Author

frivoal commented Apr 20, 2021

Actually, I think I was mistaken / confused, and that halt actually is what I am looking for. I think I actually confused it with hwid. That would have the issues I am talking about, but halt seems fine.

But now that @lianghai and @macnmm have agreed there's an issue, I suspect there's more complexity to this problem than I initially thought, but I'm not sure what's missing. @macnmm Could you explain a bit more (we touched on this in the latest JLREQ-TF, but I'm still falling a bit short)?

@macnmm

macnmm commented May 20, 2021

Actually, I think I was mistaken / confused, and that halt actually is what I am looking for. I think I actually confused it with hwid. That would have the issues I am talking about, but halt seems fine.

But now that @lianghai and @macnmm have agreed there's an issue, I suspect there's more complexity to this problem than I initially thought, but I'm not sure what's missing. @macnmm Could you explain a bit more (we touched on this in the latest JLREQ-TF, but I'm still falling a bit short)?

I think that if you want all the full-width punctuation to be reduced to half-width for layout purposes, halt will do the trick. If you want to know where the space is in the glyph design, using halt to back into it with math seems expensive to me, but it could be done.
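For what it's worth, the "back into it with math" part might look like the sketch below. It assumes halt is implemented as a GPOS single adjustment and that the engine can read the horizontal placement and advance deltas applied to the glyph. The function name and the sample values are hypothetical.

```python
from typing import Tuple


def trimmed_by_halt(x_placement: int, x_advance: int) -> Tuple[int, int]:
    """Infer how much halt trimmed from each side, in font units.

    x_placement: horizontal shift applied to the glyph (negative moves it left).
    x_advance:   change applied to the advance width (negative shrinks it).
    Returns (trimmed_left, trimmed_right).
    """
    trimmed_total = -x_advance
    # Shifting the glyph left by N units removes N units from its left side.
    trimmed_left = -x_placement
    # Whatever else was removed from the advance came off the right side.
    trimmed_right = trimmed_total - trimmed_left
    return (trimmed_left, trimmed_right)
```

With a 1000-unit em, `trimmed_by_halt(0, -500)` would indicate a blank on the right, `trimmed_by_halt(-500, -500)` a blank on the left, and `trimmed_by_halt(-250, -500)` a centered glyph. Note that this reports what the font chose to trim, which is not necessarily the full blank in the glyph design, and vhal would need the vertical equivalents.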

My issue is (and I hijacked your issue being on the agenda to discuss it), chws is not usable by sophisticated layout apps, as it basically performs mojikumi-like spacing adjustments on punctuation when they are contiguous, but otherwise does no other adjustment so would be incompatible with other adjustment logic. I seek a feature that can improve upon the issue of mojikumi class being tied to the glyph and not the codepoint, yet we have no standard way of categorizing glyphs for this purpose. I thought we could use halt to achieve the JIS X 4051 zero-point, and from there unsophisticated apps would use a new feature jaki to add back aki spacing in the case the punctuation is adjacent to a glyph of a different class. Sophisticated apps would have their own logic to do this and not use the jaki feature.
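As a very rough illustration of the jaki idea for unsophisticated apps, the sketch below starts from a run where halt has already reduced punctuation to half-width and decides where to add back half an em. The three classes ("open", "close", "other"), the rule itself, and the name `aki_gaps` are simplifying assumptions for illustration only; they are not the JIS X 4051 class scheme and not a definition of the proposed feature.

```python
from typing import List

HALF_EM = 0.5  # aki amount, in em


def aki_gaps(classes: List[str]) -> List[float]:
    """Return the aki (in em) to insert between each adjacent pair of glyphs.

    classes: one of "open", "close", or "other" per glyph, in text order.
    The result has len(classes) - 1 entries.
    """
    gaps = []
    for left, right in zip(classes, classes[1:]):
        # The trimmed blank of a closing mark faces forward, and that of an
        # opening mark faces backward, so only those sides can regain aki.
        blank_faces_gap = left == "close" or right == "open"
        # Restore aki only where the neighbor belongs to a different class.
        gaps.append(HALF_EM if blank_faces_gap and left != right else 0.0)
    return gaps
```

Under these assumptions, `aki_gaps(["other", "close", "open", "other"])` yields `[0.0, 0.5, 0.0]`: the contiguous closing and opening marks share a single half-em, and nothing is added on their ink sides. A sophisticated app would replace this function with its own logic.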

However, in subsequent discussions it appears chws still has its adherents, for the simplest use cases of single-style-run UI or other text, and the JLReq TF is discussing the need for fonts to include character class info somehow for glyphs, so engines can do sophisticated layout. To what extent should the fonts specify spacing, in a world where spacing is not one-size-fits-all? Perhaps fonts should instead specify class, and engines decide to what extent they support mojikumi...
