Skip to content
This repository has been archived by the owner on Oct 18, 2023. It is now read-only.

idDelta computations should be performed module 65536 #72

Open
simoncozens opened this issue Apr 5, 2022 · 30 comments
Open

idDelta computations should be performed module 65536 #72

simoncozens opened this issue Apr 5, 2022 · 30 comments

Comments

@simoncozens
Copy link

When validating a cmap 4 subtable, (a) idDelta is a signed short ranging from -32768 to 32767, and (b) the spec says that

The idDelta arithmetic is modulo 65536.

So if, for example, the startCode is 10000, and the desired glyphId is 42769, the correct value of idDelta is -32767 because (10000 + -32767) % 65536 == 42769.

Font-Validator adds the idDelta to the startCode and checks if this is less than zero:

if (GetIdDelta(i) + GetStartCode(i) < 0)

However it does not perform the modulo arithmetic. It does 10000 + -32767, gets a negative number, and erroneously raises an error.

See fonttools/fonttools#2575 for more details.

@HinTak
Copy link
Owner

HinTak commented Apr 5, 2022 via email

@simoncozens
Copy link
Author

This is not related to the BMP restriction. If the start code is 1 but this is mapped to glyph ID 32769 the same problem applies. This is about large glyph IDs, not large codepoint IDs. The issue comes when the difference between the codepoint and the glyph ID > 32767.

@HinTak
Copy link
Owner

HinTak commented Apr 5, 2022 via email

@HinTak
Copy link
Owner

HinTak commented Apr 6, 2022 via email

@simoncozens
Copy link
Author

You are saying that the number should always be converted to a positive number (by adding/substracting multiples of 65536).

No, I am not saying this. The specification is saying this.

I am saying that maybe you shouldn't be using cmap 4, if you have rather large glyph ids.

You are saying this, but the specification is not saying this.

My suggestion is that FontValidator follows the specification, not what you are saying or what I am saying.

@HinTak
Copy link
Owner

HinTak commented Apr 6, 2022 via email

@simoncozens
Copy link
Author

Guess we disagree on how we read the spec.

Fair enough. I am not going to work on this either, then.

Just out of interest, how do you understand "the idDelta arithmetic is modulo 65536", and do you think the idDelta arithmetic in FontValidator is being done modulo 65536?

@HinTak
Copy link
Owner

HinTak commented Apr 6, 2022 via email

@simoncozens
Copy link
Author

outside 0-65536 are invalid

idDelta did indeed used to be specified as a ushort, but now it is specified as a short, so values < 0 are valid.

@HinTak
Copy link
Owner

HinTak commented Apr 6, 2022 via email

@simoncozens
Copy link
Author

Fair enough. You're the boss!

@HinTak
Copy link
Owner

HinTak commented Oct 11, 2022 via email

@Pomax
Copy link

Pomax commented Oct 11, 2022

I read the statement as 'values are restricted to within 0-65536 (outside 0-65536 are invalid)', instead of 'values outside 0-65536 are automatically wrapped around'.

This would be an incorrect reading. The text is very explicitly "The idDelta arithmetic is modulo 65536" meaning that the result of performing the idDelta calculation must be brought back into the [0, 65536] interval if the resulting value is greater than 65535 by applying the modulo operation.

That is: glyphIndex = (value + idDelta) % 0xFFFF

(Remember that we're doing this because glyphIDs are USHORT values, and so cannot be anything other than a value between 0 inclusively and 65536 exclusively, but cmap segment values can be values out side that range in order to optimize font size on disk and/or byte sequence compressibility)

@HinTak
Copy link
Owner

HinTak commented Oct 11, 2022

While debating interpretation of the spec is one aspect of this issue, there is a meta-problem here: if any part is ambiguous or subjected to possible mis-interpretation, a good font checker should warn about it, regardless of which side the opinion one sits on.

@HinTak
Copy link
Owner

HinTak commented Oct 11, 2022

So the fact that debates exist at all, means the warning should stay.

@Pomax
Copy link

Pomax commented Oct 11, 2022

At the same time: if there is an ambiguity, please report that to the spec maintainers so they can remove that ambiguity. Requesting the explicit formula be added to the spec to make it crystal clear is a feedback click away. I've filed MicrosoftDocs/typography-issues#980 on the spec side to get this made explicit.

@HinTak
Copy link
Owner

HinTak commented Oct 12, 2022

While theorectically, the maths should be done by up-casting to a larger container, then do the modulo, I suspect this part is documenting the limitation of the first/reference implementation. It assumes numbers to overflow in quite specific ways. Especially with signed/unsigned differences involved too.

In that regard, I am reminded of a recent change in freetype I happened to notice: instead of doing (a+b)/2 for the average of two numbers, there is a recent change in some part to change it to a + (b-a)/2 . Obviously theoretically they are the same...

The idea seems to be that, if a and b are similar, (b-a) would be small and therefore when a and b are both close to 1/2 max, it is better to add a small difference to one of them, than sum two large numbers and divide.

When signed/unsigned changes are involved too, it is not a good idea to assume maths will overflow and wrap around in specific ways.

Having a formula may or may not help. Do you think (a+b)/2 and a +(b-a)/2 are they same? They are, in the theoretical sense, but not in practical sense.

@Pomax
Copy link

Pomax commented Oct 12, 2022

While an excellent anecdote, that feels unrelated to this specific issue, where the calculation has a formal definition that includes the module requirement. Because no, those to formulae are not the same, but they are when we add a modulo requirement. Given a and b of same type, (a+b)/2 and a+(b-a)/2 are radically different functions, with wildly different over/underflow behaviour. But: (a+b)/2 % (1<<sizeof(a)) and (a+(b-a)/2) % (1<<sizeof(a)) are identical functions (although you'd be hard pressed to find a codebase where people would bother with that modulo correction outside of aviation)

In this case the weirdness comes from idDelta having changed from a uint16 to an int16, which you can't do unless you know that all the type-constrained mathematical behaviour stays the same. Which it does:

cmap code, uint16 idDelta as uint16 old result, uint16 same idDelta as int16 new result, still uint16
10000 32769 42769 -32767 42769

The idDelta bit pattern is the same, and the result typing is still the same, so the modulo fixes over/underflow and the result is identical. The modulo operation already fixed overflow before, and now it also fixes underflow. The modulo operation forces the result to stay good (and we could even wonder why the hell we changed the spec, because thanks to that modulo requirement literally nothing changes other than to the squishy brains looking at the idDelta number when it's printed as a numerical string)

@HinTak
Copy link
Owner

HinTak commented Oct 12, 2022

3 typos in your expressions (corrected below),

(a+b)/2 % (1<<sizeof(a)) and a+(b-a)/2 % (1<<sizeof(a)) are not the same in practice, even for non-overflowing values. They could differ in the least significant bit, due to underflow. ( (1+0)/2 = 1 or 0 depends on which side you round to...).

@Pomax
Copy link

Pomax commented Oct 13, 2022

Ah, thanks, those were very silly typos.

If you use an intermediary type that requires a rounding policy, then yeah: you're going to have different functions given different results depending on the rounding policy chosen. But then we've firmly left the analogy and started discussing unrelated maths, because that is simply not at play for idDelta validation. There are only additions at play here, no division.

@PeterConstable
Copy link

PeterConstable commented Dec 9, 2022

idDelta did indeed used to be specified as a ushort, but now it is specified as a short, so values < 0 are valid.

Hmmm... Had to dig a bit for that. The TrueType 1.66 spec did use USHORT, and that was in an OpenType 0.2 draft, but in OpenType 1.01 it was corrected to SHORT. Apple's spec, oddly, still documents as USHORT, but then goes on to provide an example (shorter version of the example given in the OT spec) in which idDelta values are negative. That same example with negative values was used in OT 0.2 and in TT 1.66, so clearly SHORT is what was intended all along.

But as noted above, as long as modulo 65536 arithmetic is used (equivalently, values are wrapped on overflow), then USHORT also works. (But then examples shouldn't be showing negative values.)

@PeterConstable
Copy link

... so clearly SHORT is what was intended all along.

I take that back. After investigating further, the intent all along was USHORT and assuming specific behaviour in the C language for arithmetic operations when both operands are unsigned: that overflow behaviour is to wrap.

(See also other comments added here.)

The example with negative delta values goes back to Apple's original "Royal" spec (what became TrueType). What was missing that might have been helpful was to clarify that large delta values, when added to a given value, can have the same effect as a small negative value due to the modulo / wrapping behaviour of the arithmetic.

@PeterCon
Copy link

PeterCon commented Apr 9, 2023

The long and short of it (sorry for the pun) is that what @simoncozens and @Pomax have been saying on this issue is correct.

This is not about supporting characters above the BMP; it is not about glyph IDs >64K. The delta can be +ve or -ve depending on whether the character ID > glyph ID or vice versa; but after the addition, the glyph ID must be >= 0.

The original TT spec used uint16, but that depended on specific details about unsigned integer overflow in C language specification to acheive the equivalent of a -ve delta for the case when char ID > glyph ID. The OT spec used int16 to more clearly reflect the intent of the delta without depending on C specification details about integer overflow, but needed to add that arithmetic must be modulo 65536 to ensure the result is > 0.

Font Validator should not be raising an error (or warning) if startCode + idDelta is < 0. Rather, it should be computing (startCode + idDelta) mod 65536 to ensure that the result is always >= 0.

@bobh0303
Copy link

Font Validator should not be raising an error (or warning) if startCode + idDelta is < 0. Rather, it should be computing (startCode + idDelta) mod 65536 to ensure that the result is always >= 0.

... which it will be, by definition -- no?

@HinTak
Copy link
Owner

HinTak commented Jun 22, 2023

I would say this is probably a good issue for somebody else (especially Microsoft folks) to do a pull to further discussion, and also for somebody else to get familiar with the code and building it, etc.

Also need direct reference to a font/ and exact revision which triggers this behavior. (I realise this info might be buried in one of the other referenced issues elsewhere, but we need a direct reference, really).

If I had any time at all on FontVal , it would be spent on FontVal-RX (of having multiple rendering backends, and re-enabling a Microsoft proprietary binary-only rasterizer) so my time is very limited on anything else.

@simoncozens
Copy link
Author

to do a pull to further discussion

What purpose would further discussion serve? We've pretty much all told you what needs to happen. There's nothing left to discuss.

If I had any time at all on FontVal

Time to archive the repo, then, or hand on maintainership?

@HinTak
Copy link
Owner

HinTak commented Jun 22, 2023

Yes, archiving the repo has crossed my mind. I'd prefer not to spend time have to engage/explain why I am doing that, so do nothing (and just leave this idle quietly) is my position, as I have been. So.

@HinTak
Copy link
Owner

HinTak commented Jun 22, 2023

Asking for a pull is, asking somebody else to do the testing for the suggested change, etc.

@Pomax
Copy link

Pomax commented Jun 22, 2023

It would be a good idea to at least add the "help wanted" label to this issue, so people know that help is wanted. Might not do anything, but no one will know you want help (or someone to take over) if the repo doesn't call it out. Similarly, adding a bit to the README.md going "this repo is in search of a maintainer" should be a quick "can be done purely on github itself" update that would go a long way towards making it clear to folks that you (understandably) have zero time to dedicate to this project anymore.

(I wouldn't archive it, myself, mostly because that also disables issues, PRs, etc. which wouldn't be great even if no one touches the code ever again. Having those stick around makes any potential future forks much easier)

@HinTak
Copy link
Owner

HinTak commented Jun 24, 2023

"Help-wanted" label added. In general, my inclination is "do nothing", and that includes making any statements about plans or lack of, or changing the README, or respond to flame. Just absolutely nothing.

For this particular issue, since the location of code is identified, it is a good "starter" issue for somebody else who want to contribute with pull. That somebody else is also responsible for doing some testing, with an appropriate font sample.

Given the nature of the issue - it sounds a lot like the spec is documenting a quirk/limitation/specific-behavior of the (historical/old) de facto Apple/Microsoft implementation, the suggestion that "it works in win11/win12, current freetype..." really isn't interesting, but that it might misbehave on something old and still in common use, like XP or the font renderer in Java runtime 8. So investigating on those, and/or actual access to relevant part of older Apple/Microsoft source code, is preferred.

Anyway, enough time spent on "statements" already. Back to silent mode.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants