idDelta computations should be performed modulo 65536 #72
Comments
I have the impression that cmap 4 is not capable of representing non-BMP glyphs, and cmap 12(?) is preferred in such cases?
Anyway, GetIdDelta(uint ...) at around line 697 of OTFontFile/Table_cmap.cs returns a signed short. So you are proposing +/- 65536 around OTFontFileVal/val_cmap.cs#L950?
But then does it mean that the look-up would be ambiguous (by 65536)? That goes back to my initial comment: I had the impression that cmap 4 is not good for large fonts with non-BMP glyphs.
So I doubt this, and I am not convinced it is a bug - rather, it is a limitation of cmap 4. If the look-up is ambiguous, that's not good; the font should not rely on the font engine to make guesses. But, please feel free to continue.
|
This is not related to the BMP restriction. If the start code is 1 but this is mapped to glyph ID 32769 the same problem applies. This is about large glyph IDs, not large codepoint IDs. The issue comes when the difference between the codepoint and the glyph ID > 32767. |
It is somewhat related to the BMP restriction. I think you are trying to convince me that it is possible to encode quite arbitrarily large glyph IDs this way, i.e. startcode = 10000 and idDelta = -32767 means 65536 + 10000 - 32767 = 42769.
But why stop at 42769? What about glyph ID 2 * 65536 + 10000 - 32767 = 108305, or 108305 + 65536 = 173841?
All three glyph IDs (42769, 108305, 173841) are valid interpretations under this idea. Hence the ambiguity.
You are saying that the number should always be converted to a positive number (by adding/subtracting multiples of 65536). I am saying that maybe you shouldn't be using cmap 4 if you have rather large glyph IDs.
My impression is that when some fonts ship both cmap 4 and cmap 12, many glyphs are missing from the cmap 4 table?
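For illustration, a minimal Python sketch of the three candidates listed above. They all differ by multiples of 65536, and only one of them fits in an unsigned 16-bit value; under the modulo-65536 reading of the spec, that one would be the result. The variable names are illustrative and not taken from Font-Validator.

```python
start_code = 10000
id_delta = -32767

raw = start_code + id_delta  # -22767, outside the uint16 range

# Every value congruent to raw modulo 65536 is a "candidate" glyph ID.
candidates = [raw + k * 65536 for k in range(1, 4)]

# Only candidates in 0..65535 can be stored in a uint16 glyph ID.
in_range = [g for g in candidates if 0 <= g <= 0xFFFF]

print(candidates)  # [42769, 108305, 173841]
print(in_range)    # [42769]
```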
|
I don't claim to understand that part of the code, to be honest. But I think the intention is to encode contiguous ranges, with startcode being uint16 (0 to 65535) and idDelta being int16 (-32768 to 32767), to be able to encode glyph ID ranges up to about 98300? It is still not possible to go beyond 98300 or thereabouts in this scenario. I think the current logic, that startcode needs to be larger than the absolute size of idDelta to make the sum +ve, "is not obviously wrong".
In fact the modulo 65536 is possibly a not very clear way of commenting on the possibility of overflowing in either direction when adding a uint16 to an int16 to get a uint16 (i.e. with cmap 4 you are probably supposed to stay under 65535, instead of under 98300). If you do 65535 + 32767, do you subtract 65536 to get an ID below 65535?
|
No, I am not saying this. The specification is saying this.
You are saying this, but the specification is not saying this. My suggestion is that FontValidator follows the specification, not what you are saying or what I am saying. |
Guess we disagree on how we read the spec. So, I personally am not going to work on this, but please feel free to do a pull for further discussion.
|
Fair enough. I am not going to work on this either, then. Just out of interest, how do you understand "the idDelta arithmetic is modulo 65536", and do you think the idDelta arithmetic in FontValidator is being done modulo 65536? |
I read the statement as 'values are restricted to within 0-65536 (outside 0-65536 are invalid)', instead of 'values outside 0-65536 are automatically wrapped around'.
|
idDelta did indeed used to be specified as a ushort, but now it is specified as a short, so values < 0 are valid. |
In the FontVal code, GetIdDelta() returns a signed value, so that's not the dispute. As I mentioned earlier, the trouble is interpreting the maths when adding a signed value to an unsigned StartCode to get a supposedly unsigned glyph ID.
Anyway, if wrapping around is intended, you have a problem when you have a large startcode (50000, say) and a large +ve idDelta (20000, say). Does it mean glyph ID 70000, or 70000 - 65536 = 4464, one of the lower numbered glyph IDs, for a font having 70000+ glyphs? For that reason, I think this cmap is not intended for large fonts, so later formats were added. Basically the same thing I mentioned at the beginning.
|
Fair enough. You're the boss! |
I am okay to see a pull for further discussion, but I am not going to spend time on it myself. In principle, adding a signed short idDelta to an unsigned short startcode can give up to ~98300; I think the modulo statement rather says this cmap format is not meant to cope with glyph IDs higher than 65536.
The other issue is that something named '...Delta' tends to be a small change, positive or negative, to another value (startcode), which it is. Having a large ...delta against a small value seems contrary to the naming's intention.
I assume you came across it because it is convenient for some font writing software to generate such values. Based on the idea that "delta"s are supposed to be small, I'd rather see that font writing software changed, even if the generated fonts 'happen to work okay on platform xyz'. Anyway, I wonder why this (unknown?) software would generate large "delta"s.
|
This would be an incorrect reading. The text is very explicitly "The idDelta arithmetic is modulo 65536", meaning that the result of performing the idDelta calculation must be brought back into the [0, 65536) interval, by applying the modulo operation, if the resulting value is greater than 65535. That is: (Remember that we're doing this because glyphIDs are USHORT values, and so cannot be anything other than a value between 0 inclusively and 65536 exclusively, but cmap segment values can be values outside that range in order to optimize font size on disk and/or byte sequence compressibility) |
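A sketch of that calculation in Python, assuming a segment whose idRangeOffset is 0 (the case where idDelta applies directly). The function name is hypothetical and not part of any library.

```python
def glyph_id_from_delta(char_code: int, id_delta: int) -> int:
    """Map a character code to a glyph ID for a format 4 segment.

    Assumes idRangeOffset == 0 for the segment, so the glyph ID is
    derived from idDelta alone.
    """
    # Python's % always returns a non-negative result for a positive
    # modulus, so this wraps both overflow and underflow into 0..65535.
    return (char_code + id_delta) % 65536


print(glyph_id_from_delta(10000, -32767))  # 42769 (underflow wrapped up)
print(glyph_id_from_delta(50000, 20000))   # 4464  (overflow wrapped down)
print(glyph_id_from_delta(65, -36))        # 29    (no wrapping needed)
```

In languages where % can return a negative value for a negative left operand (C and C# among them), masking the sum with 0xFFFF gives the same wrap.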
While debating interpretation of the spec is one aspect of this issue, there is a meta-problem here: if any part is ambiguous or subjected to possible mis-interpretation, a good font checker should warn about it, regardless of which side the opinion one sits on. |
So the fact that debates exist at all, means the warning should stay. |
At the same time: if there is an ambiguity, please report that to the spec maintainers so they can remove that ambiguity. Requesting the explicit formula be added to the spec to make it crystal clear is a feedback click away. I've filed MicrosoftDocs/typography-issues#980 on the spec side to get this made explicit. |
While theoretically the maths should be done by up-casting to a larger container and then doing the modulo, I suspect this part is documenting the limitation of the first/reference implementation. It assumes numbers overflow in quite specific ways, especially with signed/unsigned differences involved too. In that regard, I am reminded of a recent change in freetype I happened to notice: instead of doing (a+b)/2 for the average of two numbers, some part was changed to use a + (b-a)/2. Obviously, theoretically they are the same... The idea seems to be that, if a and b are similar, (b-a) would be small, and therefore when a and b are both close to 1/2 max, it is better to add a small difference to one of them than to sum two large numbers and divide. When signed/unsigned changes are involved too, it is not a good idea to assume maths will overflow and wrap around in specific ways. Having a formula may or may not help. Do you think (a+b)/2 and a + (b-a)/2 are the same? They are, in the theoretical sense, but not in the practical sense. |
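To make the midpoint example concrete, here is a small Python sketch that emulates unsigned 16-bit arithmetic by masking after each operation. The helper name u16 is made up for this illustration, and this is not FreeType's code.

```python
def u16(x: int) -> int:
    """Emulate storing x in an unsigned 16-bit integer (wraps mod 65536)."""
    return x & 0xFFFF


a, b = 40000, 50000

# (a + b) / 2: the sum 90000 does not fit in 16 bits and wraps first.
mid_sum = u16(u16(a + b) // 2)

# a + (b - a) / 2: every intermediate value fits in 16 bits.
mid_diff = u16(a + u16(b - a) // 2)

print(mid_sum)   # 12232
print(mid_diff)  # 45000
```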
While an excellent anecdote, that feels unrelated to this specific issue, where the calculation has a formal definition that includes the modulo requirement. Because no, those two formulae are not the same, but they are when we add a modulo requirement. Given In this case the weirdness comes from
The idDelta bit pattern is the same, and the result typing is still the same, so the modulo fixes over/underflow and the result is identical. The modulo operation already fixed overflow before, and now it also fixes underflow. The modulo operation forces the result to stay good (and we could even wonder why the hell we changed the spec, because thanks to that modulo requirement literally nothing changes other than to the squishy brains looking at the idDelta number when it's printed as a numerical string) |
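The bit-pattern argument can be checked with a short Python sketch using the standard struct module: the same two bytes are read once as int16 and once as uint16, and the modular sum comes out the same either way. The sample values are illustrative.

```python
import struct

start_code = 10000

# The two bytes stored in the font for idDelta (big-endian, as in OpenType).
raw_bytes = struct.pack(">h", -32767)

as_int16 = struct.unpack(">h", raw_bytes)[0]   # signed reading
as_uint16 = struct.unpack(">H", raw_bytes)[0]  # unsigned reading

glyph_signed = (start_code + as_int16) % 65536
glyph_unsigned = (start_code + as_uint16) % 65536

print(as_int16, as_uint16)           # -32767 32769
print(glyph_signed, glyph_unsigned)  # 42769 42769
```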
3 typos in your expressions (corrected below),
|
Ah, thanks, those were very silly typos. If you use an intermediary type that requires a rounding policy, then yeah: you're going to have different functions given different results depending on the rounding policy chosen. But then we've firmly left the analogy and started discussing unrelated maths, because that is simply not at play for |
Hmmm... Had to dig a bit for that. The TrueType 1.66 spec did use USHORT, and that was in an OpenType 0.2 draft, but in OpenType 1.01 it was corrected to SHORT. Apple's spec, oddly, still documents as USHORT, but then goes on to provide an example (shorter version of the example given in the OT spec) in which idDelta values are negative. That same example with negative values was used in OT 0.2 and in TT 1.66, so clearly SHORT is what was intended all along. But as noted above, as long as modulo 65536 arithmetic is used (equivalently, values are wrapped on overflow), then USHORT also works. (But then examples shouldn't be showing negative values.) |
I take that back. After investigating further, the intent all along was USHORT and assuming specific behaviour in the C language for arithmetic operations when both operands are unsigned: that overflow behaviour is to wrap. (See also other comments added here.) The example with negative delta values goes back to Apple's original "Royal" spec (what became TrueType). What was missing that might have been helpful was to clarify that large delta values, when added to a given value, can have the same effect as a small negative value due to the modulo / wrapping behaviour of the arithmetic. |
The long and short of it (sorry for the pun) is that what @simoncozens and @Pomax have been saying on this issue is correct. This is not about supporting characters above the BMP; it is not about glyph IDs >64K. The delta can be +ve or -ve depending on whether the character ID > glyph ID or vice versa; but after the addition, the glyph ID must be >= 0. The original TT spec used uint16, but that depended on specific details about unsigned integer overflow in the C language specification to achieve the equivalent of a -ve delta for the case when char ID > glyph ID. The OT spec used int16 to more clearly reflect the intent of the delta without depending on C specification details about integer overflow, but needed to add that arithmetic must be modulo 65536 to ensure the result is >= 0. Font Validator should not be raising an error (or warning) if startCode + idDelta is < 0. Rather, it should be computing |
... which it will be, by definition -- no? |
I would say this is probably a good issue for somebody else (especially Microsoft folks) to do a pull for further discussion, and also for somebody else to get familiar with the code and building it, etc. We also need a direct reference to a font and the exact revision which triggers this behavior. (I realise this info might be buried in one of the other referenced issues elsewhere, but we need a direct reference, really.) If I had any time at all for FontVal, it would be spent on FontVal-RX (having multiple rendering backends, and re-enabling a Microsoft proprietary binary-only rasterizer), so my time is very limited on anything else. |
What purpose would further discussion serve? We've pretty much all told you what needs to happen. There's nothing left to discuss.
Time to archive the repo, then, or hand on maintainership? |
Yes, archiving the repo has crossed my mind. I'd prefer not to spend time having to engage/explain why I am doing that, so doing nothing (and just leaving this idle quietly) is my position, as it has been. So. |
Asking for a pull is asking somebody else to do the testing for the suggested change, etc. |
It would be a good idea to at least add the "help wanted" label to this issue, so people know that help is wanted. Might not do anything, but no one will know you want help (or someone to take over) if the repo doesn't call it out. Similarly, adding a bit to the README.md going "this repo is in search of a maintainer" should be a quick "can be done purely on github itself" update that would go a long way towards making it clear to folks that you (understandably) have zero time to dedicate to this project anymore. (I wouldn't archive it, myself, mostly because that also disables issues, PRs, etc. which wouldn't be great even if no one touches the code ever again. Having those stick around makes any potential future forks much easier) |
"Help-wanted" label added. In general, my inclination is "do nothing", and that includes making any statements about plans or lack of, or changing the README, or respond to flame. Just absolutely nothing. For this particular issue, since the location of code is identified, it is a good "starter" issue for somebody else who want to contribute with pull. That somebody else is also responsible for doing some testing, with an appropriate font sample. Given the nature of the issue - it sounds a lot like the spec is documenting a quirk/limitation/specific-behavior of the (historical/old) de facto Apple/Microsoft implementation, the suggestion that "it works in win11/win12, current freetype..." really isn't interesting, but that it might misbehave on something old and still in common use, like XP or the font renderer in Java runtime 8. So investigating on those, and/or actual access to relevant part of older Apple/Microsoft source code, is preferred. Anyway, enough time spent on "statements" already. Back to silent mode. |
When validating a cmap 4 subtable, (a) idDelta is a signed short ranging from -32768 to 32767, and (b) the spec says that "the idDelta arithmetic is modulo 65536".
So if, for example, the startCode is 10000, and the desired glyphId is 42769, the correct value of idDelta is -32767 because (10000 + -32767) % 65536 == 42769.
Font-Validator adds the idDelta to the startCode and checks if this is less than zero:
Font-Validator/OTFontFileVal/val_cmap.cs
Line 950 in b8394cb
However, it does not perform the modulo arithmetic. It does 10000 + -32767, gets a negative number, and erroneously raises an error.
See fonttools/fonttools#2575 for more details.
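A rough Python sketch of the difference, not the actual C# code in val_cmap.cs. The function names are hypothetical, idRangeOffset is assumed to be 0 for the segment, and comparing against the font's glyph count is an assumption about what a useful check might look like.

```python
def naive_check(start_code: int, id_delta: int) -> bool:
    """Mimics the reported behaviour: reject a negative plain sum."""
    return start_code + id_delta >= 0


def modular_check(start_code: int, end_code: int, id_delta: int,
                  num_glyphs: int) -> bool:
    """Wrap each sum modulo 65536, then compare against the glyph count."""
    return all((c + id_delta) % 65536 < num_glyphs
               for c in range(start_code, end_code + 1))


print(naive_check(10000, -32767))                  # False
print(modular_check(10000, 10010, -32767, 50000))  # True
print(modular_check(10000, 10010, -32767, 40000))  # False
```

With the modular version, the segment from the example is accepted as long as the wrapped glyph IDs (42769 to 42779 here) exist in the font.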