-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Correct handling of unbraced kerns followed by spaces. #751
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Small nit regarding how we consume spaces.
src/Parser.js
Outdated
@@ -789,7 +789,7 @@ Parser.prototype.parseSizeGroup = function(optional) { | |||
let res; | |||
if (!optional && this.nextToken.text !== "{") { | |||
res = this.parseRegexGroup( | |||
/^[-+]? *(?:$|\d+|\d+\.\d*|\.\d*) *[a-z]{0,2}$/, "size"); | |||
/^[-+]? *(?:$|\d+|\d+\.\d*|\.\d*) *[a-z]{0,2}\s*$/, "size"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use this.consumeSpaces();
in other places. It might be good to do the same here for consistency. It should be noted that consumeSpaces
does not consume all whitespace, only plain old spaces. I wonder if it should be upgrade?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no way to call consumeSpaces()
here, this is before the tokens are created. It would need to be move into this.parseRegexGroup
. And that would be a bigger project: ti would change the meaning. Please take commit as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How will this \s*
interact with newlines? I suppose KaTeX doesn't support the double newline -> \par
conversion, but if it did, I suppose this regex might cause trouble...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed with @kohler on consumeSpaces
-- that's only for when spaces have already been parsed by the tokenizer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid further discussion on hypothetical future problems or inconsistencies, I will make the regex spaces only. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And spaces-only is certainly more consistent with the rest of the regex! Thanks for pushing further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think the behavior you've implemented with \s*
is more consistent with the rest of KaTeX, which treats \n
and spaces identically. (just did some tests on https://khan.github.io/KaTeX/ ) I think it would be confusing for \kern 1em\n...
to fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would actually argue for \s*
in both places. The following renders like x\kern 1em y
in LaTeX.
x\kern
1em
y
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, but whitespace conversion is already handled by KaTeX's lexer, which changes all [ \r\n\t]+
sequences to a single space character.
New commit has test making this clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, nice! (And that's where double-newline handling would go.)
@@ -1037,17 +1040,42 @@ describe("A non-braced kern parser", function() { | |||
const emKern = "\\kern1em"; | |||
const exKern = "\\kern 1 ex"; | |||
const muKern = "\\kern 1mu"; | |||
const abKern1 = "a\\mkern1mub"; | |||
const abKern2 = "a\\kern-1mub"; | |||
const abKern3 = "a\\kern-1mu b"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for filling out the tests. Only abKern3
would fail without the changes to Parser.js.
I didn't even realize that unbraced size arguments were supported by KaTeX! (because I always added extra spaces, it seems) Thanks for fixing this. |
Did not realize that `Parser.nextToken.text` can contain spaces (it can). Handle that.
07c527e
to
922db7d
Compare
I approve this PR, but I'll leave it to @kevinbarabash to confirm and push the button. |
Example:
a\mkern-1mu b
should be parsed likea\mkern{-1mu}b
. It wasn't, now it is.Did not realize that
Parser.nextToken.text
can contain spaces (it can). Handle that.This may allow, for instance, #727 to use the actual TeX definitions, rather than adding spurious braces.