Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correct handling of unbraced kerns followed by spaces. #751

Merged
merged 1 commit into from
Jun 30, 2017

Conversation

kohler
Copy link
Collaborator

@kohler kohler commented Jun 30, 2017

Example: a\mkern-1mu b should be parsed like a\mkern{-1mu}b. It wasn't, now it is.

Did not realize that Parser.nextToken.text can contain spaces (it can). Handle that.

This may allow, for instance, #727 to use the actual TeX definitions, rather than adding spurious braces.

Copy link
Member

@kevinbarabash kevinbarabash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Small nit regarding how we consume spaces.

src/Parser.js Outdated
@@ -789,7 +789,7 @@ Parser.prototype.parseSizeGroup = function(optional) {
let res;
if (!optional && this.nextToken.text !== "{") {
res = this.parseRegexGroup(
/^[-+]? *(?:$|\d+|\d+\.\d*|\.\d*) *[a-z]{0,2}$/, "size");
/^[-+]? *(?:$|\d+|\d+\.\d*|\.\d*) *[a-z]{0,2}\s*$/, "size");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use this.consumeSpaces(); in other places. It might be good to do the same here for consistency. It should be noted that consumeSpaces does not consume all whitespace, only plain old spaces. I wonder if it should be upgrade?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no way to call consumeSpaces() here, this is before the tokens are created. It would need to be move into this.parseRegexGroup. And that would be a bigger project: ti would change the meaning. Please take commit as is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will this \s* interact with newlines? I suppose KaTeX doesn't support the double newline -> \par conversion, but if it did, I suppose this regex might cause trouble...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with @kohler on consumeSpaces -- that's only for when spaces have already been parsed by the tokenizer.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid further discussion on hypothetical future problems or inconsistencies, I will make the regex spaces only. :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And spaces-only is certainly more consistent with the rest of the regex! Thanks for pushing further.

Copy link
Member

@edemaine edemaine Jun 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think the behavior you've implemented with \s* is more consistent with the rest of KaTeX, which treats \n and spaces identically. (just did some tests on https://khan.github.io/KaTeX/ ) I think it would be confusing for \kern 1em\n... to fail.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would actually argue for \s* in both places. The following renders like x\kern 1em y in LaTeX.

x\kern
1em
y

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, but whitespace conversion is already handled by KaTeX's lexer, which changes all [ \r\n\t]+ sequences to a single space character.

New commit has test making this clear.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, nice! (And that's where double-newline handling would go.)

@@ -1037,17 +1040,42 @@ describe("A non-braced kern parser", function() {
const emKern = "\\kern1em";
const exKern = "\\kern 1 ex";
const muKern = "\\kern 1mu";
const abKern1 = "a\\mkern1mub";
const abKern2 = "a\\kern-1mub";
const abKern3 = "a\\kern-1mu b";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for filling out the tests. Only abKern3 would fail without the changes to Parser.js.

@kevinbarabash kevinbarabash self-assigned this Jun 30, 2017
@edemaine
Copy link
Member

I didn't even realize that unbraced size arguments were supported by KaTeX! (because I always added extra spaces, it seems) Thanks for fixing this.

Did not realize that `Parser.nextToken.text` can contain spaces
(it can). Handle that.
@edemaine
Copy link
Member

I approve this PR, but I'll leave it to @kevinbarabash to confirm and push the button.

@kevinbarabash
Copy link
Member

@kohler thanks for the changes. @edemaine thanks for looking this over. Always good to have a second pair of eyes.

@kevinbarabash kevinbarabash merged commit 87b9123 into KaTeX:master Jun 30, 2017
@kohler kohler deleted the fix-unbraced-kern branch June 30, 2017 19:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants