Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix space handling #912

Merged
merged 5 commits into from
Oct 10, 2017
Merged

Fix space handling #912

merged 5 commits into from
Oct 10, 2017

Conversation

edemaine
Copy link
Member

@edemaine edemaine commented Oct 1, 2017

This PR fixes several issues with space handling, in particular fixing #910:

  1. "Control symbols" (as they're called in the TeXbook), such as \\, should
    not have spaces eaten after them (only "control words" such as \foo).
  2. In math mode, spaces should be consumed at the parser level, not the
    gullet level. This enables \\ [x] to parse differently from \\[x]
  3. Eat spaces between arguments, so \frac x y still works.
    (This used to work only because math mode ate all spaces.
    The analog in text mode wouldn't have worked.)

Notably, Parser.consume no longer eats spaces when in math mode; rather, I get rid of them in parseExpression (which seems more natural -- though there is a worry that some code somewhere assumed that spaces would already be consumed). This implies that we never use MacroExpander.get with a true argument, which could simplify the code of both get and unget. It also means that switchMode no longer does anything useful. Is it worth removing any of this code? I'm unsure.

Fixes several issues with space handling:
1. "Control symbols" (as they're called in the TeXbook), such as `\\`, should
   not have spaces eaten after them (only "control words" such as `\foo`).
2. In math mode, spaces should be consumed at the parser level, not the
   gullet level.  This enables `\\ [x]` to parse differently from `\\[x]`
3. Eat spaces between arguments, so `\frac x y` still works.
   (This used to work only because math mode ate all spaces.
    The analog in text mode wouldn't have worked.)
@kevinbarabash
Copy link
Member

It looks like there's an issue with the Sqrt screenshot test. I'm guessing whatever the issue is with that test, it's probably something that we could write a unit test for.

* Add atom test.
* Also use consumeSpaces helper more.
@edemaine
Copy link
Member Author

edemaine commented Oct 3, 2017

@kevinbarabash Thanks for spotting that. Indeed, it was a bug, and I added a test for it. Spaces needed to be checked for (and ignored) also in the sup/subscript handling of parseAtom. I fixed it, and added a test that failed otherwise. I've searched for other places where spaces need to be ignored, but I haven't found any...

const commentRegex = new RegExp(commentRegexString);
// tokenRegex has no ^ marker, as required by matchAt.
// These regexs are for matching results from tokenRegex,
// so they do have ^ markers.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice comment

@@ -140,9 +140,13 @@ export default class Parser {
* and fetches the one after that as the new look ahead.
*/
consume() {
this.nextToken = this.gullet.get(this.mode === "math");
this.nextToken = this.gullet.get(false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is the only place where gullet.get is called and its parameter is always the same, maybe we should get rid of the parameter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I wanted a second opinion on this. As I wrote above: "we never use MacroExpander.get with a true argument, which could simplify the code of both get and unget." We no longer need any of the space saving/restoring mechanics, so I'll get rid of that.

"It also means that switchMode no longer does anything useful." (in the parser) This one I'm less sure of. Maybe switchMode would be useful in the future, e.g. if we can ever tweak catcodes in other ways? (e.g. verbatim or url modes?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you think switchMode will come in handy in the future let's keep it, but please add a comment as to why we're keeping it.

Copy link
Member

@kevinbarabash kevinbarabash Oct 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we still need switchMode because we store the current mode in the parse nodes and that value gets used in the builders. Ignore my comment about adding a comment.

@@ -2353,6 +2355,11 @@ describe("An aligned environment", function() {
.toParse();
});

it("should allow cells in brackets", function() {
expect("\\begin{aligned}[a]&[b]\\\\ [c]&[d]\\end{aligned}")
.toParse();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if there's no space between \\\\and [c] would it try to parse [c] as a measurement?

Copy link
Member Author

@edemaine edemaine Oct 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This matches LaTeX behavior, based on testing. (\@ifnextchar[ must get a space instead of a [.) I'll add an error test for the no-space case.

});

it("should consume spaces after control-word function", function() {
compareParseTree("\\text{\\KaTeX x}", "\\text{\\KaTeX\\relax x}");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So these two things parse the same, but I thought control words weren't allowed inside \text{}. Why is it important that these two parse the same?

Copy link
Member Author

@edemaine edemaine Oct 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to make sure that \text{\KaTeX x} does not render a space. Ideally I'd say that \text{\KaTeX x} renders the same as \text{{\KaTeX}x} but that generates another group... perhaps I should tweak the test to actually look for features in the parse tree. (Control words like \KaTeX definitely work in text mode.) Ah, I can just test for \text{\KaTeX } vs. \text{\KaTeX}.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got an error when I tried \text{\KaTeX} on the demo page so maybe it's just some control words don't work.

it("should allow for empty macro argument", function() {
compareParseTree("\\foo\\bar", "()", {
"\\foo": "(#1)",
"\\bar": "",
});
});

// The following is not currently possible to get working, given that
// functions and macros are dealt with separately.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the current behavior? Can you open an issue for this?

Copy link
Member Author

@edemaine edemaine Oct 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #924. Also added some more comments in the code about this.

@edemaine
Copy link
Member Author

edemaine commented Oct 9, 2017

Thanks for the review! All comments should be addressed in the two new commits.

Copy link
Member

@kevinbarabash kevinbarabash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the additional tests and comments.

@edemaine edemaine merged commit 3280652 into KaTeX:master Oct 10, 2017
edemaine added a commit to edemaine/KaTeX that referenced this pull request Oct 10, 2017
@edemaine
Copy link
Member Author

Oops, I merged this before simplifying get. Perhaps useful to look at the diff separately anyway. It's in a new PR: #928.

kevinbarabash pushed a commit that referenced this pull request Oct 15, 2017
* Simplify get() now that we don't need it to ignorespaces

Continuation of #912

* Remove commented-out code

* Drop get() alias, rename unget() to pushToken(), use it
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants