Fix space handling #912

edemaine · 2017-10-01T18:28:16Z

This PR fixes several issues with space handling, in particular fixing #910:

1. "Control symbols" (as they're called in the TeXbook), such as `\\`, should not have spaces eaten after them (only "control words" such as `\foo`).
2. In math mode, spaces should be consumed at the parser level, not the gullet level. This enables `\\ [x]` to parse differently from `\\[x]`.
3. Eat spaces between arguments, so `\frac x y` still works. (This used to work only because math mode ate all spaces. The analog in text mode wouldn't have worked.)
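To illustrate the control-word versus control-symbol distinction, here is a minimal sketch of a tokenizer that applies the TeX rule. The function `tokenize` and its regular expression are assumptions made up for this example; they are not KaTeX's actual Lexer.

```javascript
// Hypothetical, simplified tokenizer (not KaTeX's Lexer).
// Control word:   backslash + letters; trailing whitespace is skipped.
// Control symbol: backslash + one non-letter; trailing whitespace is kept.
// Any other run of whitespace becomes a single " " token.
function tokenize(input) {
    const tokenRegex =
        /(\\[a-zA-Z]+)[ \t\r\n]*|(\\[^a-zA-Z])|([ \t\r\n]+)|([^\\])/g;
    const tokens = [];
    let match;
    while ((match = tokenRegex.exec(input)) !== null) {
        if (match[1] !== undefined) {
            tokens.push(match[1]);      // control word, spaces after it eaten
        } else if (match[2] !== undefined) {
            tokens.push(match[2]);      // control symbol, spaces after it kept
        } else if (match[3] !== undefined) {
            tokens.push(" ");           // whitespace collapses to one space
        } else {
            tokens.push(match[4]);      // any other single character
        }
    }
    return tokens;
}

console.log(tokenize("\\foo x"));     // ["\\foo", "x"]
console.log(tokenize("\\\\ [x]"));    // ["\\\\", " ", "[", "x", "]"]
console.log(tokenize("\\\\[x]"));     // ["\\\\", "[", "x", "]"]
```

With this rule the space after `\\` survives as its own token, so a later stage can tell `\\ [x]` apart from `\\[x]`.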

Notably, Parser.consume no longer eats spaces when in math mode; rather, I get rid of them in parseExpression (which seems more natural -- though there is a worry that some code somewhere assumed that spaces would already be consumed). This implies that we never use MacroExpander.get with a true argument, which could simplify the code of both get and unget. It also means that switchMode no longer does anything useful. Is it worth removing any of this code? I'm unsure.
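As a rough sketch of what "spaces consumed at the parser level" means: the names `consume`, `consumeSpaces`, and `parseExpression` come from this discussion, but the class `MiniParser` and all method bodies below are illustrative assumptions, not the code in this PR.

```javascript
// Hypothetical mini-parser over an already-tokenized array.
class MiniParser {
    constructor(tokens, mode) {
        this.tokens = tokens;
        this.pos = 0;
        this.mode = mode;           // "math" or "text"
    }

    get nextToken() {
        return this.tokens[this.pos];
    }

    // Advance by exactly one token; never skips spaces on its own.
    consume() {
        this.pos++;
    }

    // Skip any run of space tokens at the current position.
    consumeSpaces() {
        while (this.nextToken === " ") {
            this.consume();
        }
    }

    // Spaces are dropped here, in math mode only, instead of in consume().
    parseExpression() {
        const body = [];
        while (true) {
            if (this.mode === "math") {
                this.consumeSpaces();
            }
            if (this.nextToken === undefined) {
                break;
            }
            body.push(this.nextToken);
            this.consume();
        }
        return body;
    }
}

console.log(new MiniParser(["a", " ", "b"], "math").parseExpression()); // ["a", "b"]
console.log(new MiniParser(["a", " ", "b"], "text").parseExpression()); // ["a", " ", "b"]
```

Because `consume` leaves spaces in the stream, any code that cares about a space (such as the handler for `\\`) can still see it before `parseExpression` discards it.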


kevinbarabash · 2017-10-03T16:59:32Z

It looks like there's an issue with the Sqrt screenshot test. I'm guessing whatever the issue is with that test, it's probably something that we could write a unit test for.

edemaine · 2017-10-03T17:41:14Z

@kevinbarabash Thanks for spotting that. Indeed, it was a bug, and I added a test for it. Spaces needed to be checked for (and ignored) also in the sup/subscript handling of parseAtom. I fixed it, and added a test that failed otherwise. I've searched for other places where spaces need to be ignored, but I haven't found any...

kevinbarabash · 2017-10-08T02:38:00Z

src/Lexer.js

-const commentRegex = new RegExp(commentRegexString);
+// tokenRegex has no ^ marker, as required by matchAt.
+// These regexs are for matching results from tokenRegex,
+// so they do have ^ markers.


nice comment

kevinbarabash · 2017-10-08T02:48:34Z

src/Parser.js

@@ -140,9 +140,13 @@ export default class Parser {
     * and fetches the one after that as the new look ahead.
     */
    consume() {
-        this.nextToken = this.gullet.get(this.mode === "math");
+        this.nextToken = this.gullet.get(false);


Since this is the only place where gullet.get is called and its parameter is always the same, maybe we should get rid of the parameter.

Yes, I wanted a second opinion on this. As I wrote above: "we never use MacroExpander.get with a true argument, which could simplify the code of both get and unget." We no longer need any of the space saving/restoring mechanics, so I'll get rid of that.

"It also means that switchMode no longer does anything useful." (in the parser) This one I'm less sure of. Maybe switchMode would be useful in the future, e.g. if we can ever tweak catcodes in other ways? (e.g. verbatim or url modes?)

If you think switchMode will come in handy in the future let's keep it, but please add a comment as to why we're keeping it.

I think we still need switchMode because we store the current mode in the parse nodes and that value gets used in the builders. Ignore my comment about adding a comment.
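For reference, a token source without any space saving or restoring can be as small as a pushback stack. This is only a sketch: the class `TokenSource` and the `"EOF"` sentinel are assumptions for illustration, and `pushToken` is borrowed from the follow-up rename rather than copied from MacroExpander.

```javascript
// Hypothetical token source with a pushback stack and no ignore-space flag.
class TokenSource {
    constructor(tokens) {
        this.remaining = tokens.slice();    // tokens not yet read
        this.stack = [];                    // tokens pushed back by the caller
    }

    // Return the next token, preferring anything that was pushed back.
    get() {
        if (this.stack.length > 0) {
            return this.stack.pop();
        }
        return this.remaining.length > 0 ? this.remaining.shift() : "EOF";
    }

    // Put a token back so that the next get() returns it.
    pushToken(token) {
        this.stack.push(token);
    }
}

const source = new TokenSource(["a", " ", "b"]);
console.log(source.get());      // "a"
const space = source.get();     // " "
source.pushToken(space);        // un-read the space
console.log(source.get());      // " "
console.log(source.get());      // "b"
console.log(source.get());      // "EOF"
```

Since `get` takes no argument, the caller alone decides whether a space matters.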

kevinbarabash · 2017-10-08T02:55:43Z

test/katex-spec.js

@@ -2353,6 +2355,11 @@ describe("An aligned environment", function() {
            .toParse();
    });

+    it("should allow cells in brackets", function() {
+        expect("\\begin{aligned}[a]&[b]\\\\ [c]&[d]\\end{aligned}")
+            .toParse();


And if there's no space between \\\\ and [c], would it try to parse [c] as a measurement?

Yes. This matches LaTeX behavior, based on testing. (\@ifnextchar[ must get a space instead of a [.) I'll add an error test for the no-space case.
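A small sketch of that behavior, assuming single-character tokens and a made-up helper `parseNewline` (the size check is a simplified assumption, not KaTeX's size parser): a `[` counts as the optional size argument only when it immediately follows `\\`.

```javascript
// Hypothetical handler for the tokens starting at `\\`.
// tokens[0] is expected to be the control symbol "\\\\".
function parseNewline(tokens) {
    let pos = 1;
    let size = null;
    // Only a "[" directly after `\\` starts the optional size argument;
    // an intervening space token means there is no size argument.
    if (tokens[pos] === "[") {
        const close = tokens.indexOf("]", pos);
        if (close === -1) {
            throw new Error("Expected ']'");
        }
        const text = tokens.slice(pos + 1, close).join("");
        // Simplified size syntax: a number followed by a unit.
        if (!/^-?\d*\.?\d+\s*(pt|em|ex|mu)$/.test(text)) {
            throw new Error("Invalid size: '" + text + "'");
        }
        size = text;
        pos = close + 1;
    }
    return {type: "cr", size: size, rest: tokens.slice(pos)};
}

// `\\ [c]`: the space stops the lookahead, so [c] stays ordinary content.
console.log(parseNewline(["\\\\", " ", "[", "c", "]"]).size);         // null

// `\\[1em]`: the bracket is read as a size.
console.log(parseNewline(["\\\\", "[", "1", "e", "m", "]"]).size);    // "1em"

// `\\[c]`: the bracket is read as a size, and "c" is not one.
try {
    parseNewline(["\\\\", "[", "c", "]"]);
} catch (e) {
    console.log(e.message);                                           // Invalid size: 'c'
}
```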

kevinbarabash · 2017-10-08T03:19:39Z

test/katex-spec.js

+    });
+
+    it("should consume spaces after control-word function", function() {
+        compareParseTree("\\text{\\KaTeX x}", "\\text{\\KaTeX\\relax x}");


So these two things parse the same, but I thought control words weren't allowed inside \text{}. Why is it important that these two parse the same?

I wanted to make sure that \text{\KaTeX x} does not render a space. Ideally I'd say that \text{\KaTeX x} renders the same as \text{{\KaTeX}x} but that generates another group... perhaps I should tweak the test to actually look for features in the parse tree. (Control words like \KaTeX definitely work in text mode.) Ah, I can just test for \text{\KaTeX } vs. \text{\KaTeX}.

I got an error when I tried \text{\KaTeX} on the demo page, so maybe it's just that some control words don't work.

kevinbarabash · 2017-10-08T03:23:48Z

test/katex-spec.js

    it("should allow for empty macro argument", function() {
        compareParseTree("\\foo\\bar", "()", {
            "\\foo": "(#1)",
            "\\bar": "",
        });
    });

+    // The following is not currently possible to get working, given that
+    // functions and macros are dealt with separately.


What's the current behavior? Can you open an issue for this?

Opened #924. Also added some more comments in the code about this.

edemaine · 2017-10-09T16:06:13Z

Thanks for the review! All comments should be addressed in the two new commits.

kevinbarabash

LGTM. Thanks for the additional tests and comments.

edemaine · 2017-10-10T14:25:57Z

Oops, I merged this before simplifying get. Perhaps useful to look at the diff separately anyway. It's in a new PR: #928.

* Simplify get() now that we don't need it to ignorespaces. Continuation of #912
* Remove commented-out code
* Drop get() alias, rename unget() to pushToken(), use it

edemaine mentioned this pull request Oct 1, 2017

Newline (\\) eats non-size argument surrounded by brackets #910

Open

Fix and test space handling in atoms

33e7ef7

* Add atom test.
* Also use consumeSpaces helper more.

kevinbarabash reviewed Oct 8, 2017

edemaine added 2 commits October 9, 2017 11:44

Tweak tests according to @kevinbarabash's comments

5af2fe4

Consume initial spaces in arguments in math mode

5587386

edemaine mentioned this pull request Oct 9, 2017

Difficult interaction between MacroExpander and Parser #924

Closed

kevinbarabash approved these changes Oct 10, 2017

Merge branch 'master' into space

f479f1c

edemaine merged commit 3280652 into KaTeX:master Oct 10, 2017

edemaine added a commit to edemaine/KaTeX that referenced this pull request Oct 10, 2017

Simplify get() now that we don't need it to ignorespaces

43d6bb0

Continuation of KaTeX#912

edemaine mentioned this pull request Oct 10, 2017

Simplify get() now that we don't need it to ignorespaces #928

Merged

edemaine mentioned this pull request Oct 16, 2017

Implemented `\href' command #923

Merged
