-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix space handling #912
Fix space handling #912
Conversation
Fixes several issues with space handling: 1. "Control symbols" (as they're called in the TeXbook), such as `\\`, should not have spaces eaten after them (only "control words" such as `\foo`). 2. In math mode, spaces should be consumed at the parser level, not the gullet level. This enables `\\ [x]` to parse differently from `\\[x]` 3. Eat spaces between arguments, so `\frac x y` still works. (This used to work only because math mode ate all spaces. The analog in text mode wouldn't have worked.)
It looks like there's an issue with the |
* Add atom test. * Also use consumeSpaces helper more.
@kevinbarabash Thanks for spotting that. Indeed, it was a bug, and I added a test for it. Spaces needed to be checked for (and ignored) also in the sup/subscript handling of |
const commentRegex = new RegExp(commentRegexString); | ||
// tokenRegex has no ^ marker, as required by matchAt. | ||
// These regexs are for matching results from tokenRegex, | ||
// so they do have ^ markers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice comment
@@ -140,9 +140,13 @@ export default class Parser { | |||
* and fetches the one after that as the new look ahead. | |||
*/ | |||
consume() { | |||
this.nextToken = this.gullet.get(this.mode === "math"); | |||
this.nextToken = this.gullet.get(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is the only place where gullet.get
is called and its parameter is always the same, maybe we should get rid of the parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I wanted a second opinion on this. As I wrote above: "we never use MacroExpander.get
with a true
argument, which could simplify the code of both get
and unget
." We no longer need any of the space saving/restoring mechanics, so I'll get rid of that.
"It also means that switchMode
no longer does anything useful." (in the parser) This one I'm less sure of. Maybe switchMode
would be useful in the future, e.g. if we can ever tweak catcodes in other ways? (e.g. verbatim or url modes?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you think switchMode
will come in handy in the future let's keep it, but please add a comment as to why we're keeping it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we still need switchMode
because we store the current mode in the parse nodes and that value gets used in the builders. Ignore my comment about adding a comment.
@@ -2353,6 +2355,11 @@ describe("An aligned environment", function() { | |||
.toParse(); | |||
}); | |||
|
|||
it("should allow cells in brackets", function() { | |||
expect("\\begin{aligned}[a]&[b]\\\\ [c]&[d]\\end{aligned}") | |||
.toParse(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And if there's no space between \\\\
and [c]
would it try to parse [c]
as a measurement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. This matches LaTeX behavior, based on testing. (\@ifnextchar[
must get a space instead of a [
.) I'll add an error test for the no-space case.
test/katex-spec.js
Outdated
}); | ||
|
||
it("should consume spaces after control-word function", function() { | ||
compareParseTree("\\text{\\KaTeX x}", "\\text{\\KaTeX\\relax x}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So these two things parse the same, but I thought control words weren't allowed inside \text{}
. Why is it important that these two parse the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to make sure that \text{\KaTeX x}
does not render a space. Ideally I'd say that \text{\KaTeX x}
renders the same as \text{{\KaTeX}x}
but that generates another group... perhaps I should tweak the test to actually look for features in the parse tree. (Control words like \KaTeX
definitely work in text mode.) Ah, I can just test for \text{\KaTeX }
vs. \text{\KaTeX}
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got an error when I tried \text{\KaTeX}
on the demo page so maybe it's just some control words don't work.
test/katex-spec.js
Outdated
it("should allow for empty macro argument", function() { | ||
compareParseTree("\\foo\\bar", "()", { | ||
"\\foo": "(#1)", | ||
"\\bar": "", | ||
}); | ||
}); | ||
|
||
// The following is not currently possible to get working, given that | ||
// functions and macros are dealt with separately. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the current behavior? Can you open an issue for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #924. Also added some more comments in the code about this.
Thanks for the review! All comments should be addressed in the two new commits. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for the additional tests and comments.
Continuation of KaTeX#912
Oops, I merged this before simplifying |
* Simplify get() now that we don't need it to ignorespaces Continuation of #912 * Remove commented-out code * Drop get() alias, rename unget() to pushToken(), use it
This PR fixes several issues with space handling, in particular fixing #910:
\\
, shouldnot have spaces eaten after them (only "control words" such as
\foo
).gullet level. This enables
\\ [x]
to parse differently from\\[x]
\frac x y
still works.(This used to work only because math mode ate all spaces.
The analog in text mode wouldn't have worked.)
Notably,
Parser.consume
no longer eats spaces when in math mode; rather, I get rid of them inparseExpression
(which seems more natural -- though there is a worry that some code somewhere assumed that spaces would already be consumed). This implies that we never useMacroExpander.get
with atrue
argument, which could simplify the code of bothget
andunget
. It also means thatswitchMode
no longer does anything useful. Is it worth removing any of this code? I'm unsure.