Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Advanced macro support and magic \dots #794

Merged
merged 10 commits into from
Sep 5, 2017
Merged

Conversation

edemaine
Copy link
Member

@edemaine edemaine commented Aug 13, 2017

This is a re-implementation of #599 (\dots support) but extending the MacroExpander as necessary to build it as a macro instead of a function, following @gagern's outline.

Along the way, we get some nice new macro features:

  1. Macros (e.g. given by defineMacro) can now be JavaScript functions, with this bound to the MacroExpander. (Note the slightly different interface from functions, which get a complex context object -- up for discussion!) This lets us extend the MacroExpander in more powerful ways, and use the following features:
  2. future() returns the next token without expansion (similar to \futurelet but without the "let" aspect of assigning to a variation)
  3. expandAfterFuture() returns the next token after one level of expansion (equivalent to a careful use of \expandafter and \futurelet)
  4. \relax is now implemented! It stops expansion, but doesn't actually get returned from the MacroExpander.

We also get fixes to old KaTeX bugs (with new tests to confirm, some of which used to fail):

  1. \text{\textellipsis !} used to include a space in the middle. The trouble is that we were only eating spaces when in math mode, but in text mode we need to be more careful to eat the spaces that make up part of the command name. This is now consumed during macro expansion, even though it's not technically a macro expansion (\textellipsis is an unexpandable character).
  2. Ditto for \text{\foo } where \foo is defined to something -- we got a space, but shouldn't have.
  3. Macros did not handle command arguments correctly. \def\foo#1{(#1)}\foo\bar should expand to (\bar) and then expand \bar, whereas the old macro expander would first expand \bar and then use its tokens for arguments. For example, this didn't work if \def\bar{} or \def\bar{ } (especially the latter because of bad space ignoring). Also, if \foo is a multiargument function, \bar should constitute one argument instead of one per token. (I did a bunch of testing in LaTeX to confirm this behavior.)

A few other miscellaneous changes:

  • Lexer.js optionally exports Token, so that I can do instanceof tests.
  • consumeSpaces() irreversibly consumes space tokens. This is used to fix the first bug listed above.
  • nextToken() got split into expandOnce() and expandNextToken(). So, e.g., expandAfterFuture() is equivalent to expandOnce() and then future(). (This makes for some awkward diffs -- sorry!)
  • The cdots symbol is now \cdots@, because amsmath defines a more complex \cdots in terms of this symbol.
  • @ is now considered a valid character for commands, as if \makeatletter is in effect. I don't think we currently define \@ but if we did this change would be annoying.
  • A rewrite of Implement AMSMath's \dots #599 to handle \dots, \cdots, etc. correctly.

Here's a texcmp confirming I got at least the tested cases right:

dots

@kevinbarabash
Copy link
Member

@edemaine this is awesome. I don't have a very good understanding of macros so hopefully @gagern and others can help out with the review.

@@ -75,14 +75,14 @@ const tokenRegex = new RegExp(
"([ \r\n\t]+)|" + // whitespace
"([!-\\[\\]-\u2027\u202A-\uD7FF\uF900-\uFFFF]" + // single codepoint
"|[\uD800-\uDBFF][\uDC00-\uDFFF]" + // surrogate pair
"|\\\\(?:[a-zA-Z]+|[^\uD800-\uDFFF])" + // function name
"|\\\\(?:[a-zA-Z@]+|[^\uD800-\uDFFF])" + // function name
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to be dangerous. @ is not actually a valid character for macro nams in default TeX. It only works after \makeatletter, and is intended for internal macros that users don't type.

Overall it is probably safe to include @ by default, or wait for complaints before trying a complete fix; the \@ macro is rare in math.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to have at least one negative reaction -- I was curious what others would think. I see three alternatives:

  1. Add a Lexer option for whether to include @ as a letter. Then we could actually support \makeatletter and \makeatother (presumably this would need to be carried in the Settings object...).
  2. Have the cdots macro return manually parsed Tokens instead of strings. Then we don't need to touch the Lexer.
  3. Rename \@cdots to something else like \cdotsINTERNAL or \latexcdots.

Thoughts/preferences?

Copy link
Member Author

@edemaine edemaine Aug 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incidentally, the regex change won't affect @{...} support in tabular (the only use I know of @ as an active char), nor will it affect the most typical usage of \@, namely, \@. (not that KaTeX currently supports \@). What it would break is if you used \@ immediately followed by a letter, which ... I can't imagine doing. So the other option is to leave this change as is for now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is where I came down and for a similar reason: keep @ as is in this commit.

// \let\DOTSX\relax
defineMacro("\\DOTSI", "\\relax");
defineMacro("\\DOTSB", "\\relax");
defineMacro("\\DOTSX", "\\relax");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

????

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

amsmath defines these "null macros" for other macros to indicate the behavior of \dots preceding them. Indeed, I should have added use of these macros to \iff, \implies, and \impliedby. See new commit.

@kevinbarabash
Copy link
Member

I'm going to try to get this reviewed over the next couple of days.

@edemaine
Copy link
Member Author

Thanks @kevinbarabash! I just rebased to fix a conflict with the CJK tweak on \dots commands.


it("should consume spaces after macro", function() {
compareParseTree("\\text{\\foo }", "\\text{x}", {"\\foo": "x"});
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice tests. I like how these call out differences in behavior between math mode and text mode.

compareParseTree("\\text{\\foo 1 2}", "\\text{12end}", {"\\foo": "#1#2end"});
compareParseTree("\\text{\\foo {1} {2}}", "\\text{12end}", {"\\foo": "#1#2end"});
});

it("should allow for multiple expansion", function() {
compareParseTree("1\\foo2", "1aa2", {
"\\foo": "\\bar\\bar",
"\\bar": "a",
});
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary for this PR, but it might be good to have some tests that verify things about multiple expansions involving arguments.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For fun, I added such a test in 56d8900

* Recursively expand first token, then return first non-expandable token.
* Return the next unexpanded token without removing anything from the
* stack. Similar in behavior to TeX's `\futurelet`.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the term unexpanded token a little confusing, because it's possible that future gets called sometime after expandOnce gets called in which case there may be tokens on the stack that were the result of expanding a token. Maybe stating this in the following way:

Returns the topmost token on the stack, without expanding it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice rewrite. Implemented.


/**
* Expand next token only once, and leave it on the stack.
* Returns the token or its expansion.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment could be a bit clearer, maybe:

Expand the next token only once if possible.

If the token is expanded, the resulting tokens will be push
on to the stack in reverse order and will be return in as an
array, also in reverse order.

If not, the next token will be return without removing it from
the stack.

As a result, as long as there are tokens on the stack, the next 
token is at the top of the stack.

Feel free to wordsmith this more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revised accordingly.

expandOnce() {
const topToken = this.popToken();
const name = topToken.text;
const macro = (name.charAt(0) === "\\");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

macro => isMacro to more clearly indicate that it's a boolean.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

let expansion = this.macros[name];
if (typeof expansion === "function") {
expansion = expansion.call(this);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If expansion is a function then it must always return a string with valid TeX code which can contain argument placeholders, is that correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, it could also return an Array that's a reverse-ordered list of tokens. This was functionality from before that I'm preserving, though I'm not sure we'd want to use it exactly that way. (In particular, the reverse-order aspect is weird.) But it is a nice way for a macro to be able to force strange parsing.

if (!(macro && this.macros.hasOwnProperty(name))) {
// Fully expanded
this.stack.push(topToken);
return topToken;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little weird to pop topToken at the start just to push it back on. I can't think of demonstrably better way to do this though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, unfortunately on both counts. If I didn't have to consumeSpaces, I could use future() and then popToken after the fully expanded case. Maybe there's another place to put the consumeSpaces... but at least this works.

/**
* Expand the next token once (ignoring initial spaces like `get`),
* without removing anything from the stack, and return the top token
* on the stack. Similar in behavior to TeX's `\expandafter\futurelet`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does it ignore initial spaces if it calls expandOnce right away which only ignores trailing spaces?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That comment was out-of-date. Updated.

// expandOnce returns Token if and only if it's fully expanded.
if (expanded instanceof Token) {
// \relax stops the expansion, but shouldn't get returned (a
// null return value couldn't get implemented as a function).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what a null return value couldn't get implemented as a function means. Could you give an example?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functions (as defined by defineFunction) return a ParseNode or an object that gets converted into a ParseNode by the parser. It's not possible for a function to return null that would just disappear in the final parse tree. Hence it would be impossible to implement a \relax that expands to nothing (which is different to how {} expands).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if we return null the parser would think that it's an object an convert it to a ParseNode. I wonder when would we ever want to write a function that returns a null?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Long-term, I vaguely hope we can transition functions over to macros, as that's what TeX really does. So maybe not too critical... though certainly could be done by tweaking group parsing.

src/macros.js Outdated
@@ -3,6 +3,9 @@
* This can be used to define some commands in terms of others.
*/

import symbols from "./symbols";
import utils from "./utils";

// This function might one day accept additional argument and do more things.
function defineMacro(name, body) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we get flow in place we'll be able to type this as:

function defineMacro(name: string, body: string | () => string) {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

@edemaine
Copy link
Member Author

edemaine commented Sep 4, 2017

Sorry for the delay. Finally went through and implemented all the comments. Thanks for the review!!

Incidentally, this PR also fixes a bug in master with \text{\v x}. Currently, because of incorrect space handling, this renders as an accent (over a space) followed by an x. With the PR, it renders correctly (same as \text{\v{x}} does now).

// This function might one day accept additional argument and do more things.
function defineMacro(name, body) {
function defineMacro(name: string, body: string | () => string) {
Copy link
Collaborator

@marcianx marcianx Sep 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! Mind adding // @flow at the top of this file?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, indeed! Clearly still a flow noob...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Me too, for all that's worth. ;)

@kevinbarabash kevinbarabash merged commit 6857689 into KaTeX:master Sep 5, 2017
@ronkok ronkok mentioned this pull request Sep 6, 2017
@kevinbarabash kevinbarabash mentioned this pull request Sep 15, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants