Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented `\href' command #923

Merged
merged 15 commits into from
Nov 24, 2017
Merged

Implemented `\href' command #923

merged 15 commits into from
Nov 24, 2017

Conversation

konn
Copy link
Collaborator

@konn konn commented Oct 9, 2017

Implemented \href command in LaTeX hyperref package.
It seems that LaTeX simply ignores special commands for hyper-linking and processes spacing as usual.
Since I don't make out how to simulate this spacing behaviour in KaTeX, I adopt the following strategy:

  1. If it has the only element, then use its math-type;
  2. if it has more than two elements, and the types of both ends of elements coincides, then use it as a class for the anchor tag;
  3. otherwise, we inject empty span-tag aside both ends of generated anchor tag, both with the same classes as the first and the last ones respectively.

@khanbot
Copy link

khanbot commented Oct 9, 2017

CLA signature looks good 👍

@edemaine
Copy link
Member

edemaine commented Oct 9, 2017

Thanks @konn for working on this. \href is definitely something we should support.

I don't understand what you mean by your three rules though. \href should not be setting class, should it? Isn't that the purpose of \class (see #90)?

Isn't the second argument of \href treated as if it's in \text? You should be able to use code similar to \text, \raisebox, etc. which switch to text mode.

@konn
Copy link
Collaborator Author

konn commented Oct 10, 2017

I don't understand what you mean by your three rules though. \href should not be setting class, should it? Isn't that the purpose of \class (see #90)?

Well, if you don't specify any math-type via class of an anchor tag, you get the different spacing behaviour than latex's default.

Consider the following input:

\documentclass[a4]{article}
\usepackage{hyperref}
\usepackage{amsmath,amssymb}
\begin{document}
$0 \href{http://en.wikipedia.org/wiki/Equals\_sign}{=} 1$.

$0 = 1$.
\end{document}

If you typeset the above by xelatex, lualatex (or latex + dvipdfmx with dvipdfmx argument for hyperref package), you get the following result, with the upper equals-sign linked to Wikipedia:

default

Here, we see that both \href'ed version of = and plain = result in the same spacing.
On the other hand, if we drop the math-type handling in KaTeX, we get the following result:

2017-10-10 17 43 43

So, to get the same spacing adjustment, we have to set a class of anchor tag to mord, mrel or so.

Isn't the second argument of \href treated as if it's in \text? You should be able to use code similar to \text, \raisebox, etc. which switch to text mode.

As above, LaTeX's (more precisely, hyperref's) \href commands accept a math as its second argument, not only text. So we can't treat them as if they're in \text.

@edemaine
Copy link
Member

Ah, now I understand where you're getting the classes. (I was worried that they were being specified by the user, but rather they're grabbed from the rendered arguments.) Now this seems like the right approach to me. Cool!

src/domTree.js Outdated
// Add the href
markup += " anchor=\"";
markup += utils.escape(this.href);
markup += "\"";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that you're following the existing code style, but you can use template strings to simplify things a bit:

markup += ` anchor="${utils.escape(this.href)}"`;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're adding href to attributes in the constructor shouldn't the loop below that adds all other attributes also add this one?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're adding href to attributes in the constructor shouldn't the loop below that adds all other attributes also add this one?

Ah, yes, that's right.
And I mistyped anchor instead of href above, and this mistakenly prevents the duplication of href attribute. I must fix this.

this.depth = 0;
this.maxFontSize = 0;
this.style = {};
this.attributes = {};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't realize how much boilerplate there was. We may want to add a base class for these DOM classes at some point in the future. No action required for this PR.

src/domTree.js Outdated
// Add the markup of the children, also as markup
for (let i = 0; i < this.children.length; i++) {
markup += this.children[i].toMarkup();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also support for-of loops which might be simpler:

for (const child of this.children) {
    markup += child.toMarkup();
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat! I will adopt it.

let styles = "";

// Add the styles, after hyphenation
for (const style in this.style) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that we're duplicating this as well as the adding of attributes. I've created a separate issue for it: #932.

@@ -0,0 +1,81 @@
// @flow
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for using the new pattern. Let me know if you had any issues with it.


let cls = []; // null if the type of both ends differs.
let fst; // mathtype of the first child
let lst; // mathtype of the last child
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to keep to short... we minify the code so it should make these even shorter. Maybe classes, first, and last.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry. This naming convention is one of the bad habit of Haskellers... I will make them more readable.

let cls = []; // null if the type of both ends differs.
let fst; // mathtype of the first child
let lst; // mathtype of the last child
// Invariants: both fst and lst must be non-null if cls is null.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for calling this out. It looks like getTypeOfDomTree can return null so it seems like it's possible that fst could be null and lst could be non-null and cls could end up being null which would break the invariant. Maybe we should have code that checks for that case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't notice that. In that case. I will add the code to handle that case and default to mord.

const elts = [new buildCommon.makeSpan([fst], [], options),
anc,
new buildCommon.makeSpan([lst], [], options),
];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super minor nit, the following formatting might better reflect the new span sandwich we got going on here:

return new buildCommon.makeFragments([
    new buildCommon.makeSpan([fst], [], options),
    anchor,
    new buildCommon.makeSpan([lst], [], options),
]);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems much more appropriate. I will reflect it.

names: ["\\href"],
props: {
numArgs: 2,
argTypes: ["original", "original"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we maybe be using "text" for the href so that if we're in math mode and we include a URL with spaces in it, the spaces don't get eaten.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since #912, I don't think the space behavior will be different, but "text" does feel slightly more correct. I'm guessing, in LaTeX, the argument is treated like a verbatim. For example, are two spaces preserved as two spaces or merged into one? (In KaTeX, they will currently be merged.) Does \foo bar have a space (I guess so) or remove the space (as it will probably currently behave). This might require a new argument type of "url"...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that the first argument of \href will be treated as if it's inside verbatim, but both \href{http://en.wikipedia.org/wiki/Equals\_sign}{=} and \href{http://en.wikipedia.org/wiki/Equals_sign}{=} generate the same link to Wikipedia.

In the current implementation, it rejects \href{http://en.wikipedia.org/wiki/Equals_sign}{=} and accepts \href{http://en.wikipedia.org/wiki/Equals\_sign}{=}, but it links to http://en.wikipedia.org/wiki/Equals/_sign instead of http://en.wikipedia.org/wiki/Equals_sign. So it might be appropriate to create new argument type url for proper parsing, as @edemaine said.

* Added `functions/href.js`.
* Added `domTree.anchor` and `buildCommon.makeAnchor` functions.
* Make `buildHTML.getTypeOfDomTree` exported.
* Create new argType "url" to treat link string in math appropriately
* Stop using too shorten variable names
* Start using template strings
@edemaine
Copy link
Member

@konn Just checking: Is this ready for review again?

@konn
Copy link
Collaborator Author

konn commented Oct 19, 2017

@edemaine Yes. I think it’s ready.

@kevinbarabash
Copy link
Member

I'll have another look this weekend.

@kevinbarabash kevinbarabash self-assigned this Oct 19, 2017
Copy link
Member

@kevinbarabash kevinbarabash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could also use some unit tests to verify that \hrefs parse and don't parse in the expected way, see katex-spec.js. Also, it'd be nice to have a snapshot test for the MathML that we're outputting, see mathml-spec.js (running npm test -- -u will update the snapshot file).

src/Parser.js Outdated
// hyperref package allows backslashes alone in href, but doesn't generate
// valid links in such cases; we interpret this as "undefiend" behaviour,
// and keep them as-is. In some environment, they're replaced by slashes
// in url by browser.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the backslashes being replaced by forward slashes? In which environments does this happen?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least in Safari 11.0.1 (13604.3.5) on macOS High Sierra 10.13.1.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah... I was getting confused between LaTeX environments and browser environments. Maybe reword that last sentence to be: Some browser will replace backslashes with forward slashes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, it was really confusing expression. I will replace them.

* 2. if it has more than two elements, and the classes
* of its first and last elements coincide, then use it;
* 3. otherwise, we will inject an empty <span>s at both ends,
* with the same classes of both ends of elements.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sneaky. In this third case does the element in between the empty spans have no class on it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I should elaborate on it. The third case we insert a kind of glue, with the first one having the same class as the first element of body and the second one the same as the last. I will add this description.

}
if (!classes) {
const anc = buildCommon.makeAnchor(href, [], elements, options);
return new buildCommon.makeFragment([
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works, but I'd probably move this up to where you set classes = null so that we can remove an extra if statement.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yes. It makes program much simpler. Thanks.

classes = null;
}
} else { // No elements at all, just ignore.
classes = [];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

classes is already [] from let classes = [] so this else block isn't necessary.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. I will drop this clause.

src/domTree.js Outdated
if (this.classes.length) {
markup += " class=\"";
markup += utils.escape(createClass(this.classes));
markup += "\"";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be simplified to

markup += ` class="${utils.escape(createClass(this.classes))}"`;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Template literal looks nicer!

src/domTree.js Outdated
Object.prototype.hasOwnProperty.call(this.attributes, attr)) {
markup += " " + attr + "=\"";
markup += utils.escape(this.attributes[attr]);
markup += "\"";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

markup += ` ${attr}="${utils.escape(this.attributes[attr])}"`

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will adopt template literal also here.

@kevinbarabash
Copy link
Member

@konn sorry about the delay. Please let me know if you have any questions/concerns about any of the feedback.

@konn
Copy link
Collaborator Author

konn commented Nov 12, 2017

@kevinbarabash No problem, thank you for your review!

This could also use some unit tests to verify that \hrefs parse and don't parse in the expected way, see katex-spec.js. Also, it'd be nice to have a snapshot test for the MathML that we're outputting, see mathml-spec.js (running npm test -- -u will update the snapshot file).

OK, I will try to write some unit-tests.

@kevinbarabash
Copy link
Member

OK, I will try to write some unit-tests.

@konn let me know if you need help with this.

@konn
Copy link
Collaborator Author

konn commented Nov 12, 2017

OK, I've just added unit-tests and snapshot tests. I also found that the previous version doesn't treat nested braces correctly, so I fixed this issue.

const url = "http://example.org/~bar/#top?foo=$}foo{&bar=bar^r_boo%20baz";
const input = url.replace(/([#$%&~_^{}])/g, '\\$1');
const ae = getParsed(`\\href{${input}}{\\alpha}`)[0];
expect(ae.value.href).toBe(url);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

expect(getMathML("\\href{http://example.org}{\\alpha}")).toMatchSnapshot();
expect(getMathML("p \\Vdash \\beta \\href{http://example.org}{+ \\alpha} \\times \\gamma"))
.toMatchSnapshot();
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't realize that you could have two snapshots inside the same it block, cool!

Copy link
Member

@kevinbarabash kevinbarabash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to have a second look at parseStringGroupWithBalancedBraces when I'm fresh before merging.

@konn
Copy link
Collaborator Author

konn commented Nov 22, 2017

@kevinbarabash Is there any time to look into this?

}
this.mode = outerMode;
this.expect(optional ? "]" : "}");
return firstToken.range(lastToken, str);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand why we're tracking lastToken now and why a recursive solution isn't as easy as I thought it might be.

@kevinbarabash
Copy link
Member

@konn I've reviewed parseStringGroupWithBalancedBraces, looks good. I will merge this after the tests pass. Sorry for taking so long with this.

@kevinbarabash kevinbarabash merged commit fd82c4f into KaTeX:master Nov 24, 2017
@konn
Copy link
Collaborator Author

konn commented Nov 24, 2017

Thanks!

@konn konn deleted the href branch November 24, 2017 06:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants