Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accept all existing Greek letters using unicode characters in math mode #410

Merged
merged 1 commit into from
Aug 14, 2017

Conversation

kevinbarabash
Copy link
Member

Test Plan:

  • make test
  • run screenshot tests on travis-ci

src/Lexer.js Outdated
@@ -46,6 +46,7 @@ function Token(text, data, position) {
var tokenRegex = new RegExp(
"([ \r\n\t]+)|(" + // whitespace
"---?" + // special combinations
"|[\u0391-\u03A9\u03B1-\u03C9]" + // Greek letters
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\u03D6 is \varpi ϖ, \u03D5 is \phi ϕ, \u03F5 is \epsilon ϵ. KaTeX outputs all these (try \varpi \phi \epsilon and look at the Unicode code points of the output); do you want to include them?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And \u03D1 is \vartheta ϑ, \u03F1 is \varrho ϱ, also supported.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May as well add them since we have those glyphs. Good catch.

@ronkok
Copy link
Collaborator

ronkok commented Dec 26, 2015

Similarly, U+03B5 ↦ \varepsilon ε, U+03F0 ↦ \varkappa ϰ, U+03C2 ↦ \varsigma ς, U+03C6 ↦ \varphi φ. KaTeX supports all of them.

I would include \digamma, but there is an issue to resolve first.

The AMS (and KaTeX) function \digamma looks like U+03DC, capital letter Ϝ. But the unicode-math and Teubner packages map \digamma to U+03DD, small letter ϝ. It is \Digamma that they map to U+03DC, capital letter Ϝ.

It’s a problem. The unicode-math function names are consistent with the naming convention for other Greek letters. But AMS is more popular. I don’t know the best way to resolve the collision.

@kevinbarabash
Copy link
Member Author

@ronkok your comment made me realize that I hadn't updated the token regex. I have added the glyphs for \varepsilon, \varsigma, and \varphi. I will add \varkappa and the AMS version of \digamma. I don't believe any of our fonts includes a glyph for the small letter ϝ.

@wchargin
Copy link

@ronkok Good catch about U+03F0 \varkappa, but the other three (U+03B5, U+03C2,
U+03C6) are handled by [\u03B1-\u03C9], right? (The ε, ς, and φ characters are the "normal" characters output by pressing e, w (for "word-final sigma"), or f (respectively) on a Greek keyboard.) Do they need any additional special casing?

@kevinbarabash
Copy link
Member Author

After thinking of about the \digamma issue, I'm going to punt on it for now because at some point in the future we probably want to support more of the unicode-math package and I'd like to avoid changing the behavior.

@ronkok
Copy link
Collaborator

ronkok commented Dec 27, 2015

@wchargin, you are correct about U+03B5, U+03C2, and U+03C6. They do not require any special casing. My bad.

@kevinbarabash, Good call. It's okay to go slow. I don't hear the world calling out for a expedited decision on \digamma.

@kevinbarabash
Copy link
Member Author

@wchargin I misread my own regex. This looks like it's good to go.

@wchargin
Copy link

Haha, okay :) The regex seems good to me except for the bizarre case of U+03A2, which is not a valid Unicode character for some reason. This seemed weird to me, but I confirmed it in my official Unicode Consortium book and it is correct. For reference, here's the full Greek table:

greek_table

(The next page, which has all the codepoints' official names and decompositions, lists it as "reserved.")

Other than that, the characters that match the regex are ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨαβγδεζηθικλμνξοπρςστυφχψ, which seems fine to me.

# Python 3
''.join(chr(x) for x in list(range(0x0391, 0x03A9)) + list(range(0x03B1, 0x03C9)))

(I'm definitely not qualified to review the actual KaTeX aspect of it though, sorry 😕 )

@kevinbarabash
Copy link
Member Author

If it's reserved it's probably going to be hard for a person to type it in. There are some characters (capitals letters) which we don't have glyphs for. I'll see what happens when they're typed in and report back.

@kevinbarabash
Copy link
Member Author

It throws on all of the Greek glyphs that we don't have, e.g. Α, Β, Ε, etc. which I think is acceptable.

@wchargin
Copy link

Yeah, that's a tricky one. It always bothered me that TeX has no \Alpha command (whatever happened to semantics matter?). I mean, the two obvious choices that I see are disallow entering Α (U+0391 GREEK CAPITAL LETTER ALPHA) or alias it to A (U+0041 LATIN CAPITAL LETTER A). If we're ever planning to add support for user-defined fonts, for which the two glyphs could be distinct, that might make a difference.

In the spirit of forward-compatibility, your decision to disallow them for now sounds good to me; you can always add them later, but they'd be harder to remove. @xymostech ?

@ronkok
Copy link
Collaborator

ronkok commented Jan 2, 2016

Should \mathbf{Ω} render the same as \mathbf{\Omega}? Would a test of that be worthwhile?

@kevinbarabash
Copy link
Member Author

@ronkok definitely worth a test. I'll see to it Monday.

@xymostech
Copy link
Contributor

@kevinbarabash The code of this looks good! Do you want to write the test that @ronkok suggested? Also, what happens when you put these in \text{}?

@kevinbarabash
Copy link
Member Author

@xymostech will do. I totally forgot about that test. Thanks for reminding me.

@kevinbarabash
Copy link
Member Author

@xymostech I finally got around to testing \text{} and it blew up b/c I hadn't defined the symbols for text mode. I've updated the diff to include text mode versions of all of these symbols.

@kevinbarabash
Copy link
Member Author

kevinbarabash commented Jan 15, 2017

@ronkok sorry for the delay. I just tested \mathbf{Ω}\mathbf{\Omega} and they render the same. I'm going to update the screenshot test to include this.

@edemaine
Copy link
Member

Just bumped into this older PR. Looks like everything was good to go, and just needs a rebase and review?

@kevinbarabash kevinbarabash requested review from edemaine and removed request for xymostech June 16, 2017 13:04
@kevinbarabash
Copy link
Member Author

I'll rebase this this evening.

@edemaine
Copy link
Member

I did more reading of the various unicode PRs. I'm a little confused about what the latest/best approach is, but it does make me wonder whether this one will be necessary... Intuitively, defineSymbol ought to define both the unicode and \command version of a symbol. But the other work shows that the font's notion of which unicode symbol it is is not always what we want.

Perhaps we should add another option argument to defineSymbol that gives the unicode character that should be recognized as this one? And then go through symbols by hand? Greek could still be the initial testing ground.

defineSymbol(math, main, mathord, "\u03bc", "\\mu", true);
defineSymbol(math, main, mathord, "\u03bd", "\\nu", true);
defineSymbol(math, main, mathord, "\u03be", "\\xi", true);
defineSymbol(math, main, mathord, "\u03bf", "\\omicron", true);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have the omicron glyph in the our fonts so we may as well use it.


if (acceptUnicodeChar) {
module.exports[mode][replace] = module.exports[mode][name];
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love this new approach! Avoids repetition of the Unicode symbol, and makes the decision of "include this unicode character" clear. I think if we ever want a unicode symbol to point to one not matching the font, then we can manually define that symbol.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should extend pretty easily to other unicode characters that are simple matches.

@edemaine
Copy link
Member

It looks like there's an error with the screenshot. Can you regenerate?

Test Plan:
- make test
- run screenshot tests on travis-ci

Reviewers: emily
@kevinbarabash kevinbarabash merged commit e00738d into master Aug 14, 2017
@kevinbarabash kevinbarabash deleted the greek_unicode_support branch August 23, 2017 01:47
@ronkok ronkok mentioned this pull request Oct 14, 2017
kevinbarabash pushed a commit that referenced this pull request Oct 17, 2017
* Support Unicode relations

This is the first in a series of PRs to give KaTeX the ability to recognize Unicode character input. The code in this PR follows the style of PR #410.
All the characters in this PR will produce rel atoms. I’ll submit PRs for other atom types later.

* Fix lint error.

* Correct mapping errors

This commit fixes a brain cramp of mine.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants