Add infix operators for relational algebra #8036

Merged: 3 commits, Oct 22, 2015

jiahao
Member

@jiahao jiahao commented Aug 17, 2014

This PR includes yet moar Unicode; this time, primarily to support relational algebra.

  1. Allows all Unicode characters in the geometric shapes block as valid characters.
  2. Defines joins (⨝ ⟕ ⟖ ⟗) and antijoin (▷) as valid infix operators. (Note that ▷ is semantically a shape, but I couldn't find a separate antijoin character.)

@johnmyleswhite
Member

I admire your steadfast commitment to exploring the final frontiers of Unicode, @jiahao.

@jiahao
Member Author

jiahao commented Aug 17, 2014

These are the voyages of the flagship Julia. Its five-year mission: to explore strange new symbols, to seek out new codepoints and new characters, to boldly go where no language has gone before.

@jiahao jiahao added unicode and removed unicode labels Aug 17, 2014
@StefanKarpinski
Sponsor Member

I'm loling on a bus over this exchange. I do like all these Unicode symbols.

@elextr

elextr commented Aug 18, 2014

The Julia language boldly exploring new frontiers of untypeability ;)

@StefanKarpinski
Sponsor Member

Except that we've simultaneously expanded the frontiers of typeability.

@jiahao
Member Author

jiahao commented Aug 18, 2014

That's actually a good point. How do people feel about \join, \leftjoin, \rightjoin, \leftrightjoin and \whitetriangleright as LaTeX-like tab-completion sequences?

(Note that \bowtie tab-completes to ⋈, the bowtie character (U+22C8); not to be confused with the aforementioned join operator (U+2A1D), ⨝. The former is still left as an invalid identifier character in Julia. It's amusing that it is possible to tab-complete an invalid character... @stevengj)

@stevengj
Member

Shouldn't join have the same precedence as union, i.e. have + precedence?

The standard LaTeX name seems to be \Join, but probably it is just uppercase because \join is used for something else in TeX? I'd rather have \antijoin than \whitetriangleright if we are going to use \join.

Since the tab completion is mostly autogenerated from unicode.xml, it's inevitable that it won't overlap exactly with the set of allowed identifier chars; I don't see this as a problem.

@stevengj
Member

Aren't the unimath names for these symbols \Join, \leftouterjoin, \rightouterjoin, \fullouterjoin, \triangleleft, \triangleright, etcetera? I'd prefer to stick with these names rather than choosing our own.

In general, we might think about importing unicode-math-table.tex. See #8044.
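
For reference, the unimath-style names can be checked against the REPL's completion table. This is a minimal sketch, assuming a Julia version that includes these symbols and that the REPL standard library still exposes the table as `REPL.REPLCompletions.latex_symbols` (an internal detail, not a documented API):

```julia
using REPL

# Internal lookup table from tab-completion name to character (assumed location).
syms = REPL.REPLCompletions.latex_symbols

for name in ("\\Join", "\\leftouterjoin", "\\rightouterjoin",
             "\\fullouterjoin", "\\triangleright")
    # `get` avoids a KeyError on builds that do not define a given name.
    println(name, " => ", get(syms, name, "(not defined)"))
end
```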

@StefanKarpinski
Sponsor Member

+1 for using unimath names. Except why is \Join the only one capitalized?

@jiahao
Member Author

jiahao commented Aug 21, 2014

I wasn't aware of unimath; that makes things easy.

I've updated this PR with a better choice of precedence class (multiplication; table joins are essentially equivalent to matrix products).
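
A quick way to see what the multiplication precedence class means in practice, assuming a Julia build that includes this change. `Base.operator_precedence` is an internal helper, used here only for illustration:

```julia
ex = Meta.parse("a + b ⨝ c")

# With multiplication-level precedence, `b ⨝ c` groups first and `+` is the outer call.
@assert ex.head == :call && ex.args[1] == :+
@assert ex.args[3] == :(b ⨝ c)

# Compare the precedence class of the join operator with that of `*`.
same_class = Base.operator_precedence(:⨝) == Base.operator_precedence(:*)
println("⨝ shares the precedence of *: ", same_class)
```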

@stevengj
Member

Can you also update the LaTeX table by re-running the unimath script in base/latex_symbols.jl?

@@ -72,6 +71,7 @@ static int is_wc_cat_id_start(uint32_t wc, utf8proc_propval_t cat)
(wc >= 0x2a00 && wc <= 0x2a06) || // ⨀, ⨁, ⨂, ⨃, ⨄, ⨅, ⨆
(wc >= 0x2a09 && wc <= 0x2a16) || // ⨉, ⨊, ⨋, ⨌, ⨍, ⨎, ⨏, ⨐, ⨑, ⨒, ⨓, ⨔, ⨕, ⨖
wc == 0x2a1b || wc == 0x2a1c)))) || // ⨛, ⨜
wc == 0x2a1d || (wc >= 0x27d5 && wc <= 0x27d7) || //joins: ⨝ ⟕ ⟖ ⟗
Member

If joins are infix operators, then they don't also need to go in is_cat_id_start. Normally we put them in one place or the other.

Member

Bump @jiahao.

Member Author

Will remove this change.

Contributor

bump. spaces, not tabs too

@jiahao
Member Author

jiahao commented Sep 21, 2014

Rebased and updated with all comments.

I combined and partially rewrote the scripts in the comment block of latex_symbols.jl so that only a single script needed to be run.

@nolta
Member

nolta commented Sep 22, 2014

Is the travis failure related?

@stevengj
Member

Are some symbols gone? e.g. I can't find \alpha in this list; I think it got replaced with \upalpha, which seems non-ideal.

@JeffBezanson
Sponsor Member

Bump. I think we need \alpha!!

@jiahao
Member Author

jiahao commented Oct 20, 2015

Rebased and updated.

@@ -99,6 +99,9 @@ static int is_wc_cat_id_start(uint32_t wc, utf8proc_propval_t cat)
(wc >= 0x2220 && wc <= 0x2222) || // ∠, ∡, ∢
(wc >= 0x299b && wc <= 0x29af) || // ⦛, ⦜, ⦝, ⦞, ⦟, ⦠, ⦡, ⦢, ⦣, ⦤, ⦥, ⦦, ⦧, ⦨, ⦩, ⦪, ⦫, ⦬, ⦭, ⦮, ⦯

// geometric shapes
(wc >= 0x25a0 && wc <= 0x25ff) ||

Member

Aren't many of these already covered by the cat == UTF8PROC_CATEGORY_SO check? It seems like the only ones in category Sm (which have to be special-cased) are U+25F8 to U+25FF.

Member

(It's useful to do the minimal amount of special-casing here for people in other contexts trying to write Julia lexers with regexes etc., e.g. for syntax highlighting in editors.)

Member Author

Yes, you are right. I just tried some simple function definitions without this commit and it works. I'll take it out.

Member

Ah yes, you don't need it at all, because there is already wc >= 0x25F8 && wc <= 0x25ff above in addition to UTF8PROC_CATEGORY_SO.
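
The category argument can be checked from Julia itself. A minimal sketch, assuming `Base.Unicode.category_abbrev` (an unexported helper) is available; it lists which code points in the Geometric Shapes block are reported as Sm and shows that an So character already works as an identifier:

```julia
# All code points in the Geometric Shapes block, U+25A0 through U+25FF.
shapes = Char(0x25a0):Char(0x25ff)

# Split the block by Unicode general category abbreviation.
sm = [c for c in shapes if Base.Unicode.category_abbrev(c) == "Sm"]
so = [c for c in shapes if Base.Unicode.category_abbrev(c) == "So"]

println("Sm (math symbols): ", join(sm, ' '))
println("So (other symbols): ", length(so), " characters")

# ■ (U+25A0) is in category So, so it can name a function without special-casing.
■(x) = x + 1
println(■(1))
```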

"\\rightouterjoin" => "⟖", # right outer join
"\\fullouterjoin" => "⟗", # full outer join
"\\Join" => "⨝", # join
"\\mathunderbar" => "̲", # combining low line
Member

Maybe just \underbar? math is kind of redundant here.

Member Author

> math is kind of redundant here.

True; that was the name in unicode-math-table.tex. I can change it manually.

@jiahao
Member Author

jiahao commented Oct 21, 2015

I've modified the latex symbol parsing script to strip math- out of character names in unicode-math-table.tex.
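
A minimal sketch of that kind of renaming. `strip_math` is a hypothetical helper written for illustration, not the code in base/latex_symbols.jl:

```julia
# Hypothetical helper: drop a leading "math" from a command name such as
# \mathunderbar, leaving other names untouched.
function strip_math(name::AbstractString)
    prefix = "\\math"
    if startswith(name, prefix) && length(name) > length(prefix)
        # Keep the backslash, drop the four letters "math".
        return "\\" * name[(length(prefix) + 1):end]
    end
    return String(name)
end

println(strip_math("\\mathunderbar"))
println(strip_math("\\Join"))
```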

Note: ▷ is semantically a geometric shape, but I could not find a separate character for antijoin
@stevengj
Member

LGTM once commits are squashed and tests are green.

- Minor clean up of latex_symbols generator script
- Update list of latex symbols
jiahao added a commit that referenced this pull request Oct 22, 2015
Add infix operators for relational algebra
@jiahao jiahao merged commit e372b06 into master Oct 22, 2015
@jiahao jiahao deleted the cjh/relational-algebra branch October 22, 2015 02:57
@AzamatB
Contributor

AzamatB commented Jan 15, 2016

I'm a bit confused: does this mean that these symbols are now defined as relational algebra operators and can be used to perform joins in Julia (since relational algebra was brought up), or are we only talking about adding these symbols to the set of valid Julia characters?

@tkelman
Contributor

tkelman commented Jan 15, 2016

These are recognized by the parser as valid infix operators, but no default implementation of them is given in base. Packages and user code are free to do so. (Though some coordination is called for if packages are defining methods for them on Base types.)
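
As a sketch of what user code could look like, assuming a Julia version that parses these operators. `Table`, its `rows` field, and the join semantics below are hypothetical and not taken from any package:

```julia
# Hypothetical table type for illustration only: a list of rows as NamedTuples.
struct Table
    rows::Vector{NamedTuple}
end

# Two rows match when they agree on every column name they share.
function rows_match(ra::NamedTuple, rb::NamedTuple)
    shared = [k for k in keys(ra) if haskey(rb, k)]
    return all(ra[k] == rb[k] for k in shared)
end

# Natural join: merge every pair of matching rows.
function ⨝(a::Table, b::Table)
    out = NamedTuple[]
    for ra in a.rows, rb in b.rows
        rows_match(ra, rb) && push!(out, merge(ra, rb))
    end
    return Table(out)
end

# Antijoin: keep the rows of `a` that match no row of `b`.
function ▷(a::Table, b::Table)
    return Table(NamedTuple[ra for ra in a.rows if !any(rb -> rows_match(ra, rb), b.rows)])
end

people = Table([(id = 1, name = "Ada"), (id = 2, name = "Grace")])
cities = Table([(id = 1, city = "London")])

joined = people ⨝ cities      # rows present in both tables, merged
unmatched = people ▷ cities   # rows of `people` without a partner in `cities`
println(joined.rows)
println(unmatched.rows)
```

Because the methods are defined on a type owned by the user code, they do not extend Base types, which sidesteps the coordination concern mentioned above.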

Labels
domain:unicode Related to unicode characters and encodings