TeX for the web #5

Open
davidar opened this issue Oct 3, 2015 · 31 comments

@davidar
Member

davidar commented Oct 3, 2015

It has long been possible to convert TeX to HTML (#1). However, I think it's fair to say that the results are often hideous, as web browsers (by default) suck at typesetting compared to TeX. Fortunately, it is now possible to work around some of these deficiencies with JS and CSS, which I've tied together in this demo https://davidar.io/TeX.js/ ( https://github.com/davidar/TeX.js )

The aim of this is to achieve (an approximation to) the professional quality of TeX typesetting, whilst integrating with the web and optimising for on-screen viewing better than a PDF viewer can.

@rht

rht commented Oct 3, 2015

Past attempt in firefox, https://bugzilla.mozilla.org/show_bug.cgi?id=630181

@rht

rht commented Oct 3, 2015

@bramstein

@davidar
Member Author

davidar commented Oct 4, 2015

Yes, @bramstein it would be fantastic to have your input on this :)


@rht Yeah, I saw that issue, and was somewhat amused by this comment:

> [...] is a huge issue for web browsers, which sometimes have to deal with giant (think tens of megabytes) paragraphs.

@bramstein

I think the performance argument is not a very good one. It'll get slow with very large paragraphs, but there are ways around that (splitting the paragraph, falling back to the greedy line breaking algorithm, etc.). The bigger issue is that some parts of CSS are incompatible with the TeX model. Even if it were possible to combine CSS and the glue and boxes model, it would require a significant rewrite (which browser vendors are understandably not huge proponents of).

As for doing it as a library: I think that is a reasonable approach if you limit support to a subset of HTML and CSS. All modern browsers now support sub-pixel positioning, so some of the ugly hacks I had to do in Typeset are no longer necessary (and the whole thing becomes much more performant).
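
For readers comparing the two approaches mentioned above, here is a minimal sketch of greedy (first-fit) versus total-fit line breaking. It is an illustration under stated assumptions, not code from Typeset or TeX: widths are counted in characters, and the cost of a line is simply its squared trailing space, a much-simplified stand-in for the badness and penalties of the Knuth-Plass algorithm. The function names are hypothetical.

```python
def greedy_breaks(words, width):
    """First-fit: keep adding words to the current line while they fit."""
    lines, current = [], []
    for word in words:
        candidate = current + [word]
        if current and len(" ".join(candidate)) > width:
            lines.append(current)
            current = [word]
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


def optimal_breaks(words, width):
    """Total-fit: minimise the summed squared slack of every line but the last."""
    n = len(words)
    best = [float("inf")] * (n + 1)  # best[i] = minimal cost of setting words[i:]
    nxt = [n] * (n + 1)              # nxt[i] = index where the line starting at i ends
    best[n] = 0
    for i in range(n - 1, -1, -1):
        length = -1                  # cancels the space counted before the first word
        for j in range(i + 1, n + 1):
            length += len(words[j - 1]) + 1
            if length > width and j > i + 1:
                break                # an overlong single word still gets its own line
            slack = max(width - length, 0)
            cost = 0 if j == n else slack ** 2  # the last line is free (simplified)
            if cost + best[j] < best[i]:
                best[i] = cost + best[j]
                nxt[i] = j
    lines, i = [], 0
    while i < n:
        lines.append(words[i:nxt[i]])
        i = nxt[i]
    return lines
```

On the input `"aaa bb cc ddddd"` with a width of 6, the greedy version fills the first line completely and leaves `cc` stranded on a nearly empty second line, while the total-fit version accepts a slightly shorter first line in exchange for a more even paragraph. The inner loop is quadratic in the worst case, which is why very long paragraphs are the case where splitting or a greedy fallback helps.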

@davidar
Member Author

davidar commented Oct 5, 2015

> I think the performance argument is not a very good one.

Me neither.

> The bigger issue is that some parts of CSS are incompatible with the TeX model. Even if it were possible to combine CSS and the glue and boxes model, it would require a significant rewrite (which browser vendors are understandably not huge proponents of).

Frankly I'd be happy with anything more intelligent than the greedy algorithm used by browsers (somewhat disturbingly it seems IE is the only one supporting something like this currently)

> As for doing it as a library: I think that is a reasonable approach if you limit support to a subset of HTML and CSS.

Definitely, I only intend to support the basic subset output by LaTeX-to-HTML conversion tools like tex4ht or LaTeXML

> All modern browsers now support sub-pixel positioning, so some of the ugly hacks I had to do in Typeset are no longer necessary (and the whole thing becomes much more performant).

That's good to hear

@bramstein I know you've said that typeset.js is likely to never be production ready, but how much work would it take to make it robust enough to handle the specific use case I'm interested in here? As in, I can drop the script into a basic HTML document, and it Just Works. For context, I'd like to (eventually) be able to produce HTML versions of the articles in the creative commons arxiv subset ( #1 ) that look (almost) as good as the PDFs. It would be great if this included Knuth-Plass line breaking (but I'm not a web developer, so am somewhat limited in what I'm able to achieve myself)

@davidar
Member Author

davidar commented Oct 5, 2015

Alright, here's my first approximation (just using greedy justification for now):

@rht

rht commented Oct 5, 2015

> The bigger issue is that some parts of CSS are incompatible with the TeX model.

If you have an example to pinpoint this incompatibility...
(I don't know much of TeX box/glue plumbing)

It's either full TeX typesetting onto a subset of html/css/js, or parts of TeX on full html/css/js (which e.g. for math, is already well supported).
@bramstein Why do you suggest the former?

If the goal is to better format the tex4ht/LaTeXML output of scientific papers, then the former is preferable.
If the goal is to bring TeX-quality typesetting to the web, the latter can be done piecemeal, e.g. https://github.com/w3c/dpub-pagination (though why would there be page breaks in a web document?).

Also, mind the format size:

  • pdf: 352KB
  • mhtml: 4.1MB
  • justified mhtml: 4.3MB
  • justified mhtml.tar.bz2: 3.1MB

(This one needs justification as well: https://github.com/worrydream/EarlyHistoryOfSmalltalk)

@rht

rht commented Oct 5, 2015

(...what is it like to read originally paginated books without the help of page breaks?)

@davidar
Member Author

davidar commented Oct 6, 2015

@rht most of that 4MB is poorly compressed images, which can be improved (eg. using SVG instead of PNG)

Re pagination: I don't think trying to emulate physical books too closely is a good idea, but something definitely needs to be done to improve location memory

Edit: it would be cool if you could leverage something like https://en.m.wikipedia.org/wiki/Method_of_loci for this purpose, eg: gradually changing background colour/pattern/image as you scroll down the page

@davidar
Member Author

davidar commented Oct 10, 2015

@rht

rht commented Oct 10, 2015

@davidar sorry for the late reply. I wonder if it would be useful to have more fine-grained hrefs (paragraph, section), like https://github.com/ipfs/go-ipfs/blob/master/core/bootstrap.go#L4.
The paper itself was uploaded in 2011 http://arxiv.org/pdf/1104.2778v1.pdf.

For the images, there is also https://www.npmjs.com/package/gulp-imagemin.

For the qualia of a book, now that it is confined to a flat screen, it has fewer attributes and becomes less of a physical 'thing'. This stuff is more related to #2.

For the experiment with the method of loci, it would have been preferable if the author had incorporated this method from the beginning, because it is more subjective (there is a risk of fogging the intention of the author) and more pervasive than just a change in font or layout. If I were to use one, I'd construct it so that the mnemonic is naturally connected to the text, e.g. a book about the innards of a ship (thought) and the ship it describes (extension).

@rht

rht commented Oct 10, 2015

(Papers are often annotated externally, but code isn't; it is instead referred to by line-number range.
Edit: but CR is annotation.)

@jbenet
Member

jbenet commented Oct 11, 2015

@davidar that looks really good!! maybe soon we'll have damn clickable references :)

@rht

rht commented Oct 11, 2015

> @davidar that looks really good!! maybe soon we'll have damn clickable references :)

Imported modules in code are not clickable either (unless you use Sourcegraph).

@rht

rht commented Oct 12, 2015

@davidar
Member Author

davidar commented Oct 12, 2015

@rht yes, section/paragraph linking is definitely something I'd like to do

> For the qualia of a book, now that it is confined to a flat screen, it has fewer attributes and becomes less of a physical 'thing'.

Yeah, I'm not trying to emulate a physical book, but I'd like to remedy some of the deficiencies of on-screen reading in terms of recall, etc.

> Because it is more subjective (there is risk of fogging the intention of the author), and more pervasive than just a change in font or layout.

I've experimented with subtly changing the background colour based on scroll position, which seemed to work quite nicely, although it had some technical problems, so I decided to take it out for the moment.

> @davidar that looks really good!! maybe soon we'll have damn clickable references :)

@jbenet Yes, that's definitely on my radar, I really hate traditional bibliographies (there's this thing called hyperlinks, people). Of course, you can't hyperlink a dead tree, but who prints stuff these days?

> The raw of https://github.com/rht/papers/blob/href/ipfs-cap2pfs/ipfs-p2p-file-system.pdf has clickable references.

Cool, although it's not quite as seamless as it could be (e.g. having a citation link directly to the section of the article the author is referencing).

@davidar
Member Author

davidar commented Oct 12, 2015

I've broken this into a separate project now: https://davidar.io/TeX.js/

Please submit bugs / feature requests to https://github.com/davidar/TeX.js/issues

@rht

rht commented Oct 12, 2015

Since the html page can't be annotated (/PR-ed),
davidar/TeX.js@2268e11#diff-eacf331f0ffc35d4b482f1d15a887d3bR19 (more citation needed)

I thought Apple had brought typography to the web? The OS in the screenshot is NeXTSTEP.
But indeed, there was no hyphenation in retina iOS book reader in 2010, http://www.subtraction.com/2010/06/08/better-screen-same-typography/.

> I've experimented with subtly changing the background colour based on scroll position

But again, this is just a mnemonic tool (associating 2 random slightly related facts, much like naming star constellations). Unless the background color is calculated based on the aggregate sentiment of the text in a page/paragraph or something (and there is still risk of fogging the author's intention).

> traditional bibliographies (there's this thing called hyperlinks, people)

The recent (on a TeX timescale) biblatex package displays the URL by default if the field exists, but look at the amount of boilerplate code needed for clickable refs in https://github.com/rht/papers/blob/href/ipfs-cap2pfs/ipfs-cap2pfs.tex#L12-L27.

> having a citation link directly to the section of the article the author is referencing

The ecosystem doesn't exist yet, but meanwhile, this can be done manually by the author, e.g.

  1. "Git has already influenced distributed filesystem design". The fact is stated in http://sigops.org/sosp/sosp13/papers/p151-mashtizadeh.pdf #section3.1sentence1.
  2. "Even today, BitTorrent maintains a massive deployment where tens of millions of nodes churn daily." The fact referred is in https://www.cl.cam.ac.uk/~lw525/publications/P2P2013_13.pdf #sectionIV.Fsentence2.

Similarly, to cite the definition of merkledag in the paper, https://ipfs.io/ipfs/QmR7GSQM93Cx5eAg6a6yRzNde1FQv7uL6X1o4k7zrJa3LX/ipfs.draft3.pdf #section2.3sentence2footnote.
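
The `#section...sentence...` fragments in these examples are an ad hoc notation, not an existing standard. As a sketch only, and assuming the grammar that can be inferred from the three examples (a section label, a sentence number, and an optional `footnote` suffix), such an address could be parsed as follows. The `Passage` type and `parse_citation` function are hypothetical names.

```python
import re
from typing import NamedTuple, Optional
from urllib.parse import urldefrag


class Passage(NamedTuple):
    document: str   # URL of the cited document, without the fragment
    section: str    # section label as written, e.g. "3.1" or "IV.F"
    sentence: int   # 1-based sentence number within that section
    footnote: bool  # True if the address points into a footnote


# Assumed grammar: section<label>sentence<number>[footnote]
_FRAGMENT = re.compile(r"section([0-9A-Za-z.]+?)sentence(\d+)(footnote)?")


def parse_citation(url: str) -> Optional[Passage]:
    """Split a URL into its document part and a passage address, if present."""
    document, fragment = urldefrag(url)
    match = _FRAGMENT.fullmatch(fragment)
    if match is None:
        return None
    section, sentence, footnote = match.groups()
    return Passage(document, section, int(sentence), footnote is not None)
```

Counting sentences is fragile (it depends on how the text is segmented), so a real ecosystem would probably want something sturdier, such as quoting the target text itself.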

@rht

rht commented Oct 12, 2015

hyphenation on the web, 2011, http://blog.fontdeck.com/post/9037028497/hyphens.

@davidar
Member Author

davidar commented Oct 12, 2015

> Since the html page can't be annotated (/PR-ed)

You're welcome to PR the HTML page (is there a difficulty in doing so?)

> I thought Apple had brought typography to the web? The OS in the screenshot is NeXTSTEP.

Yes, I'm having trouble seeing the relevance here though?

> But again, this is just a mnemonic tool

Of course. I'm not trying to associate semantically meaningful images to the text, I'm simply trying to improve the ability to recall the position in the text where you read something. The baseline is "I read this phrase at the bottom of the left-hand page when I was roughly two-thirds of the way through the book", so I'm not aiming for anything more meaningful than that.

> biblatex package by default displays url if the field exists

I'm not sure if this is what @jbenet meant, but personally I'm talking about removing the bibliography entirely in favour of embedding hyperlinks directly into the in-text citations. (Although someone can generate a bibliography from this information if they so desire.)

> The ecosystem doesn't exist yet

That's why I'm trying to bootstrap the ecosystem with the arXiv corpus ;)

@rht

rht commented Oct 12, 2015

> You're welcome to PR the HTML page (is there a difficulty in doing so?)

I mean, the rendered paper (https://davidar.io/TeX.js/) can't be annotated, so I can only comment on the source code.

> "I read this phrase at the bottom of the left-hand page when I was roughly two-thirds of the way through the book"

That is still a more precise address than referring to a background color shade.

> I'm talking about removing the bibliography entirely in favour of embedding hyperlinks directly into the in-text citations

I had thought of that when parsing what 'clickable references' means. But Wikipedia still displays the references in a section: https://en.wikipedia.org/wiki/Bibliography#References.

Edit: s/background color/background color shade/

@davidar
Member Author

davidar commented Oct 13, 2015

  1. I plan on integrating https://hypothes.is soon, so stay tuned ;)
  2. Yes, we need to balance precision against recall. The essential feature of recalling location in physical books is a combination of a low-frequency component (approximate position in the book) and a high-frequency component (left/right, top/bottom of the page). So, perhaps two colours could work better? Note that I'm not talking about communicating locations, but about subconscious recall.
  3. But Wikipedia also has popups when you hover over citations in the text. You can certainly have both, yes.
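
One hypothetical reading of the two-colour idea in point 2, sketched under stated assumptions: the slow colour's hue drifts with overall progress through the document (the low-frequency component), while the fast colour's lightness cycles once per screenful (the high-frequency component). The function names, the pastel colour ranges, and the choice of "one screenful" as the page analogue are all assumptions for illustration, not what TeX.js did.

```python
import colorsys


def _to_rgb8(hue, lightness, saturation):
    """Convert HLS components in [0, 1] to an 8-bit (r, g, b) tuple."""
    r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
    return tuple(round(c * 255) for c in (r, g, b))


def location_colours(scroll_top, viewport_height, document_height):
    """Return (slow, fast) background colours for a scroll offset in pixels."""
    scrollable = max(document_height - viewport_height, 1)
    # Low frequency: fraction of the whole document already scrolled past.
    overall = min(max(scroll_top / scrollable, 0.0), 1.0)
    # High frequency: position within the current screenful.
    within_screen = (scroll_top % viewport_height) / viewport_height
    slow = _to_rgb8(0.66 * overall, 0.95, 0.5)                # pale red to pale blue
    fast = _to_rgb8(0.0, 0.90 + 0.08 * within_screen, 0.0)   # light grey ramp
    return slow, fast
```

In a browser the same arithmetic would run in a scroll handler, with the two colours applied to, say, the page background and a margin stripe.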

@davidar
Member Author

davidar commented Oct 13, 2015

@rht You should now be able to directly annotate https://davidar.io/TeX.js/ (and any other page using TeX.js) thanks to @hypothesis (cc @RichardLitt @nickstenning) 😄

Note to self: think about integrating @ipfs and @hypothesis (cc @jbenet)

@jbenet
Member

jbenet commented Oct 14, 2015

@davidar yes, we should do that. there's much overlap.

cc @tilgovi -- we should put public annotations on ipfs. -- also, once we get capabilities, private ones too

@jbenet
Member

jbenet commented Oct 14, 2015

@davidar this works very well, good stuff!

@tilgovi

tilgovi commented Oct 27, 2015

Would this be a good repo to open an issue for designing and discussing ipfs comments?

@whyrusleeping
Member

@tilgovi I think so

@tilgovi

tilgovi commented Oct 27, 2015

Opened #12.

@jbenet
Member

jbenet commented Dec 4, 2015

@davidar where was your hypothesis annotated version? not finding it

@davidar
Member Author

davidar commented Dec 5, 2015

@jbenet the hypothesis-enabled version doesn't seem to have made it into ipfs yet; I'll add it to my to-do list

@jbenet
Member

jbenet commented Jan 25, 2016

cc @BigBlueHat


6 participants