Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate MEM_INIT_METHOD 2 #5401

Closed
wants to merge 1 commit into from

Conversation

buu700
Copy link
Contributor

@buu700 buu700 commented Jul 20, 2017

Pulled out of #5296 for separate discussion. Related comment threads: #5296 (comment), #5296 (comment), #5296 (comment)

CC @kripken, @juj, @gagern

@kripken
Copy link
Member

kripken commented Jul 24, 2017

lgtm to me. Let's leave this up for a few more days to see if there are any concerns, before we merge.

@gagern
Copy link
Contributor

gagern commented Jul 28, 2017

Can you please point out the benefit of the deprecation? That other PR is pretty long, a summary would be appreciated. It is far from obvious to me how the two relate.

Short history recap from my point of view. When I introduced MEM_INIT_METHOD==2 in #3326, my goal was quite similar to #5296: getting all relevant resources into a single file, without causing the excessive overhead of an array literal with all bytes spelled out in decimal. Base64 came to my mind, but the discussion in #2188 had pointed out that it compresses rather poorly, so I looked for alternatives. For a while I had been following an idea I called tinylut, where a lookup table (LUT) establishes a mapping from bytes (or codepoints) in the string literal to bytes in the memory image. The idea was to use printable characters only, but to keep the number of escape sequences low while maintaining the compressibility. There is a sample file containing decode loop and sample input, and the corresponding encoder implementation. However, in #3326 (comment) @chadaustin argued for a simpler decoding logic avoiding that map, so I rewrote my PR to do that, and it got merged. For a while we had issues with the way Windows handled bytes 00 and 1A, but #3854 by @juj fixed that. After that, I kind of lost interest in advocating the thing any further. I had been using it for my own projects, and had assumed it would make its way to established default based on its own merits, but apparently it never did. Likely too few people knew about it.

The way I see it, there are several competing goals here:

  1. Reducing the size of the uncompressed file (as not everyone can serve compressed files)
  2. Reducing the size of the compressed file (was the main metric in More compact memory initialization? #2188)
  3. Keeping the complexity and maintenance burden low
  4. Avoiding issues with non-printable characters

Base64 handles 4 perfectly, and 3 is good if you can delegate decoding to something. Goals 1 and 2 are handled better than with the decimal literal, but not as good as some of the discussed alternatives. In contrast to this, method 2 as it got merged from my PR does 1 and 2 quite nicely, being very close to raw binary representation. But 4 has been a source of problems, and while the CRC32 checksum helps to at least catch them early on, that adds some complexity. As do the various places where certain characters have to get escaped correctly.

I don't mind my code getting deprecated and eventually deleted if that's what's best for the project. But I saw and still see a certain demand for small sizes, which base64 can't deliver that well. I am not sure how much of a problem unprintable characters are these days, with Windows and various versions of various uglification processors. I'm pretty sure they can be made to work fine in controlled environments (likely linux and no closure compiler), and if they do, they are probably preferable. I also believe that any compatibility issues remaining can be solved with moderate amounts of work if the person doing said work has direct access to an environment reproducing the issue.

I'm no fan of having two (or even three) distinct memory initialization methods. It would be great if the current methods 0 and 2 could converge towards something that suits everyone. Personally I have the impression that method 2 might be closer to satisfying all needs, but I might be biased.

If you change method 0 to use base64, then change that to optimize better for size, how much would you expect the outcome of that to differ from method 2, or alternatives like tinylut?

@buu700
Copy link
Contributor Author

buu700 commented Jul 28, 2017

I won't weigh in on what's best of emscripten (leaving that to @kripken and @juj, as I'm impartial), but thanks for the detailed analysis! That was very interesting.

Depending on what's decided here, I would have no problem with either changing my new stuff too use your encoding or adding it as a separate option.

As far as the relationship between my PR and your mem init encoding scheme: in addition to enabling embedding all subresources within the JS, my PR replaces the previous mem init option 0 implementation with a hook into the new SINGLE_FILE logic, which as you noted uses base64. I can't personally speak to any problems that may have been caused by the more space-efficient option 2 encoding, but I guess @kripken must have felt that the efficiency difference between it and base64 was small enough that it was worth considering killing it and maintaining only the one option.

@kripken
Copy link
Member

kripken commented Aug 1, 2017

Good points. From my perspective,

  • The mem init file is becoming less important, since WebAssembly is a binary file anyhow and it contains its own static init section. But embedding files does still matter, so we need something more general.
  • I know MEM_INIT_MODE 2 had some issues - settings.js mentions it doesn't work on windows, @gagern , is that obsolete after the fixes you mention?

The single-file PR seems good both because it uses a simpler method, and that it simplifies the logic for embedding files so it works for all the files we need to embed. Those are two separate benefits, though, in theory we could use the mem init 2 method with it.

So perhaps we should

  • Deprecate mem init mode 2
  • Land single file
  • Evaluate using the mem init mode 2 method in single file

Thoughts?

@buu700
Copy link
Contributor Author

buu700 commented Aug 1, 2017

That sounds like it makes a lot of sense to me. I don't want to spend too much more time on #5296 and it's already a pretty big PR as-is, but someone else sorting out whether and how to swap out the base64 encoding with string literal encoding sounds like a great idea.

@chadaustin
Copy link
Collaborator

When I was there, IMVU was a strong proponent of mem init mode 2 (or really anything that could reduce binary size or parse time), but I don't know if it's in use. I've reached out to some people on the 3D team there to see if they still use it. Also cc @waywardmonkeys

I still think it's a good idea. :) But I can't comment on the maintenance burden.

@waywardmonkeys
Copy link
Contributor

I haven't read the above mentioned threads yet. IMVU hasn't used this method for a long time as it didn't work on Windows and the issue was pretty obscure to track down as being a problem.

@gagern
Copy link
Contributor

gagern commented Aug 5, 2017

I don't know how much of a problem method 2 still is on Windows, since I don't have one. I thought that all the issues I had heard of had gotten resolved at some point, but I can't make any guarantees.

Looking at gagern@5dbd71f58c2fcf933f2d656657ff2300094b0491I get the impression that I had wanted to remove the warning and reactivate the tests. Perhaps I had wanted to test this on Windows before opening a pull request, and then forgot about it before I got an opportunity.

@juj
Copy link
Collaborator

juj commented Aug 7, 2017

Can you please point out the benefit of the deprecation?

I think the desire here was to not have to maintain multiple embedding methods side by side. Though that does not feel too costly in this case, MEM_INIT_MODE=2 does not seem to take up too much code.

I know MEM_INIT_MODE 2 had some issues - settings.js mentions it doesn't work on windows, @gagern , is that obsolete after the fixes you mention?

When I fixed the Windows issues, I did not want to remove that comment, because there were two things that were seen: first is that the Closure compiler project is not able to read in -s MEM_INIT_MODE=2 forms of input files on Windows. The second is that some text editors on Windows cut off opening the text file once they encountered first zero byte (iirc Notepad++ had this issue, but Sublime Text 2 did not), though that is not something we can fix. Apart from those two, MEM_INIT_MODE=2 did work ok on Windows based on the tests in the suite.

Given that PR #5296 does not conflict with MEM_INIT_MODE=2 but preserves it as enabled (correct me if I got that wrong?), is there a strong need to deprecate MEM_INIT_MODE=2, especially if @gagern and IMVU are using it? The maintenance cost does not seem high either, and the good bit is that developers who don't use it don't pay any runtime cost on compiled output size, it's only a couple of added functions in the compiler side.

@stale
Copy link

stale bot commented Sep 18, 2019

This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 7 days. Feel free to re-open at any time if this issue is still relevant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants