Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ZSTDDecoder, add ZSTD support to KTX2Loader. #19932

Merged
merged 3 commits into from
Jul 29, 2020

Conversation

donmccurdy
Copy link
Collaborator

@donmccurdy donmccurdy commented Jul 26, 2020

Adds a standalone ZSTD decoder, built from source as explained in the header:

/**
 * ZSTD (Zstandard) decoder.
 *
 * Compiled from https://github.com/facebook/zstd/tree/dev/contrib/single_file_libs, with the
 * following steps:
 *
 * ```
 * ./combine.sh -r ../../lib -o zstddeclib.c zstddeclib-in.c
 * emcc zstddeclib.c -Oz -s EXPORTED_FUNCTIONS="['_ZSTD_decompress', '_ZSTD_findDecompressedSize', '_ZSTD_isError', '_malloc', '_free']" -s ALLOW_MEMORY_GROWTH=1 -s MALLOC=emmalloc -o zstddec.wasm
 * base64 zstddec.wasm > zstddec.txt
 * ```
 *
 * The base64 string written to `zstddec.txt` is embedded as the `wasm` variable at the bottom
 * of this file. The rest of this file is written by hand, in order to avoid an additional JS
 * wrapper generated by Emscripten.
 */

The ZSTD decoder is 28kb gzipped here — @zeux it sounded like your build was smaller (KhronosGroup/glTF#1751 (comment)). Do you think that's a compilation mistake on my part, or something to do with compiling alone, and not as part of a larger WASM library?

/cc @MarkCallow

@donmccurdy
Copy link
Collaborator Author

I tried -Os, but lost the malloc and free exports that this code depends on. The (broken) output was a bit smaller, 22kb gzipped.

@zeux
Copy link
Contributor

zeux commented Jul 26, 2020

@donmccurdy I might have used -Os. If you need malloc/free you should explicitly export them via -s EXPORTED_FUNCTIONS. You should also enable emmalloc (don’t remember the command line OTOH).

@donmccurdy
Copy link
Collaborator Author

Ok, thanks! I'd tried including malloc/free with exported functions but didn't prefix them with _, it worked the second time around. With emmalloc we're down to about 18.5 kb gzipped.

@donmccurdy
Copy link
Collaborator Author

I've also published this to npm, as zstddec: https://github.com/donmccurdy/zstddec. It has the annoying limitation of requiring that you know the uncompressed size of the data in advance: findDecompressedSize(...) fails, sometimes, with the error wasm function signature contains illegal type. I'm not sure why that is yet, but it doesn't affect KTX2Loader because we always know the uncompressed size of the texture.

@mrdoob mrdoob added this to the r119 milestone Jul 27, 2020
@MarkCallow
Copy link

@donmccurdy

  1. have you managed to avoid 2 copies - one from the incoming ArrayBuffer to the zstd decoder and another from the decoder to the transcoder?
  2. Should I make this part of msc_basis_transcoder module or are you happy for it to be separate?
  3. Regardless of the answer to 2, should I publish msc_basis_transcoder to npm? If so I'll need some guidance from you.

@zeux
Copy link
Contributor

zeux commented Jul 27, 2020

FWIW the only way to avoid extra copies with the existing Wasm capabilities is to fuse the zstd decoder and transcoder into a single .wasm module. You will still get 2 copies - one copying the source buffer into Wasm memory, and one copying the resulting texture out of Wasm memory - but if you have two separate Wasm modules, you will need a third copy.

@MarkCallow
Copy link

@donmccurdy, one other thing. You should be cautious of using the uncompressedByteLengths in a KTX file in case of a malicious file. That goes for the size returned by findDecompressedSize too in case someone crafts a file to expand enormously.

@MarkCallow
Copy link

@zeux, since the 3rd copy is the same regardless, I was ignoring it in my comment. The 2 copies I referred to are source buffer to zstd and zstd to transcoder. I believe it may be possible to construct the zstd decoder as a side module (-s SIDE_MODULE) which makes it essentially a dll that could be loaded by the transcoder and will use the same HEAP8. I have not yet had time to try this. Perhaps @donmccurdy can give it a try. If it is not, then the decoder and transcoder should be combined into a single module.

@zeux
Copy link
Contributor

zeux commented Jul 27, 2020

@MarkCallow You can not load two different Wasm modules into a single Wasm instance with the browser API (this may be possible in other Wasm environments, but that's not material here).

@MarkCallow
Copy link

According to https://github.com/emscripten-core/emscripten/wiki/Linking, Emscripten has support for dynamic linking of Modules and Side Modules. It does not mention anything about using Browser APIs or limitations using this with browsers except that Chromium does not support compiling more that 4kb of wasm on the main thread for which the page offers workarounds.

That said, there is probably little practical benefit to using a side module vs making the zstd decoder part of msc_basis_transcoder.

@donmccurdy
Copy link
Collaborator Author

Should I make this part of msc_basis_transcoder module or are you happy for it to be separate?

This might be a question for Rich... would a container-agnostic transcoder in the basis_universal repository have zstd support? It would probably save 5-10kb to combine them, which is nice. I'm inclined to wait on this decision until we've figured out whether the KHR_texture_basisu will include zstd or rely on glTF file-level supercompression instead.

Regardless of the answer to 2, should I publish msc_basis_transcoder to npm? If so I'll need some guidance from you.

I think we probably should, but let's discuss this on Wednesday. My main question is whether we'd be publishing it from the KTX-Software repository or the basis_universal repository, or somewhere else.

@donmccurdy
Copy link
Collaborator Author

donmccurdy commented Jul 27, 2020

You should be cautious of using the uncompressedByteLengths in a KTX file in case of a malicious file.

The browser should handle this gracefully enough: worst case, we crash the tab, but it's not possible to detect the browser tab's memory limit anyway. I'm less sure about Node.js. I understand WebAssembly has a memory limit of 4GB in v8, perhaps we should enforce a limit around 2GB, in case Node.js is less secure than the browser? Or perhaps we allow the user to pass a max acceptable uncompressed size?

I'd rather not deal with WASM side modules at this point... The practical benefit seems small compared to other remaining work for KHR_texture_basisu.

@zeux
Copy link
Contributor

zeux commented Jul 28, 2020

re: Wasm side modules - ah, I stand corrected. I think this is possible by instantiating both modules to share the WebAssembly.Memory object, which would maintain the pointer identity across them. Presumably this is what Emscripten's JS wrappers do as well - I don't think it's worth the trouble in this case but it's good to know this is possible in theory.

@mrdoob mrdoob merged commit bde85c7 into mrdoob:dev Jul 29, 2020
@mrdoob
Copy link
Owner

mrdoob commented Jul 29, 2020

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants