exr piz wavelet decompression #13346

Merged: 5 commits into mrdoob:dev on Mar 1, 2018

Conversation

@richardmonette
Contributor

Adds support for reading PIZ wavelet compressed EXR images (in addition to the previously supported uncompressed reader). It's been quite the task, but I have worked back from the C/C++ reference OpenEXR and TinyEXR implementations, making the appropriate changes and translations into JavaScript. (Ever converted pointer arithmetic to a language that only has a floating point data type?! 😨) I'd like to keep following this up with better handling for various channel configs, 32-bit floating point, etc., and more edge cases, but I think this is useful enough to check in, and hopefully others can also help out, since the major hurdle is passed.

@mrdoob @WestLangley

related #10652

@mrdoob added this to the r91 milestone on Feb 16, 2018
@bhouston
Contributor

This is awesome!

I am concerned the code is slow, though. This function, wdec14, is in an inner loop and does two new calls: https://github.com/mrdoob/three.js/pull/13346/files#diff-8c3168ce268d9c60ad2c1e022e579e3aR307
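For context, the function being flagged is the inverse of the PIZ 14-bit wavelet encode step. Below is a rough JavaScript sketch based on the OpenEXR reference implementation (ImfWav.cpp); the toInt16 helper is hypothetical and stands in for C's signed 16-bit casts, and doing that reinterpretation with something like new Int16Array( [ x ] )[ 0 ] on every call is one plausible source of the allocations mentioned above.

function toInt16( x ) {
  return ( x << 16 ) >> 16; // reinterpret an unsigned 16-bit value as signed, without allocating
}

function wdec14( l, h ) {
  var ls = toInt16( l );
  var hs = toInt16( h );

  var hi = hs;
  var ai = ls + ( hi & 1 ) + ( hi >> 1 );

  var as = toInt16( ai );      // low-pass term
  var bs = toInt16( ai - hi ); // high-pass term

  return { a: as, b: bs }; // the per-call object return is addressed later in this thread
}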

Remember that you can create views into the same data set. It may be possible to use a preallocated view and just access it repeatedly; you can also create multiple views of the same data set. I believe such a strategy will speed this code up by something like 10x.

Same with parseUint16: use a preallocated view and then calculate offsets into it. It should be straightforward to implement and a huge time savings.
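For illustration, a minimal sketch of that preallocated-view idea; the parseUint16 name and the offset object with a .value field mirror the helpers discussed in this thread, but the exact signature here is an assumption, not the merged code.

var dataView = new DataView( buffer ); // allocated once for the whole ArrayBuffer

function parseUint16( dataView, offset ) {
  var value = dataView.getUint16( offset.value, true ); // true = little-endian, as EXR data is
  offset.value += 2;
  return value;
}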

Also remember there is already a half conversion function that was used for HDR half conversion: https://github.com/mrdoob/three.js/blob/master/examples/js/loaders/HDRCubeTextureLoader.js#L27 But I guess that one goes the wrong way. There was a Half class a while back, but I guess it got refactored out along the way, which is too bad.
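For reference, a self-contained sketch of half (16-bit) float decoding, the kind of helper being discussed here; decodeFloat16 is a hypothetical name, not the HDRCubeTextureLoader function linked above nor the exact code in this PR.

function decodeFloat16( binary ) {
  var exponent = ( binary & 0x7C00 ) >> 10;
  var fraction = binary & 0x03FF;

  return ( binary >> 15 ? - 1 : 1 ) * (
    exponent ?
      ( exponent === 0x1F ?
        ( fraction ? NaN : Infinity ) :                             // Inf / NaN
        Math.pow( 2, exponent - 15 ) * ( 1 + fraction / 0x400 ) ) : // normal numbers
      6.103515625e-5 * ( fraction / 0x400 )                         // subnormal numbers
  );
}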


if ( EXRHeader.channels[ channelID ].pixelType == 1 ) {
var val = parseFloat16( buffer, offset );
var cOff = channelOffsets[ EXRHeader.channels[ channelID ].name ];

Move cOff outside of the inner loop.

// HALF
for ( var x = 0; x < width; x ++ ) {

var cOff = channelOffsets[ EXRHeader.channels[ channelID ].name ];

Move cOff outside of the inner loop.

@richardmonette
Contributor Author

Hey @bhouston, thank you for your comments!

I have taken your advice and refactored in a few places to either use a function for conversion or a preallocated DataView, in an effort to improve performance. Using your suggestions, I have reduced the parsing time from ~2149ms to ~327ms. Even more can be done, but I think this puts us in an acceptable performance bracket to start with.

Regarding the shared half conversion, I definitely agree we don't want to repeat this type of code all over the place, and that we should put this into some kind of class or helper toolchain, but perhaps that can come in a follow-up refactor, since it would also require touching the code in other examples?

@richardmonette
Contributor Author

One more thought: as it stands, I am passing around ArrayBuffer/DataView along with a separate offset. I think this could be cleaned up even more by introducing a small object to carry the buffer and offset together, which hopefully I can do in a follow-up cleanup refactor.

@bhouston
Contributor

Nice performance improvement. I would go a little further and replace this pattern (which also applies to others such as parseFloat16, etc.):

function parseFloat32( buffer, offset ) {

	var float = new DataView( buffer.slice( offset.value, offset.value + 4 ) ).getFloat32( 0, true );

	offset.value += 4;

	return float;

}

Replace the above with the following preallocated DataView strategy. It should remove nearly all allocations while being fairly simple code.


// this is assuming the largest data size is 4 bytes.
var FLOAT_SIZE = 4, INT16_SIZE = 2, INT8_SIZE = 1;

var dataViews = [ new DataView( buffer.slice( 0 ) ), new DataView( buffer.slice( 1 ) ), new DataView( buffer.slice( 2 ) ), new DataView( buffer.slice( 3 ) ) ];

function getFloat32( dataViews, offset ) {
  return dataViews[ offset % FLOAT_SIZE ].getFloat32( offset - ( offset % FLOAT_SIZE ), true );
}
function getInt16( dataViews, offset ) {
  return dataViews[ offset % INT16_SIZE ].getInt16( offset - ( offset % INT16_SIZE ), true );
}
function getInt8( dataViews, offset ) {
  return dataViews[ offset % INT8_SIZE ].getInt8( offset - ( offset % INT8_SIZE ) );
}

[...]

var offset = 0;

var myInt8 = getInt8( dataViews, offset );
offset += INT8_SIZE;

var myInt16 = getInt16( dataViews, offset );
offset += INT16_SIZE;

myInt8 = getInt8( dataViews, offset );
offset += INT8_SIZE;

var myFloat = getFloat32( dataViews, offset );
offset += FLOAT_SIZE;

This will get rid of nearly all of the remaining allocations.

@bhouston
Contributor

> One more thought: as it stands, I am passing around ArrayBuffer/DataView along with a separate offset. I think this could be cleaned up even more by introducing a small object to carry the buffer and offset together, which hopefully I can do in a follow-up cleanup refactor.

Sure, but I wouldn't worry about it. Creating objects is the slowest thing in JavaScript by far. In fact, destructuring temporary objects into individual primitive types is often an optimization in JavaScript.

Thus I would not try to clean up the code by introducing temporary JavaScript objects; even if they are created like { buffer: buffer, offset: offset }, they will often slow down the code because they touch the GC.

You basically never want to touch the GC when writing performance-oriented JavaScript code, which is what all my feedback is about.
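A small self-contained sketch of that point, with hypothetical helper names: keep the running state in plain numbers and read through a preallocated DataView, so no per-call wrapper object ever reaches the GC.

var buffer = new ArrayBuffer( 8 );
var dataView = new DataView( buffer );
var offset = 0; // a bare number the caller advances; no { buffer: buffer, offset: offset } wrapper

function readUint16( dataView, offset ) {
  return dataView.getUint16( offset, true ); // no allocation, nothing for the GC to collect
}

var a = readUint16( dataView, offset ); offset += 2;
var b = readUint16( dataView, offset ); offset += 2;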

@richardmonette
Contributor Author

richardmonette commented Feb 19, 2018

Went back over the code again, and I've got the time down again to ~240ms, from ~327ms. I've got a single preallocated DataView in nearly every case, which did help a little bit. Here is the output from the profiler:

[profiler screenshot, captured Feb 19, 2018]

Note that the profiler seems to show things a bit slower than the actual run time.

I saw the 🔥 hot path is actually around parseUint8. To get even faster, I've added a case where we use a Uint8Array instead of a DataView, since this is a bit faster still. This is how I got most of the speed improvement in this refactor 🚒.
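For illustration, a minimal sketch of the Uint8Array fast path described above; the names follow the helpers mentioned in this thread but are assumptions rather than the exact merged code.

var uInt8Array = new Uint8Array( buffer ); // preallocated once over the whole ArrayBuffer

function parseUint8( uInt8Array, offset ) {
  var value = uInt8Array[ offset.value ]; // plain indexed access, cheaper than a DataView call
  offset.value += 1;
  return value;
}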

I also explored some variations where, instead of doing return { c: c, lc: lc }; in getChar, etc., I made a preallocated return object and reused it; however, that didn't yield any particular improvement in speed. My impression is that, performance-wise, the returns are diminishing at this point.

@bhouston
Contributor

I know I am being annoying, but two more changes will basically fix the last remaining memory issues:

Replace this pattern:

var temp = getCode( pl.lit, rlc, c, lc, uInt8Array, inDataView, inOffset, outBuffer, outOffset, outBufferEndOffset );
c = temp.c;
lc = temp.lc;

with this:

// somewhere outside of the inner loop, or make it a semi-global via a closure.
var tempTuple = { c: 0, lc: 0 };

...

// later, in the inner loop, just reuse the tuple constantly.
getCode( tempTuple, ... );
c = tempTuple.c;
lc = tempTuple.lc;

The above pattern of passing in a single preallocated object to receive the results of the call, instead of returning a new JavaScript object on each invocation, should be a big speed up. This pattern can be applied to wdec14, getChar and getCode, all of which are in your inner loops.
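For completeness, a hedged sketch of what the callee side looks like under this pattern; the argument list is simplified, since the real getCode takes many more parameters (see the call above).

// Instead of ending with `return { c: c, lc: lc };`, getCode writes into the caller's tuple.
function getCode( outTuple, c, lc /* , ...remaining decode arguments elided... */ ) {

  // ... decoding work that updates c and lc ...

  outTuple.c = c;
  outTuple.lc = lc;
}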

@bhouston
Contributor

The last remaining memory issues are these return statements; they are costly and unnecessary:

return { c: c, lc: lc };

return { c: c, lc: lc };

return { l: (c >> lc) & ((1 << nBits) - 1), c: c, lc: lc };

return { a: as, b: bs };

@richardmonette
Contributor Author

I see 'em, will fix 👍

@richardmonette
Contributor Author

Updated!

@mrdoob
Owner

mrdoob commented Feb 20, 2018

@bhouston looks good?

@richardmonette
Contributor Author

Hoping to follow this one up by using it to implement IBL with latlong/equiangular EXR HDR light probes!

@mrdoob
Owner

mrdoob commented Mar 1, 2018

Will merge. If @bhouston finds something, we can tweak it afterwards.

@mrdoob merged commit b12d123 into mrdoob:dev on Mar 1, 2018
@mrdoob
Owner

mrdoob commented Mar 1, 2018

Thanks!
