exr piz wavelet decompression #13346

Merged: 5 commits into mrdoob:dev on Mar 1, 2018

Conversation

@richardmonette
Contributor

Adds support for reading PIZ wavelet compressed EXR images (in addition to the previously supported uncompressed reader). It's been quite the task, but I have worked back from the C/C++ reference OpenEXR and TinyEXR implementations, making the appropriate changes and translations into JavaScript. (Ever converted pointer arithmetic to a language that only has a floating point data type?! 😨) I'd like to keep following this up with better handling for various channel configs, 32-bit floating point, etc., and more edge cases, but I think this is useful enough to check in, and hopefully others can also help out, since the major hurdle is passed.

@mrdoob @WestLangley

related #10652

@mrdoob added this to the r91 milestone on Feb 16, 2018
@bhouston
Contributor

This is awesome!

I am concerned the code is slow, though. This function, wdec14, is in an inner loop and does two new calls: https://github.com/mrdoob/three.js/pull/13346/files#diff-8c3168ce268d9c60ad2c1e022e579e3aR307
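For context, the function being flagged is the inverse of the PIZ 14-bit wavelet encode step. Below is a rough JavaScript sketch based on the OpenEXR reference implementation (ImfWav.cpp); the toInt16 helper is hypothetical and stands in for C's signed 16-bit casts, and doing that reinterpretation with something like new Int16Array( [ x ] )[ 0 ] on every call is one plausible source of the allocations mentioned above.

function toInt16( x ) {
  return ( x << 16 ) >> 16; // reinterpret an unsigned 16-bit value as signed, without allocating
}

function wdec14( l, h ) {
  var ls = toInt16( l );
  var hs = toInt16( h );

  var hi = hs;
  var ai = ls + ( hi & 1 ) + ( hi >> 1 );

  var as = toInt16( ai );      // low-pass term
  var bs = toInt16( ai - hi ); // high-pass term

  return { a: as, b: bs }; // the per-call object return is addressed later in this thread
}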

Remember that you can create views into the same data set. It may be possible to use a preallocated view and just access it repeatedly; you can also create multiple views of the same data set. I believe such a strategy will speed this code up by something like 10x.

Same with parseUint16: use a preallocated view and then calculate offsets into it. It should be straightforward to implement and a huge time savings.
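For illustration, a minimal sketch of that preallocated-view idea; the parseUint16 name and the offset object with a .value field mirror the helpers discussed in this thread, but the exact signature here is an assumption, not the merged code.

var dataView = new DataView( buffer ); // allocated once for the whole ArrayBuffer

function parseUint16( dataView, offset ) {
  var value = dataView.getUint16( offset.value, true ); // true = little-endian, as EXR data is
  offset.value += 2;
  return value;
}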

Also remember there is already a half conversion function that was used for HDR half conversion: https://github.com/mrdoob/three.js/blob/master/examples/js/loaders/HDRCubeTextureLoader.js#L27 But I guess that one goes the wrong way. There was a Half class a while back, but I guess it got refactored out along the way, which is too bad.
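For reference, a self-contained sketch of half (16-bit) float decoding, the kind of helper being discussed here; decodeFloat16 is a hypothetical name, not the HDRCubeTextureLoader function linked above nor the exact code in this PR.

function decodeFloat16( binary ) {
  var exponent = ( binary & 0x7C00 ) >> 10;
  var fraction = binary & 0x03FF;

  return ( binary >> 15 ? - 1 : 1 ) * (
    exponent ?
      ( exponent === 0x1F ?
        ( fraction ? NaN : Infinity ) :                             // Inf / NaN
        Math.pow( 2, exponent - 15 ) * ( 1 + fraction / 0x400 ) ) : // normal numbers
      6.103515625e-5 * ( fraction / 0x400 )                         // subnormal numbers
  );
}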


if ( EXRHeader.channels[ channelID ].pixelType == 1 ) {
var val = parseFloat16( buffer, offset );
var cOff = channelOffsets[ EXRHeader.channels[ channelID ].name ];

Move cOff outside of the inner loop.

// HALF
for ( var x = 0; x < width; x ++ ) {

var cOff = channelOffsets[ EXRHeader.channels[ channelID ].name ];

Move cOff outside of the inner loop.

@richardmonette
Contributor Author

Hey @bhouston, thank you for your comments!

I have taken your advice and refactored in a few places to either use a function for conversion or a preallocated DataView, in an effort to improve performance. Using your suggestions, I have reduced the parsing time from ~2149ms to ~327ms. Even more can be done, but I think this puts us in an acceptable performance bracket to start with.

Regarding the shared half conversion, I definitely agree we don't want to repeat this type of code all over the place, and that we should put this into some kind of class or helper toolchain, but perhaps that can come in a follow-up refactor, since it would also require touching the code in other examples?

@richardmonette
Contributor Author

One more thought: as it stands, I am passing around ArrayBuffer/DataView along with a separate offset. I think this could be cleaned up even more by introducing a small object to carry the buffer and offset together, which hopefully I can do in a follow-up cleanup refactor.

@bhouston
Contributor

Nice performance improvement. I would go a little further and replace this pattern (which also applies to others such as parseFloat16, etc.):

function parseFloat32( buffer, offset ) {

	var float = new DataView( buffer.slice( offset.value, offset.value + 4 ) ).getFloat32( 0, true );

	offset.value += 4;

	return float;

}

Replace the above with the following preallocated DataView strategy. It should remove nearly all allocations while being fairly simple code.


// this is assuming the largest data size is 4 bytes.
var FLOAT_SIZE = 4, INT16_SIZE = 2, INT8_SIZE = 1;

var dataViews = [ new DataView( buffer.slice( 0 ) ), new DataView( buffer.slice( 1 ) ), new DataView( buffer.slice( 2 ) ), new DataView( buffer.slice( 3 ) ) ];

function getFloat32( dataViews, offset ) {
  return dataViews[ offset % FLOAT_SIZE ].getFloat32( offset - ( offset % FLOAT_SIZE ), true );
}
function getInt16( dataViews, offset ) {
  return dataViews[ offset % INT16_SIZE ].getInt16( offset - ( offset % INT16_SIZE ), true );
}
function getInt8( dataViews, offset ) {
  return dataViews[ offset % INT8_SIZE ].getInt8( offset - ( offset % INT8_SIZE ) );
}

[...]

var offset = 0;

var myInt8 = getInt8( dataViews, offset );
offset += INT8_SIZE;

var myInt16 = getInt16( dataViews, offset );
offset += INT16_SIZE;

myInt8 = getInt8( dataViews, offset );
offset += INT8_SIZE;

var myFloat = getFloat32( dataViews, offset );
offset += FLOAT_SIZE;

This will get rid of nearly all of the remaining allocations.

@bhouston
Contributor

> One more thought: as it stands, I am passing around ArrayBuffer/DataView along with a separate offset. I think this could be cleaned up even more by introducing a small object to carry the buffer and offset together, which hopefully I can do in a follow-up cleanup refactor.

Sure, but I wouldn't worry about it. Creating objects is the slowest thing in JavaScript by far. In fact, destructuring temporary objects into individual primitive types is often an optimization in JavaScript.

Thus I would not try to clean up the code by introducing temporary JavaScript objects; even if they are created like { buffer: buffer, offset: offset }, they will often slow down the code because they touch the GC.

You basically never want to touch the GC when writing performance-oriented JavaScript code, which is what all my feedback is about.
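A small self-contained sketch of that point, with hypothetical helper names: keep the running state in plain numbers and read through a preallocated DataView, so no per-call wrapper object ever reaches the GC.

var buffer = new ArrayBuffer( 8 );
var dataView = new DataView( buffer );
var offset = 0; // a bare number the caller advances; no { buffer: buffer, offset: offset } wrapper

function readUint16( dataView, offset ) {
  return dataView.getUint16( offset, true ); // no allocation, nothing for the GC to collect
}

var a = readUint16( dataView, offset ); offset += 2;
var b = readUint16( dataView, offset ); offset += 2;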

@richardmonette
Contributor Author

richardmonette commented Feb 19, 2018

Went back over the code again, and I've got the time down again to ~240ms, from ~327ms. I've got a single preallocated DataView in nearly every case, which did help a little bit. Here is the output from the profiler:

[profiler screenshot, captured Feb 19, 2018]

Note that the profiler seems to show things a bit slower than the actual run time.

I saw the 🔥 hot path is actually around parseUint8. To get even faster, I've added a case where we use a Uint8Array instead of a DataView, since this is a bit faster still. This is how I got most of the speed improvement in this refactor 🚒.
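For illustration, a minimal sketch of the Uint8Array fast path described above; the names follow the helpers mentioned in this thread but are assumptions rather than the exact merged code.

var uInt8Array = new Uint8Array( buffer ); // preallocated once over the whole ArrayBuffer

function parseUint8( uInt8Array, offset ) {
  var value = uInt8Array[ offset.value ]; // plain indexed access, cheaper than a DataView call
  offset.value += 1;
  return value;
}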

I also explored some variations where, instead of doing return { c: c, lc: lc }; in getChar, etc., I made a preallocated return object and reused it; however, that didn't yield any particular improvement in speed. My impression is that, performance-wise, the returns are diminishing at this point.

@bhouston
Contributor

I know I am being annoying, but two more changes will basically fix the last remaining memory issues:

Replace this pattern:

var temp = getCode( pl.lit, rlc, c, lc, uInt8Array, inDataView, inOffset, outBuffer, outOffset, outBufferEndOffset );
c = temp.c;
lc = temp.lc;

with this:

// somewhere outside of the inner loop, or make it a semi-global via a closure.
var tempTuple = { c: 0, lc: 0 };

...

// later, in the inner loop, just reuse the tuple constantly.
getCode( tempTuple, ... );
c = tempTuple.c;
lc = tempTuple.lc;

The above pattern of passing in a single preallocated object to receive the results of the call, instead of returning a new JavaScript object on each invocation, should be a big speed up. This pattern can be applied to wdec14, getChar and getCode, all of which are in your inner loops.
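For completeness, a hedged sketch of what the callee side looks like under this pattern; the argument list is simplified, since the real getCode takes many more parameters (see the call above).

// Instead of ending with `return { c: c, lc: lc };`, getCode writes into the caller's tuple.
function getCode( outTuple, c, lc /* , ...remaining decode arguments elided... */ ) {

  // ... decoding work that updates c and lc ...

  outTuple.c = c;
  outTuple.lc = lc;
}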

@bhouston
Contributor

The last remaining memory issues are these return statements; they are costly and unnecessary:

return { c: c, lc: lc };

return { c: c, lc: lc };

return { l: (c >> lc) & ((1 << nBits) - 1), c: c, lc: lc };

return { a: as, b: bs };

@richardmonette
Contributor Author

I see 'em, will fix 👍

@richardmonette
Contributor Author

Updated!

@mrdoob
Owner

mrdoob commented Feb 20, 2018

@bhouston looks good?

@richardmonette
Contributor Author

Hoping to follow this one up by using it to implement IBL with latlong/equiangular EXR HDR light probes!

@mrdoob
Owner

mrdoob commented Mar 1, 2018

Will merge. If @bhouston finds something, we can tweak it afterwards.

@mrdoob merged commit b12d123 into mrdoob:dev on Mar 1, 2018
@mrdoob
Owner

mrdoob commented Mar 1, 2018

Thanks!
