-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add LJpeg predictors 2-7 #334
base: develop
Are you sure you want to change the base?
Conversation
I'm currently crossing all t's and dotting all i's so that #325 can happen. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #334 +/- ##
===========================================
- Coverage 60.95% 60.76% -0.20%
===========================================
Files 273 273
Lines 16408 16466 +58
Branches 2077 2078 +1
===========================================
+ Hits 10002 10006 +4
- Misses 6278 6332 +54
Partials 128 128
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
int predB = predMode > 1 ? img(row - 1, col + N_COMP + i) : 0; | ||
int predC = predMode > 1 ? img(row - 1, col + i) : 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, these are out-of-bounds reads for first row and/or last column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, hence all the checks around...
pred[c] = uint16_t(predC); | ||
break; | ||
case 4: | ||
pred[c] = uint16_t(predA + predB - predC); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Predictor 4 is special because A + B - C can overflow uint16_t. The prediction and diff calculation must be done in int32_t. Prediction value can be negative, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that's why predA, predB, predC are declared as ints above. Are you saying I should check and clamp to 0 here as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reconstruction using diff and prediction value must be done with signed integers, then you can safely cast the result back to uint16_t. If you cast the prediction value first (loosing some bits) and then do simple unsigned integer math, I assume you get wrong results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My reading of itu-t81 is that only the intermediate calculation is to be done as signed int. The predictor is stored and used "modulo 2^16", i.e. unsigned:
The difference between the prediction value and the input is calculated modulo 2^16. In the decoder the difference is
decoded and added, modulo 2^16, to the prediction.
But I do need to check how the two are added back together earlier indeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all the uint16_t casting takes care of the modulo operations described above by design, and no changes are in fact necessary, but happy to be proven wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well that's difficult to prove. I've added a test case to my implementation where A = B = 65535
and C=0
and the compression/decompression verification fails if I calculate this with u16 instead of i32. I can not tell you if some integer magic solves this in your code but it seems legit to assume there is an overflow which leads to wrong results in edge cases.
For me, the edge case was linear raw data (RGB instead of CFA) that uses the full 2^16 value range, writing into DNG with ljpeg compression. For regular CFA images, the input is ~ 2^14, so you never hit overflows.
It's difficult to test as we don't have a ljpeg compressor in rawspeed to write such small tests.
For prediction > 4 and CFA images, it is common practice to pack two lines into one, so instead of:
the jpeg image becomes:
Because this pattern works better for prediction modes > 4 and component count = 2. This is currently not handled by the code, as it is assumed that the dimension of the ljpeg stream is the same as the image/tile.
I will generate you a set of DNG files with pred 1-7, comp=2 and packed lines later this day. |
My understanding is that this reshaping happens at a different level of abstraction, and that the number of SOF3 channels at this stage is already correct, and could indeed be different from the CFA channels (i.e. 1). This is how it already works in rawspeed for "normal" DNG lossless compression (2 channels in SOF3, pred 1) See also my findings in #189 The current code works out if you're reshaping by widening (Adobe), but doesn't work if you're reshaping by squeezing (which is what Blackmagic do). |
In any case, I will not take this further (apart from fixing any overflows/underflows you can demonstrate), as it works on Apple's DNGs which I wanted to give people a potential patch to use on until Roman implements this in Halide. |
There are multiple options. First, use comp=1, then w/h of the ljpeg is identical to tile w/h. Adobe uses prediction 1 and comp=2 and because of that, the width is half but height is unchanged. BlackMagic uses comp=2, too, but because of the prediction mode, SOF3 width becomes (width/2)*2 and SOF3 height (height/2) |
Blackmagic use comp=1 and 2H x W/2 |
(FYI if it isn't obvious i'm in
I'm basically having an existential crisis of conciseness here, and that has been happening for some time now. |
The sentiment is shared, I don't think it's ideal either. But as you already know there are a limited number of motivated/interested contributors who can properly deal w/ the C++ templated abstraction already laid out here and confident in what LLVM will compile that code to, imagine how many there will be proficient in Halide... In the meantime, the requests to support new codecs keep piling on :/ |
Oh... I just assumed this because otherwise the rawspeed error in #189 wouldn't make sense. Let me check this deeper when I find some time, from the errors and width/height values reported it should be theoretically the same issue as with my line-packed dng files. |
The error in #189 is there because it precisely assumes 2 components so W/2 * 2 = W (or in general it assumes you'll always have W/N *N comp = W). Blackmagic don't do that (I guess because of the predictor), so that assumption brakes down. Mind you, we also have comp = 4 in the Sony case, but I haven't been motivated to do the plumbing for that one, which should work out hopefully better than the Blackmagic case. |
Let me put it it this way. While i have not made a decisions yet, |
Here is a set of DNG files with various predictors: https://chaospixel.com/pub/temp/dng_pred_tests.tar.bz2 |
@kmilos Just for fun, I've looked into https://github.com/yanburman/dng_sdk/blob/master/source/dng_lossless_jpeg.cpp#L2501 and https://github.com/yanburman/dng_sdk/blob/master/source/dng_lossless_jpeg.cpp#L1428 which comes from the Adobe DNG SDK lossless decoder. |
Yet more spaghetti, but I believe this is now functionally correct for the intermediate predictor calculations. |
5a487ba
to
7a016ac
Compare
Btw, the Sony case won't work neither w/o some extra reshaping because we're not in the W/4 x 4 comp arrangement assumed, but H/2 x W/2 x 4. |
7a016ac
to
bb1526a
Compare
FWIW, i've been thinking about this, and i suppose i know a way to implement I haven't checked, but i'm not sure we have all the necessary samples So essentially we need at least 8*4 samples. (i may be forgetting some other permutation). |
Right, restart interval. |
Blackmagic DNGs have predictor 6, but have another problem (component arrangement as mentioned above and in #189) preventing its testing... So it might be an idea to generalize the component arrangement first (1x1, 1x2, 2x1, 2x2), thus ticking Sony lossless off the list? Or perhaps you prefer to handle both component arrangement and predictors in one go? I hope @cytrinox could help with the 4*8 test vectors then... 🙏 |
Sony FUBAR'ed their "LJpeg" implementation to the point of it no longer being a proper LJpeg. |
Why do you think so? AFAICT it is, just that the component scatter/gather is different from the Adobe one... I.e. for a 2x2 Bayer CFA Adobe does 2 components like this 1 2 1 2 ... While Sony does 4 components like this: 1 2 1 2 ... Each component in both Adobe and Sony cases seems to be regular LJpeg SOF3 (w/ already supported predictor 1)... The arrangement of components has nothing to do w/ LJpeg itself, and current LJpeg code seems to work fine once the arrangement is taken into account.. What they FUBAR'ed is on the TIFF container level by having both Tile* and Strip* tags simultaneously, and making ImageWidth/Height an integer of tile size, when it doesn't have to be (which is why I advocated for the absolute crop for these models covering all modes instead of a relative, negative one; but this then makes APS-C crop more complicated so it's a lose-lose either way)... |
And BlackMagic do something even weirder, from a 2x2 Bayer CFA: 1 1 ... to a single components LJpeg like this: 1 1 1 1 (w/ predictor 6) |
That's my point exactly actually. As long as that |
Sure, and I already mentioned there that should be generalized - one just needs to parse the SOF3 header for the no. of components and their frame size, and compare to the TIFF CFA tile size, and that gives the arrangement: E.g. Sony case: TIFF tile is 512x512, SOF3 is 4 components of 256x256 Adobe case: TIFF tile is 256x256, SOF3 is 2 components of 256x128 BlackMagic case: TIFF tile is 512x512, SOF3 is 1 component of 256x1024 The arrangement is calculated from the TIFF tile HxW divided by SOF3 YxX, and to check conformance, one must have N_comp*Y*X = H*W (I think already checked). Both the Sony and BlackMagic SOF3 headers are correct in what they implement AFAICT, so nothing is really at odds here, "just" missing implementation in rawspeed... |
I see.
and
and for cpp=4 we have three:
.. that multiplies the needed sample set by another x2 or so. |
😮 |
Not quite. For cpp=4 you can have two: 1 2 or 1 3 i.e. row or column major scatter of the 2x2 Bayer into 4 separate component planes. Like this for row major: (In theory, one can of course have all the other ways of ordering 4 components, 4! in total, but that would be a bit silly...) For cpp=1 you could have 2 (well, 3 if also you count identity, i.e. no rearrangement, just like you illustrated above). |
Sorry, not following the logic there. I can see how the arrangements i listed can be specified |
Np, I'll try to do a clearer drawing for the three (Adobe, Sony, and BlackMagic) cases a bit later and add here. |
@LebedevRI I hope the following illustration of the scatter/gather schemes makes it clearer: These are all perfectly valid w.r.t. ITU-T T.81. The components highlighted in yellow make up one MCU, and the dotted ones are used as predictors (usually 1, i.e. just horizontal, but 6 for Blackmagic/CinemaDNG case). (There is also the trivial case of DNG spp=3 LinearRaw with 1:1 mapping to SOF3 Nf=3, which is already supported for predictor 1, but not predictor 7 using the row above like Apple do.) |
One thing i can say right away -- if nothing, it's a pretty nice infographics! |
I have a somewhat dirty PR in #386 that supports the Sony arrangement and might support blackmagic arrangement if I understand it correctly but I can't test. |
ce27abc
to
f06a070
Compare
f06a070
to
fb143a7
Compare
I think this was the last time I rebased this - the new code structure that imposes use of a single row only (effectively supporting only predictor 1) makes working around this even uglier than before, if that's possible 😮 |
This is true for all RPU samples at least, even weird 3-component ones. This seems like the missing info trivia which allows support for different LJpeg CPS layouts (e.g. the square one)
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #334 +/- ##
===========================================
- Coverage 60.95% 60.76% -0.20%
===========================================
Files 273 273
Lines 16408 16466 +58
Branches 2077 2078 +1
===========================================
+ Hits 10002 10006 +4
- Misses 6278 6332 +54
Partials 128 128
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Proof of concept (non-performant) to address #189 and #258