Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add LJpeg predictors 2-7 #334

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

kmilos
Copy link
Collaborator

@kmilos kmilos commented Dec 30, 2021

Proof of concept (non-performant) to address #189 and #258

@kmilos kmilos requested a review from LebedevRI as a code owner December 30, 2021 10:48
@LebedevRI
Copy link
Member

I'm currently crossing all t's and dotting all i's so that #325 can happen.

@codecov
Copy link

codecov bot commented Dec 30, 2021

Codecov Report

Attention: Patch coverage is 12.85714% with 61 lines in your changes are missing coverage. Please review.

Project coverage is 60.76%. Comparing base (876c91f) to head (fb143a7).
Report is 157 commits behind head on develop.

Files Patch % Lines
...rc/librawspeed/decompressors/LJpegDecompressor.cpp 10.60% 59 Missing ⚠️
...zz/librawspeed/decompressors/LJpegDecompressor.cpp 0.00% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #334      +/-   ##
===========================================
- Coverage    60.95%   60.76%   -0.20%     
===========================================
  Files          273      273              
  Lines        16408    16466      +58     
  Branches      2077     2078       +1     
===========================================
+ Hits         10002    10006       +4     
- Misses        6278     6332      +54     
  Partials       128      128              
Flag Coverage Δ
benchmarks 11.84% <0.00%> (-0.05%) ⬇️
integration 44.81% <15.00%> (-0.14%) ⬇️
linux 56.94% <13.04%> (-0.20%) ⬇️
macOS 25.20% <0.00%> (-0.07%) ⬇️
rpu_u 44.81% <15.00%> (-0.14%) ⬇️
unittests 21.55% <0.00%> (-0.08%) ⬇️
windows ∅ <ø> (∅)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Comment on lines 210 to 211
int predB = predMode > 1 ? img(row - 1, col + N_COMP + i) : 0;
int predC = predMode > 1 ? img(row - 1, col + i) : 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, these are out-of-bounds reads for first row and/or last column.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, hence all the checks around...

pred[c] = uint16_t(predC);
break;
case 4:
pred[c] = uint16_t(predA + predB - predC);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Predictor 4 is special because A + B - C can overflow uint16_t. The prediction and diff calculation must be done in int32_t. Prediction value can be negative, too.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's why predA, predB, predC are declared as ints above. Are you saying I should check and clamp to 0 here as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reconstruction using diff and prediction value must be done with signed integers, then you can safely cast the result back to uint16_t. If you cast the prediction value first (loosing some bits) and then do simple unsigned integer math, I assume you get wrong results.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reading of itu-t81 is that only the intermediate calculation is to be done as signed int. The predictor is stored and used "modulo 2^16", i.e. unsigned:

The difference between the prediction value and the input is calculated modulo 2^16. In the decoder the difference is
decoded and added, modulo 2^16, to the prediction.

But I do need to check how the two are added back together earlier indeed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all the uint16_t casting takes care of the modulo operations described above by design, and no changes are in fact necessary, but happy to be proven wrong.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well that's difficult to prove. I've added a test case to my implementation where A = B = 65535 and C=0 and the compression/decompression verification fails if I calculate this with u16 instead of i32. I can not tell you if some integer magic solves this in your code but it seems legit to assume there is an overflow which leads to wrong results in edge cases.

For me, the edge case was linear raw data (RGB instead of CFA) that uses the full 2^16 value range, writing into DNG with ljpeg compression. For regular CFA images, the input is ~ 2^14, so you never hit overflows.

It's difficult to test as we don't have a ljpeg compressor in rawspeed to write such small tests.

src/librawspeed/decompressors/LJpegDecompressor.cpp Outdated Show resolved Hide resolved
@cytrinox
Copy link
Contributor

cytrinox commented Jan 7, 2022

For prediction > 4 and CFA images, it is common practice to pack two lines into one, so instead of:

RGRGRGRG
GBGBGBGB
RGRGRGRG
GBGBGBGB

the jpeg image becomes:

RGRGRGRGGBGBGBGB
RGRGRGRGGBGBGBGB

Because this pattern works better for prediction modes > 4 and component count = 2.

This is currently not handled by the code, as it is assumed that the dimension of the ljpeg stream is the same as the image/tile.

[rawspeed] (dummyreg5.dng) void rawspeed::AbstractDngDecompressor::decompress() const, line 217: Too many errors encountered. Giving up. First Error:
virtual void rawspeed::LJpegDecompressor::decodeScan(), line 112: LJpeg frame (1184, 79) is smaller than expected (592, 158)

I will generate you a set of DNG files with pred 1-7, comp=2 and packed lines later this day.

@kmilos
Copy link
Collaborator Author

kmilos commented Jan 7, 2022

My understanding is that this reshaping happens at a different level of abstraction, and that the number of SOF3 channels at this stage is already correct, and could indeed be different from the CFA channels (i.e. 1). This is how it already works in rawspeed for "normal" DNG lossless compression (2 channels in SOF3, pred 1)

See also my findings in #189

The current code works out if you're reshaping by widening (Adobe), but doesn't work if you're reshaping by squeezing (which is what Blackmagic do).

@kmilos
Copy link
Collaborator Author

kmilos commented Jan 7, 2022

In any case, I will not take this further (apart from fixing any overflows/underflows you can demonstrate), as it works on Apple's DNGs which I wanted to give people a potential patch to use on until Roman implements this in Halide.

@cytrinox
Copy link
Contributor

cytrinox commented Jan 7, 2022

The current code works out if you're reshaping by widening (Adobe), but doesn't work if you're reshaping by squeezing (which is what Blackmagic do).

There are multiple options. First, use comp=1, then w/h of the ljpeg is identical to tile w/h.
For CFA images, it's wise to use comp=2, then lpjeg width is divided by two (squeezing) without reducing the height. And for prediction 4-7, you pack two lines into one, leads to streching (width*2) and (height/2). You can use line packing with just one component, event four is save if the implementation respects it.

Adobe uses prediction 1 and comp=2 and because of that, the width is half but height is unchanged. BlackMagic uses comp=2, too, but because of the prediction mode, SOF3 width becomes (width/2)*2 and SOF3 height (height/2)

@kmilos
Copy link
Collaborator Author

kmilos commented Jan 7, 2022

BlackMagic uses comp=2

Blackmagic use comp=1 and 2H x W/2

@LebedevRI
Copy link
Member

(FYI if it isn't obvious i'm in #pixls.us @ OFTC)

<...> until Roman implements this in Halide.

I'm basically having an existential crisis of conciseness here, and that has been happening for some time now.
I really don't like how all of our (not just darktable-org's, but more in general in the whole greater open graphics ecosystem)
image processing loops are just hacked together with shitty unreadable unmodifyable unperformant C-like spagetti.
Moving to higher-level abstraction should be a huge win, but i'm afraid of being false messiah, and just making things worse.

@kmilos
Copy link
Collaborator Author

kmilos commented Jan 7, 2022

The sentiment is shared, I don't think it's ideal either. But as you already know there are a limited number of motivated/interested contributors who can properly deal w/ the C++ templated abstraction already laid out here and confident in what LLVM will compile that code to, imagine how many there will be proficient in Halide... In the meantime, the requests to support new codecs keep piling on :/

@cytrinox
Copy link
Contributor

cytrinox commented Jan 7, 2022

BlackMagic uses comp=2

Blackmagic use comp=1 and 2H x W/2

Oh... I just assumed this because otherwise the rawspeed error in #189 wouldn't make sense. Let me check this deeper when I find some time, from the errors and width/height values reported it should be theoretically the same issue as with my line-packed dng files.

@kmilos
Copy link
Collaborator Author

kmilos commented Jan 7, 2022

The error in #189 is there because it precisely assumes 2 components so W/2 * 2 = W (or in general it assumes you'll always have W/N *N comp = W). Blackmagic don't do that (I guess because of the predictor), so that assumption brakes down.

Mind you, we also have comp = 4 in the Sony case, but I haven't been motivated to do the plumbing for that one, which should work out hopefully better than the Blackmagic case.

@LebedevRI
Copy link
Member

(FYI if it isn't obvious i'm in #pixls.us @ OFTC)

<...> until Roman implements this in Halide.

I'm basically having an existential crisis of conciseness here, and that has been happening for some time now. I really don't like how all of our (not just darktable-org's, but more in general in the whole greater open graphics ecosystem) image processing loops are just hacked together with shitty unreadable unmodifyable unperformant C-like spagetti. Moving to higher-level abstraction should be a huge win, but i'm afraid of being false messiah, and just making things worse.

The sentiment is shared, I don't think it's ideal either. But as you already know there are a limited number of motivated/interested contributors who can properly deal w/ the C++ templated abstraction already laid out here and confident in what LLVM will compile that code to, imagine how many there will be proficient in Halide... In the meantime, the requests to support new codecs keep piling on :/

Let me put it it this way. While i have not made a decisions yet,
i am starting to believe that this is a "survival of the fittest" question.
As in, either our ecosystem does make that evolutionary jump,
or it does deserve to go extinct.

@cytrinox
Copy link
Contributor

cytrinox commented Jan 7, 2022

Here is a set of DNG files with various predictors: https://chaospixel.com/pub/temp/dng_pred_tests.tar.bz2

@cytrinox
Copy link
Contributor

cytrinox commented Jan 7, 2022

@kmilos Just for fun, I've looked into https://github.com/yanburman/dng_sdk/blob/master/source/dng_lossless_jpeg.cpp#L2501 and https://github.com/yanburman/dng_sdk/blob/master/source/dng_lossless_jpeg.cpp#L1428 which comes from the Adobe DNG SDK lossless decoder.
There is a special case when Px = Ra + ((Rb – Rc)/2) leads to Ra + (-1 / 2) which is different to Ra + (-1 >> 1) and I wanted to know how Adobe solved this. They just use signed integers and doing a right shift, which leads to -1 >> 1 = -1.

@kmilos
Copy link
Collaborator Author

kmilos commented Jan 7, 2022

Yet more spaghetti, but I believe this is now functionally correct for the intermediate predictor calculations.

@kmilos
Copy link
Collaborator Author

kmilos commented Jan 7, 2022

Btw, the Sony case won't work neither w/o some extra reshaping because we're not in the W/4 x 4 comp arrangement assumed, but H/2 x W/2 x 4.

@LebedevRI
Copy link
Member

FWIW, i've been thinking about this, and i suppose i know a way to implement
the desired support without hurting the current cases.

I haven't checked, but i'm not sure we have all the necessary samples
on RPU though -- i suspect we only have the ones for predictors 2 and 7.
(ignoring the obvious predictor 0), and for sure i don't expect that we have
the full permutation matrix of all 4 variants of component per pixel (1, 2, 3, 4).

So essentially we need at least 8*4 samples. (i may be forgetting some other permutation).
It's obviously not fatal, it just requires someone to write a testcase generator for them.

@LebedevRI
Copy link
Member

So essentially we need at least 8*4 samples. (i may be forgetting some other permutation).

Right, restart interval.

@kmilos
Copy link
Collaborator Author

kmilos commented Jan 30, 2023

I haven't checked, but i'm not sure we have all the necessary samples
on RPU though -- i suspect we only have the ones for predictors 2 and 7.

Blackmagic DNGs have predictor 6, but have another problem (component arrangement as mentioned above and in #189) preventing its testing...

So it might be an idea to generalize the component arrangement first (1x1, 1x2, 2x1, 2x2), thus ticking Sony lossless off the list?

Or perhaps you prefer to handle both component arrangement and predictors in one go? I hope @cytrinox could help with the 4*8 test vectors then... 🙏

@LebedevRI
Copy link
Member

So it might be an idea to generalize the component arrangement first (1x1, 1x2, 2x1, 2x2), thus ticking Sony lossless off the list?

Sony FUBAR'ed their "LJpeg" implementation to the point of it no longer being a proper LJpeg.
We won't be able to support it in a generic LJpeg decompressor.

@kmilos
Copy link
Collaborator Author

kmilos commented Jan 30, 2023

to the point of it no longer being a proper LJpeg

Why do you think so? AFAICT it is, just that the component scatter/gather is different from the Adobe one... I.e. for a 2x2 Bayer CFA Adobe does 2 components like this

1 2 1 2 ...
1 2 1 2 ...

While Sony does 4 components like this:

1 2 1 2 ...
3 4 3 4 ...

Each component in both Adobe and Sony cases seems to be regular LJpeg SOF3 (w/ already supported predictor 1)... The arrangement of components has nothing to do w/ LJpeg itself, and current LJpeg code seems to work fine once the arrangement is taken into account..

What they FUBAR'ed is on the TIFF container level by having both Tile* and Strip* tags simultaneously, and making ImageWidth/Height an integer of tile size, when it doesn't have to be (which is why I advocated for the absolute crop for these models covering all modes instead of a relative, negative one; but this then makes APS-C crop more complicated so it's a lose-lose either way)...

@kmilos
Copy link
Collaborator Author

kmilos commented Jan 30, 2023

And BlackMagic do something even weirder, from a 2x2 Bayer CFA:

1 1 ...
1 1 ...

to a single components LJpeg like this:

1 1 1 1

(w/ predictor 6)

@LebedevRI
Copy link
Member

and current LJpeg code seems to work fine once the arrangement is taken into account..

That's my point exactly actually. As long as that sonyArrange_ can't be
deduced from the LJpeg container itself, it's broken.

@kmilos
Copy link
Collaborator Author

kmilos commented Jan 30, 2023

As long as that sonyArrange_ can't be
deduced from the LJpeg container itself, it's broken.

Sure, and I already mentioned there that should be generalized - one just needs to parse the SOF3 header for the no. of components and their frame size, and compare to the TIFF CFA tile size, and that gives the arrangement:

E.g.

Sony case: TIFF tile is 512x512, SOF3 is 4 components of 256x256

Adobe case: TIFF tile is 256x256, SOF3 is 2 components of 256x128

BlackMagic case: TIFF tile is 512x512, SOF3 is 1 component of 256x1024

The arrangement is calculated from the TIFF tile HxW divided by SOF3 YxX, and to check conformance, one must have N_comp*Y*X = H*W (I think already checked).

Both the Sony and BlackMagic SOF3 headers are correct in what they implement AFAICT, so nothing is really at odds here, "just" missing implementation in rawspeed...

@LebedevRI
Copy link
Member

I see.
So for cpp=2, we have two possible layouts:

x x

and

x
x

and for cpp=4 we have three:

x x x x
x x
x x
x
x
x
x

.. that multiplies the needed sample set by another x2 or so.

@kmilos
Copy link
Collaborator Author

kmilos commented Jan 30, 2023

that multiplies the needed sample set by another x2 or so

😮

@kmilos
Copy link
Collaborator Author

kmilos commented Jan 31, 2023

and for cpp=4 we have three

Not quite. For cpp=4 you can have two:

1 2
3 4

or

1 3
2 4

i.e. row or column major scatter of the 2x2 Bayer into 4 separate component planes. Like this for row major:

image

(In theory, one can of course have all the other ways of ordering 4 components, 4! in total, but that would be a bit silly...)

For cpp=1 you could have 2 (well, 3 if also you count identity, i.e. no rearrangement, just like you illustrated above).

@LebedevRI
Copy link
Member

Sorry, not following the logic there. I can see how the arrangements i listed can be specified
via different frame sizes, but not the other orders.

@kmilos
Copy link
Collaborator Author

kmilos commented Jan 31, 2023

Sorry, not following the logic there. I can see how the arrangements i listed can be specified
via different frame sizes, but not the other orders.

Np, I'll try to do a clearer drawing for the three (Adobe, Sony, and BlackMagic) cases a bit later and add here.

@kmilos
Copy link
Collaborator Author

kmilos commented Feb 1, 2023

@LebedevRI I hope the following illustration of the scatter/gather schemes makes it clearer:

dng_ljpeg

These are all perfectly valid w.r.t. ITU-T T.81. The components highlighted in yellow make up one MCU, and the dotted ones are used as predictors (usually 1, i.e. just horizontal, but 6 for Blackmagic/CinemaDNG case).

(There is also the trivial case of DNG spp=3 LinearRaw with 1:1 mapping to SOF3 Nf=3, which is already supported for predictor 1, but not predictor 7 using the row above like Apple do.)

@LebedevRI
Copy link
Member

One thing i can say right away -- if nothing, it's a pretty nice infographics!

@artemist
Copy link
Contributor

artemist commented Feb 17, 2023

In the JFIF SOF3 Sony advertises blocks as 2W x H/2. Wait no you're right I was thinking of the number of samples I have to write per "row" as a consequence of decoding in one pass.

I have a somewhat dirty PR in #386 that supports the Sony arrangement and might support blackmagic arrangement if I understand it correctly but I can't test.

@kmilos
Copy link
Collaborator Author

kmilos commented Feb 27, 2024

I think this was the last time I rebased this - the new code structure that imposes use of a single row only (effectively supporting only predictor 1) makes working around this even uglier than before, if that's possible 😮

kmilos referenced this pull request Mar 26, 2024
This is true for all RPU samples at least, even weird 3-component ones.

This seems like the missing info trivia which allows support
for different LJpeg CPS layouts (e.g. the square one)
@codecov-commenter
Copy link

codecov-commenter commented May 24, 2024

Codecov Report

Attention: Patch coverage is 12.85714% with 61 lines in your changes missing coverage. Please review.

Project coverage is 60.76%. Comparing base (876c91f) to head (fb143a7).
Report is 229 commits behind head on develop.

Files with missing lines Patch % Lines
...rc/librawspeed/decompressors/LJpegDecompressor.cpp 10.60% 59 Missing ⚠️
...zz/librawspeed/decompressors/LJpegDecompressor.cpp 0.00% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #334      +/-   ##
===========================================
- Coverage    60.95%   60.76%   -0.20%     
===========================================
  Files          273      273              
  Lines        16408    16466      +58     
  Branches      2077     2078       +1     
===========================================
+ Hits         10002    10006       +4     
- Misses        6278     6332      +54     
  Partials       128      128              
Flag Coverage Δ
benchmarks 11.84% <0.00%> (-0.05%) ⬇️
integration 44.81% <15.00%> (-0.14%) ⬇️
linux 56.94% <13.04%> (-0.20%) ⬇️
macOS 25.20% <0.00%> (-0.07%) ⬇️
rpu_u 44.81% <15.00%> (-0.14%) ⬇️
unittests 21.55% <0.00%> (-0.08%) ⬇️
windows ∅ <ø> (∅)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants